  • Tsinghua University

Highlights

  • Pro

Starred repositories

⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Applet) / If you find it useful, please star this project, thanks~

Vue 5,349 375 Updated Jul 9, 2024

ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch

Python 9 Updated Jun 17, 2024

xLSTM as Generic Vision Backbone

Python 325 21 Updated Jul 4, 2024

Official code for the paper "Attention as a Hypernetwork"

Python 18 Updated Jun 22, 2024

Python 8 1 Updated Jun 5, 2024

A mod for GGST that adds training mode features

C++ 46 2 Updated Jul 7, 2024

A multi-level tensor algebra superoptimizer

C++ 266 16 Updated Jul 9, 2024

🏡 Open source home automation that puts local control and privacy first.

Python 69,858 28,955 Updated Jul 9, 2024

MambaOut: Do We Really Need Mamba for Vision?

Python 1,877 29 Updated Jun 6, 2024

CUDA Core Compute Libraries

C++ 967 119 Updated Jul 9, 2024

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 79 12 Updated May 21, 2024

A simple and efficient Mamba implementation in pure PyTorch and MLX.

Python 781 65 Updated Jul 7, 2024

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Python 3,535 308 Updated Jul 3, 2024

Kolmogorov Arnold Networks

Jupyter Notebook 13,695 1,209 Updated Jul 8, 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 141 12 Updated Jul 4, 2024

Python 32 1 Updated Apr 13, 2024

An easy to use PyTorch to TensorRT converter

Python 4,476 669 Updated Jun 17, 2024

Python 28 2 Updated Mar 26, 2024

CUDA/Metal accelerated language model inference

C 339 13 Updated Jun 7, 2024

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Python 174 15 Updated Apr 24, 2024

Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization & question answering. S…

Jupyter Notebook 1 Updated Jun 18, 2024

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Python 233 20 Updated Apr 20, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Python 405 24 Updated Jul 8, 2024

Token Omission Via Attention

Python 113 6 Updated Feb 11, 2024

CUDA Kernel Benchmarking Library

Cuda 446 61 Updated Jun 5, 2024

"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu,…

Python 16 1 Updated May 7, 2024

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

Python 55 4 Updated Feb 13, 2024

Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"

Python 94 10 Updated Apr 10, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,072 528 Updated Apr 19, 2024