Tsinghua University
Starred repositories
⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Applet) / If you find it useful, please star this project, thanks~
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
Official code for the paper "Attention as a Hypernetwork"
Sevoii / StriveFrameViewer
Forked from Procdox/StriveFrameViewer. A mod for GGST that adds training mode features
🏡 Open source home automation that puts local control and privacy first.
MambaOut: Do We Really Need Mamba for Vision?
Standalone Flash Attention v2 kernel without libtorch dependency
A simple and efficient Mamba implementation in pure PyTorch and MLX.
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
An easy to use PyTorch to TensorRT converter
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Kyriection / llama-recipes
Forked from meta-llama/llama-recipes. Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization & question answering. S…
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu,…
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
Running large language models on a single GPU for throughput-oriented scenarios.