Starred repositories
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
[CVPR 2023 Best Paper] Planning-oriented Autonomous Driving
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152, I…
Ongoing research training transformer models at scale
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Fast and Easy Infinite Neural Networks in Python
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Transformers with Arbitrarily Large Context
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Code examples and resources for DBRX, a large language model developed by Databricks
High performance distributed framework for training deep learning recommendation models based on PyTorch.
The official PyTorch implementation of Google's Gemma models
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
The AI-native database built for LLM applications, providing incredibly fast full-text and vector search
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2
Universal LLM Deployment Engine with ML Compilation
FlashInfer: Kernel Library for LLM Serving
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.