Alibaba Group
Starred repositories
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
XNOR-Net, with binary GEMM and binary conv2d kernels, supporting both CPU and GPU.
Binarize convolutional neural networks using PyTorch 🔥
Step-by-step optimization of CUDA SGEMM
ImageNet classification using binary Convolutional Neural Networks
(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
A GPU-accelerated graph learning library for PyTorch, facilitating the scaling of GNN training and inference.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and…
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Fast inference from large language models via speculative decoding
Running large language models on a single GPU for throughput-oriented scenarios.
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Ongoing research training transformer models at scale
A playbook for systematically maximizing the performance of deep learning models.
A plugin that localizes your VSCode for China, derived from CEC-IDE, with features such as sensitive-word detection and anti-addiction.
Strategies for Pre-training Graph Neural Networks
Understanding and Extending Subgraph GNNs by Rethinking their Symmetries (NeurIPS 2022 Oral)