Starred repositories

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 174 15 Updated Jul 23, 2024

XNOR-Net, with binary gemm and binary conv2d kernels, support both CPU and GPU.

Python 78 21 Updated May 15, 2019

Binarize convolutional neural networks using pytorch 🔥

Python 129 13 Updated Apr 26, 2022

Kernel Tuner

Python 260 46 Updated Jul 24, 2024

Step-by-step optimization of CUDA SGEMM

Cuda 185 32 Updated Mar 30, 2022

ImageNet classification using binary Convolutional Neural Networks

Lua 856 239 Updated Dec 5, 2017

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Python 159 12 Updated May 27, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 472 34 Updated Jul 10, 2024

A GPU-accelerated graph learning library for PyTorch, facilitating the scaling of GNN training and inference.

Python 111 30 Updated Jul 4, 2024

Examples of experiment figures that can be used in papers

Python 179 49 Updated Jan 24, 2024

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python 2,089 251 Updated Jul 24, 2024

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and…

Python 6,314 1,225 Updated Jul 24, 2024

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 4,156 431 Updated Jul 17, 2024

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python 129,712 25,765 Updated Jul 25, 2024

Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

C++ 77 12 Updated May 10, 2024

Fast inference from large language models via speculative decoding

Python 431 47 Updated Jul 25, 2024

A library for subgraph GNN based on pyg

Python 37 Updated Jun 6, 2024
Python 251 31 Updated Apr 2, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,089 531 Updated Jul 24, 2024

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 665 84 Updated May 30, 2024

Ongoing research training transformer models at scale

Python 9,491 2,139 Updated Jul 23, 2024

Repo for external large-scale work

Python 6,440 722 Updated Apr 27, 2024

A playbook for systematically maximizing the performance of deep learning models.

25,937 2,160 Updated Jun 18, 2024

A plugin that "domesticates" your VSCode, derived from CEC-IDE, with features such as sensitive-word detection and anti-addiction.

TypeScript 770 24 Updated Mar 16, 2024

Inference code for Llama models

Python 54,470 9,345 Updated Jul 23, 2024

Strategies for Pre-training Graph Neural Networks

Python 948 162 Updated Jul 29, 2023

Understanding and Extending Subgraph GNNs by Rethinking their Symmetries (NeurIPS 2022 Oral)

Python 38 2 Updated Jan 30, 2023