🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
Unify Efficient Fine-Tuning of 100+ LLMs
Faster Whisper transcription with CTranslate2
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
The official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models"; also an efficient LLM compression toolkit offering various advanced compression methods and support for multiple inference backends.
Neural Network Compression Framework for enhanced OpenVINO™ inference
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
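To make the low-bit formats above concrete, here is a minimal, hypothetical sketch of symmetric per-tensor INT8 weight quantization in plain Python. The function names and the single-scale scheme are illustrative only, not the API of any library listed here.

```python
# Hypothetical sketch: symmetric per-tensor INT8 quantization.
# One scale maps floats into [-128, 127]; dequantization recovers
# an approximation with error bounded by scale / 2 per weight.

def quantize_int8(weights):
    """Map float weights to int8 values using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Real libraries such as those above refine this idea with per-channel scales, asymmetric zero points, and calibration data, but the round-and-clip core is the same.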
Sparsity-aware deep learning inference runtime for CPUs
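A sparsity-aware runtime gains speed by never touching zero weights. The toy sketch below (hypothetical names, plain Python, not the runtime's actual code) shows the idea with a compressed row and a dot product over nonzeros only.

```python
# Hypothetical sketch: skip multiplies by zero by storing only
# (index, value) pairs for the nonzero weights of a row.

def compress_row(dense_row):
    """Keep (index, value) pairs for nonzero weights only."""
    return [(i, w) for i, w in enumerate(dense_row) if w != 0.0]

def sparse_dot(compressed_row, activations):
    """Dot product that touches only the stored nonzeros."""
    return sum(w * activations[i] for i, w in compressed_row)

row = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]   # ~67% sparse
acts = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0]
packed = compress_row(row)               # 2 entries instead of 6
result = sparse_dot(packed, acts)        # 2*3 + (-1)*2 = 4.0
```

Production runtimes add blocked sparsity patterns and vectorized kernels so the skipped work translates into real CPU throughput, but the arithmetic saving comes from exactly this skipping.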
Efficient computing methods developed by Huawei Noah's Ark Lab
A beginner-friendly tutorial on model compression (in Chinese)
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
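The NCHW→NHWC change such converters perform is an axis permutation. As a minimal, hypothetical illustration (plain nested lists rather than any converter's real code), moving the channel axis last looks like this:

```python
# Hypothetical sketch: permute a [N][C][H][W] nested list into
# [N][H][W][C], the layout change at the heart of ONNX-to-TFLite
# conversion (real tools do this on tensors, not Python lists).

def nchw_to_nhwc(tensor):
    """Transpose a [N][C][H][W] nested list to [N][H][W][C]."""
    return [
        [
            [
                [tensor[n][c][h][w] for c in range(len(tensor[n]))]
                for w in range(len(tensor[n][0][h]))
            ]
            for h in range(len(tensor[n][0]))
        ]
        for n in range(len(tensor))
    ]

# One image, 2 channels, a 1x2 spatial grid: shape (1, 2, 1, 2).
x = [[[[1, 2]], [[3, 4]]]]
y = nchw_to_nhwc(x)   # shape (1, 1, 2, 2): channels now innermost
```

Doing this once per tensor, instead of inserting a Transpose op around every layer, is what avoids the blow-up of Transpose nodes the description refers to.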
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Dataflow compiler for QNN inference on FPGAs
Contrastive-LSH Embedding and Tokenization Technique for Multivariate Time Series Classification
Brevitas: neural network quantization in PyTorch