Unify Efficient Fine-Tuning of 100+ LLMs
Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Faster Whisper transcription with CTranslate2
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Sparsity-aware deep learning inference runtime for CPUs
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
Run Mixtral-8x7B models in Colab or consumer desktops
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary/binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, reg…
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
A Python package extending official PyTorch to easily obtain improved performance on Intel platforms
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
OpenMMLab Model Compression Toolbox and Benchmark.
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
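The libraries above differ in targets and tooling, but most build on the same core operation. As a frame of reference (plain NumPy, not any specific library's API), a minimal sketch of symmetric per-tensor int8 post-training quantization:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map floats onto int8 codes in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from int8 codes."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Rounding error per element is bounded by scale / 2
max_err = float(np.max(np.abs(weights - recovered)))
```

Real toolkits layer calibration, per-channel scales, and hardware-specific kernels on top of this basic quantize/dequantize round trip.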