- HNU
- Changsha, China
Highlights
- Pro
Starred repositories
🔥 LeetCode solutions in any programming language | Solutions in multiple programming languages for LeetCode, Coding Interviews (剑指 Offer, 2nd Edition), and Cracking the Coding Interview (6th Edition)
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
A demonstration of autoTVM neural-network inference code optimization: the open-source CenterFace model is compiled with TVM, autoTVM searches for the optimal inference code, and the result is deployed as compiled C++ code. The demo targets CUDA, but other platforms such as Raspberry Pi, Android phones, and iPhones also work.
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
Open deep learning compiler stack for Kendryte AI accelerators ✨
The road to hacking SysML and becoming a systems expert
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques.
A list of awesome compiler projects and papers for tensor computation and deep learning.
Allegro is an open-source code for building highly scalable and accurate equivariant deep learning interatomic potentials
NequIP is a code for building E(3)-equivariant interatomic potentials
SevenNet - a graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations.
aria2 is a lightweight multi-protocol, multi-source, cross-platform download utility operated from the command line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent, and Metalink.
RubiaCx / flatformer
Forked from mit-han-lab/flatformer. [CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer