RubiaCx

  • HNU
  • Changsha, China

Starred repositories

🔥LeetCode solutions in any programming language | Solutions to LeetCode, 剑指 Offer (2nd edition), and Cracking the Coding Interview (6th edition) in multiple programming languages

Java 31,052 6,848 Updated Sep 28, 2024

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 191 31 Updated Apr 27, 2022

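Demonstration of neural network inference code optimization search with autoTVM: compiles the open-source model centerface with tvm, uses autoTVM to search for the best inference code, and finally deploys it compiled as C++ code. The demo platform is CUDA, but other platforms also work, for example Raspberry Pi, Android phones, and iPhones. This is a demonstration of how to use autoTVM to search and optimize a neural network i…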

C++ 27 6 Updated May 6, 2021

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

C++ 130 9 Updated Sep 24, 2024

Open deep learning compiler stack for Kendryte AI accelerators ✨

C# 738 181 Updated Sep 27, 2024

The road to hack SysML and become a system expert

Emacs Lisp 426 49 Updated Sep 25, 2024
Jupyter Notebook 61 6 Updated Jul 23, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 359 29 Updated Sep 28, 2024

Chinese translation of Extending and Modifying LAMMPS

3 Updated Oct 24, 2022

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 171 13 Updated Jun 18, 2024
JavaScript 9 3 Updated Sep 22, 2024

optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052

C++ 452 34 Updated Mar 15, 2024

TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques.

C++ 11 Updated Sep 18, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,330 295 Updated Jul 14, 2024

Puzzles for learning Triton

Jupyter Notebook 1,008 64 Updated Sep 25, 2024

System for AI Education Resource.

Python 3,450 430 Updated Jun 21, 2024
68 24 Updated Sep 14, 2023

Allegro is an open-source code for building highly scalable and accurate equivariant deep learning interatomic potentials

Python 326 46 Updated Jul 1, 2024

NequIP is a code for building E(3)-equivariant interatomic potentials

Python 611 135 Updated Sep 17, 2024

SevenNet - a graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations.

Python 110 13 Updated Sep 25, 2024

An ML Systems Onboarding list

514 20 Updated Jul 23, 2024

Material for gpu-mode lectures

Jupyter Notebook 2,566 256 Updated Sep 28, 2024
Jupyter Notebook 1 Updated Aug 8, 2024

A New Format for SIMD-accelerated SpMV

C++ 19 4 Updated Apr 4, 2022

aria2 is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.

C++ 35,265 3,571 Updated Aug 3, 2024

[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Python 1 Updated Jul 9, 2024

play gemm with tvm

Cuda 83 10 Updated Jul 22, 2023

BLISlab: A Sandbox for Optimizing GEMM

C 469 100 Updated Jun 17, 2021
Python 14 3 Updated Apr 15, 2022

PyTorch template that simplifies and standardizes module writing

Python 17 8 Updated May 20, 2021