Standalone module of the Gated Linear Attention (GLA) Transformer layer from the paper "Gated Linear Attention Transformers with Hardware-Efficient Training".
This repo is no longer maintained; it only exists to track some useful git commits. Please refer to flash-linear-attention instead.
Requirements: torch, triton (nightly release)
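
For orientation, below is a minimal PyTorch sketch of the gated linear attention recurrence the layer implements: S_t = (alpha_t 1^T) * S_{t-1} + k_t^T v_t and o_t = q_t S_t. All names, shapes, and the function signature are illustrative assumptions, not this repo's API; the actual module computes this with a hardware-efficient chunked Triton kernel rather than a per-step Python loop.

```python
# Naive sequential GLA recurrence (illustrative sketch only; the repo's
# module uses a fused, chunk-parallel Triton kernel instead).
import torch

def gla_recurrence(q, k, v, alpha):
    """q, k, alpha: (batch, seq_len, d_k); alpha in (0, 1) is the forget gate.
    v: (batch, seq_len, d_v). Returns o: (batch, seq_len, d_v)."""
    B, T, d_k = q.shape
    d_v = v.shape[-1]
    S = q.new_zeros(B, d_k, d_v)  # recurrent state S_t
    outs = []
    for t in range(T):
        # S_t = (alpha_t 1^T) * S_{t-1} + k_t^T v_t  (gate applied per key dim)
        S = alpha[:, t].unsqueeze(-1) * S \
            + k[:, t].unsqueeze(-1) * v[:, t].unsqueeze(-2)
        # o_t = q_t S_t
        outs.append(torch.einsum('bd,bde->be', q[:, t], S))
    return torch.stack(outs, dim=1)

if __name__ == "__main__":
    B, T, d_k, d_v = 2, 16, 64, 64
    q, k = torch.randn(B, T, d_k), torch.randn(B, T, d_k)
    v = torch.randn(B, T, d_v)
    alpha = torch.sigmoid(torch.randn(B, T, d_k))   # gates in (0, 1)
    print(gla_recurrence(q, k, v, alpha).shape)     # torch.Size([2, 16, 64])
```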