Commit

Update README.md
yzhangcs authored Mar 9, 2024
1 parent e65a691 commit 81c5d41
Showing 1 changed file with 3 additions and 3 deletions.
README.md: 6 changes (3 additions & 3 deletions)
@@ -12,14 +12,14 @@ Join [discord](https://discord.gg/vDaJTmKNcS) if you are interested in this project
| 2023-07 | RetNet (@MSRA@THU) | Retentive network: a successor to transformer for large language models | [[arxiv]](https://arxiv.org/abs/2307.08621) | [[official]](https://github.com/microsoft/torchscale/tree/main) [[RetNet]](https://github.com/Jamie-Stirling/RetNet/tree/main) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/multiscale_retention.py) |
| 2023-12 | GLA (@MIT@IBM) | Gated Linear Attention Transformers with Hardware-Efficient Training | [[arxiv]](https://arxiv.org/abs/2312.06635) | [[official]](https://github.com/berlino/gated_linear_attention) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/gla.py) |
| 2023-12 | Based (@Stanford@Hazyresearch) | An Educational and Effective Sequence Mixer | [[blog]](https://hazyresearch.stanford.edu/blog/2023-12-11-zoology2-based) | [[official]](https://github.com/HazyResearch/zoology) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/based.py) |
-| 2024-01 | Rebased | Linear Transformers with Learnable Kernel Functions are Better In-Context Models | [[arxiv]](https://arxiv.org/abs/2402.10644) | [[official]](https://github.com/corl-team/rebased/) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/rebased.py) |
-| 2021-02 | Delta Net | Linear Transformers Are Secretly Fast Weight Programmers | [[arxiv]](https://arxiv.org/abs/2102.11174) | [[official]](https://github.com/IDSIA/recurrent-fwp) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/delta_net.py) |
+| 2024-01 | Rebased | Linear Transformers with Learnable Kernel Functions are Better In-Context Models | [[arxiv]](https://arxiv.org/abs/2402.10644) | [[official]](https://github.com/corl-team/rebased/) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/rebased.py) |
+| 2021-02 | Delta Net | Linear Transformers Are Secretly Fast Weight Programmers | [[arxiv]](https://arxiv.org/abs/2102.11174) | [[official]](https://github.com/IDSIA/recurrent-fwp) | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/delta_net.py) |
| 2023-09 | Hedgehog (@HazyResearch) | The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry | [openreview](https://openreview.net/forum?id=4g02l2N2Nx) | | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/linear_attn.py#L51) |
| 2023-10 | PolySketchFormer (@CMU@Google) | Fast Transformers via Sketching Polynomial Kernels | [arxiv](https://arxiv.org/abs/2310.01655) | | TODO |
| 2023-07 | TransnormerLLM (@Shanghai AI Lab) | A Faster and Better Large Language Model with Improved TransNormer | [openreview](https://openreview.net/forum?id=OROKjdAfjs) [arxiv](https://arxiv.org/abs/2307.14995) | [[official]](https://github.com/OpenNLPLab/TransnormerLLM) [[Lightning2]](https://github.com/OpenNLPLab/lightning-attention) | TODO |
| 2023-05 | RWKV-v6 (@BlinkDL) | Reinventing RNNs for the Transformer Era | [arxiv](https://arxiv.org/abs/2305.13048) | [[official]](https://github.com/BlinkDL/RWKV-LM) | TODO |
| 2023-10 | GateLoop | Fully Data-Controlled Linear Recurrence for Sequence Modeling | [openreview](https://openreview.net/forum?id=02Ug9N8DCI) [arxiv](https://arxiv.org/abs/2311.01927) | [[official]](https://github.com/tobiaskatsch/GateLoop) [[jax]](https://github.com/lucidrains/gateloop-transformer) | TODO |
-| 2021-10 | ABC (@UW) | Attention with Bounded-memory Control | [arxiv](https://arxiv.org/abs/2110.02488) | | TODO |
+| 2021-10 | ABC (@UW) | Attention with Bounded-memory Control | [arxiv](https://arxiv.org/abs/2110.02488) | | [code](https://github.com/sustcsonglin/flash-linear-attention/blob/main/fla/layers/abc.py) |
| 2023-09 | VQ-transformer | Linear-Time Transformers via Vector Quantization | [arxiv](https://arxiv.org/abs/2309.16354) | [[official]](https://github.com/transformer-vq/transformer_vq) | TODO |
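
Each [code] link in the table above points to a standalone layer module in this repository that can be used as an ordinary `torch.nn.Module`. Below is a minimal sketch for the RetNet layer; the import path, constructor keywords (`hidden_size`, `num_heads`), and tuple-shaped return value are assumptions that may differ across versions, so check the linked file for the exact signature.

```python
# Rough usage sketch (not part of this commit): constructor keywords and the
# return signature below are assumptions; see fla/layers/multiscale_retention.py
# for the version-specific API. Requires a CUDA GPU, since the kernels use Triton.
import torch
from fla.layers import MultiScaleRetention

batch_size, seq_len, hidden_size, num_heads = 2, 2048, 1024, 4
device, dtype = 'cuda', torch.bfloat16

layer = MultiScaleRetention(hidden_size=hidden_size, num_heads=num_heads)
layer = layer.to(device=device, dtype=dtype)

x = torch.randn(batch_size, seq_len, hidden_size, device=device, dtype=dtype)
y, *_ = layer(x)   # the layer may also return attention weights / recurrent state
print(y.shape)     # expected: (batch_size, seq_len, hidden_size)
```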

# Installation
