From my perspective, there is a problem with adding Based/ReBased models to Hugging Face Transformers: to train or fine-tune one of these models, you effectively need the IO-aware Triton kernels, because the vanilla Torch implementation has a significant memory footprint. Implementations of several such models already exist in flash_linear_attention, so in theory it could be included as an optional dependency, similar to flash_attention. Inference, on the other hand, can still run without any custom kernels (see the sketch below).
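To illustrate the last point, here is a minimal, hypothetical sketch of the recurrent form of causal linear attention in plain PyTorch. It maintains a fixed-size state instead of materializing the full attention matrix, which is why inference gets by without custom kernels; the function name and the normalizer handling are my assumptions, not code from any of the repos mentioned here:

```python
import torch

def linear_attention_recurrent(q, k, v):
    """Causal linear attention in its recurrent (inference) form.

    q and k are assumed to be already feature-mapped, with shape
    (B, L, D); v has shape (B, L, Dv). The running state S is
    (B, D, Dv), so per-step memory is O(D * Dv) rather than the
    O(L^2) of materializing the attention matrix in the parallel
    (training) form.
    """
    B, L, D = q.shape
    Dv = v.shape[-1]
    S = q.new_zeros(B, D, Dv)  # running sum of outer products k_t v_t^T
    z = q.new_zeros(B, D)      # running sum of k_t, used as the normalizer
    outs = []
    for t in range(L):
        qt, kt, vt = q[:, t], k[:, t], v[:, t]
        S = S + kt.unsqueeze(-1) * vt.unsqueeze(1)  # rank-1 state update
        z = z + kt
        num = torch.einsum("bd,bdv->bv", qt, S)
        # With a squared feature map the denominator is nonnegative,
        # so clamping from below is safe.
        den = torch.einsum("bd,bd->b", qt, z).clamp(min=1e-6)
        outs.append(num / den.unsqueeze(-1))
    return torch.stack(outs, dim=1)  # (B, L, Dv)
```

Training with this step-by-step loop is exactly what is slow and memory-hungry without the IO-aware chunked kernels, which is why flash_linear_attention as an optional dependency would matter for fine-tuning.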
Model description
Mirroring #29466: this newer linear-attention model adds RMS normalization to the attention forward pass and contracts the Taylor expansion of the exponential to only its third (quadratic) term. The authors report better performance than Based, and, like Based, the model outperforms Mamba on the benchmarks they evaluate.
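For concreteness, here is a minimal sketch of a feature map matching that description: RMS-normalize, apply a learnable affine transform, then square elementwise. The class name and the exact placement of the normalization and affine parameters are assumptions; the corl-team/rebased repo is the authoritative reference:

```python
import torch
import torch.nn as nn

class QuadraticFeatureMap(nn.Module):
    """Sketch of a ReBased-style feature map: RMS norm + learnable
    affine, followed by an elementwise square -- i.e., keeping only
    the quadratic (third) term of the exp Taylor expansion that
    Based approximates."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS normalization over the feature dimension
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        x = x * rms * self.gamma + self.beta
        return x.pow(2)  # elementwise square = quadratic kernel
```

Applied to queries and keys before the linear-attention recurrence above, this replaces the softmax; note the squared output is nonnegative, which keeps the attention normalizer well behaved.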
Open source status
Provide useful links for the implementation
The reference implementation: https://github.com/corl-team/rebased