From my perspective, there is a problem with adding Based/ReBased models to Hugging Face Transformers: to train or fine-tune one of these models, you effectively need the IO-aware Triton kernels, because the vanilla Torch implementation has a significant memory footprint. Implementations of several such models already exist in flash_linear_attention, so in theory it could be included as an optional dependency, similar to flash_attention. Inference, on the other hand, can still run without any custom kernels (see the sketch below).
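To illustrate the last point, here is a minimal, hypothetical sketch of the recurrent form of causal linear attention in plain PyTorch. It maintains a fixed-size state instead of materializing the full attention matrix, which is why inference gets by without custom kernels; the function name and the normalizer handling are my assumptions, not code from any of the repos mentioned here:

```python
import torch

def linear_attention_recurrent(q, k, v):
    """Causal linear attention in its recurrent (inference) form.

    q and k are assumed to be already feature-mapped, with shape
    (B, L, D); v has shape (B, L, Dv). The running state S is
    (B, D, Dv), so per-step memory is O(D * Dv) rather than the
    O(L^2) of materializing the attention matrix in the parallel
    (training) form.
    """
    B, L, D = q.shape
    Dv = v.shape[-1]
    S = q.new_zeros(B, D, Dv)  # running sum of outer products k_t v_t^T
    z = q.new_zeros(B, D)      # running sum of k_t, used as the normalizer
    outs = []
    for t in range(L):
        qt, kt, vt = q[:, t], k[:, t], v[:, t]
        S = S + kt.unsqueeze(-1) * vt.unsqueeze(1)  # rank-1 state update
        z = z + kt
        num = torch.einsum("bd,bdv->bv", qt, S)
        # With a squared feature map the denominator is nonnegative,
        # so clamping from below is safe.
        den = torch.einsum("bd,bd->b", qt, z).clamp(min=1e-6)
        outs.append(num / den.unsqueeze(-1))
    return torch.stack(outs, dim=1)  # (B, L, Dv)
```

Training with this step-by-step loop is exactly what is slow and memory-hungry without the IO-aware chunked kernels, which is why flash_linear_attention as an optional dependency would matter for fine-tuning.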
Model description
Mirroring #29466: this newer linear-attention model adds RMS normalization to the attention forward pass and contracts the Taylor expansion of the exponential to only its third (quadratic) term. The authors report better performance than Based, and, like Based, the model outperforms Mamba on the benchmarks they evaluate.
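For concreteness, here is a minimal sketch of a feature map matching that description: RMS-normalize, apply a learnable affine transform, then square elementwise. The class name and the exact placement of the normalization and affine parameters are assumptions; the corl-team/rebased repo is the authoritative reference:

```python
import torch
import torch.nn as nn

class QuadraticFeatureMap(nn.Module):
    """Sketch of a ReBased-style feature map: RMS norm + learnable
    affine, followed by an elementwise square -- i.e., keeping only
    the quadratic (third) term of the exp Taylor expansion that
    Based approximates."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS normalization over the feature dimension
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        x = x * rms * self.gamma + self.beta
        return x.pow(2)  # elementwise square = quadratic kernel
```

Applied to queries and keys before the linear-attention recurrence above, this replaces the softmax; note the squared output is nonnegative, which keeps the attention normalizer well behaved.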
Open source status
Provide useful links for the implementation
The reference implementation: https://github.com/corl-team/rebased