The Based architecture seems to have been updated: https://arxiv.org/abs/2402.18668. Any insights into how it compares with ReBased?

At this point, the updated arXiv version of Based reads more like follow-up research on subquadratic architectures than a simple upgrade. The new version combines linear attention with sliding-window attention, which is orthogonal to the choice of linear attention kernel studied in our paper. We do not yet have evaluations of a ReBased kernel combined with sliding-window attention.

Hi, I've just finished training the small 124M model, and it seems that replacing the conv1d with sliding-window attention is orthogonal to the Based/ReBased comparison, as we achieve a slightly better loss. We will update our preprint, and we plan to release the training pipeline and weights. Stay tuned!
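In case it helps to visualize the discussion, here is a minimal PyTorch sketch of what pairing a ReBased-style linear-attention kernel with causal sliding-window attention could look like. Everything here (the `HybridAttention` module, the `window` size, summing the two branches) is a hypothetical illustration, not code from the Based or ReBased repositories.

```python
# Hypothetical illustration only: not the actual Based/ReBased implementation.
import torch
import torch.nn as nn


def causal_linear_attention(q, k, v, scale, shift):
    """Causal linear attention with a ReBased-like learnable quadratic
    feature map phi(x) = (scale * x + shift)^2, in the naive cumsum form."""
    phi_q = (scale * q + shift) ** 2
    phi_k = (scale * k + shift) ** 2
    # Running prefix sums over the sequence give causality without softmax.
    kv = torch.einsum("bld,ble->blde", phi_k, v).cumsum(dim=1)
    k_sum = phi_k.cumsum(dim=1)
    num = torch.einsum("bld,blde->ble", phi_q, kv)
    den = torch.einsum("bld,bld->bl", phi_q, k_sum).clamp(min=1e-6)
    return num / den.unsqueeze(-1)


def sliding_window_attention(q, k, v, window):
    """Standard softmax attention restricted to a causal local window."""
    L, d = q.shape[1], q.shape[-1]
    i = torch.arange(L, device=q.device)
    # Token i attends only to tokens in [i - window + 1, i].
    band = (i[None, :] <= i[:, None]) & (i[None, :] > i[:, None] - window)
    scores = torch.einsum("bld,bmd->blm", q, k) / d**0.5
    scores = scores.masked_fill(~band, float("-inf"))
    return torch.einsum("blm,bme->ble", scores.softmax(dim=-1), v)


class HybridAttention(nn.Module):
    """Sketch of one block: sliding-window attention handles local context,
    the linear-attention kernel handles long-range context; outputs are summed."""

    def __init__(self, dim, window=128):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))
        self.window = window

    def forward(self, x):  # x: (batch, length, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        local = sliding_window_attention(q, k, v, self.window)
        global_ = causal_linear_attention(q, k, v, self.scale, self.shift)
        return self.out(local + global_)
```

For example, `HybridAttention(64)(torch.randn(2, 256, 64))` returns a tensor of the same shape. The actual Based architecture interleaves different mixers across layers rather than summing them inside one block, so treat this purely as a shape-level sketch of the idea.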