Comparison with updated Based #3

Open
obv-mikhail opened this issue Mar 10, 2024 · 2 comments

Comments

@obv-mikhail

The Based architecture seems to have been updated: https://arxiv.org/abs/2402.18668. Any insights into how it compares with ReBased?

@kefirski
Collaborator

From this point of view, the updated arXiv version of Based is better seen as subsequent research on subquadratic architectures rather than a simple upgrade. The new version introduces combined linear and sliding window attention, which is orthogonal to the choice of linear attention kernel studied in our paper. Right now, we do not have evaluations of the ReBased kernel combined with sliding window attention.
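
To illustrate why the kernel choice is independent of sliding window attention, here is a minimal sketch (not this repository's or the Based codebase's implementation; shapes, names, and the omitted learnable affine are illustrative assumptions) of causal linear attention where Based-style and ReBased-style kernels differ only in the feature map `phi`:

```python
import torch
import torch.nn.functional as F

def phi_based(x):
    # Based-style 2nd-order Taylor approximation of exp(q.k): features [1, x, x⊗x/sqrt(2)].
    x2 = (x.unsqueeze(-1) * x.unsqueeze(-2)).flatten(-2) / (2 ** 0.5)
    return torch.cat([torch.ones_like(x[..., :1]), x, x2], dim=-1)

def phi_rebased(x):
    # ReBased-style kernel: normalize and square element-wise
    # (the learnable affine parameters are omitted here for brevity).
    return F.layer_norm(x, x.shape[-1:]) ** 2

def linear_attention(q, k, v, phi):
    # Causal linear attention via prefix sums; the kernel only enters through phi.
    q, k = phi(q), phi(k)                                    # (B, L, D_phi)
    kv = torch.einsum('bld,blv->bldv', k, v).cumsum(dim=1)   # running sum of k ⊗ v
    z = k.cumsum(dim=1)                                       # running normalizer
    num = torch.einsum('bld,bldv->blv', q, kv)
    den = torch.einsum('bld,bld->bl', q, z).clamp(min=1e-6).unsqueeze(-1)
    return num / den

# Hypothetical usage: swapping the kernel leaves the rest of the block untouched,
# whether or not sliding window attention is also used for local mixing.
q = torch.randn(2, 16, 8); k = torch.randn(2, 16, 8); v = torch.randn(2, 16, 8)
out_based   = linear_attention(q, k, v, phi_based)
out_rebased = linear_attention(q, k, v, phi_rebased)
```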

@elephantmipt
Collaborator

Hi, I've just finished training the small 124M model, and it seems that replacing conv1d with sliding window attention is orthogonal to the Based/ReBased performance gap, as we still achieve a slightly better loss value. We will update our preprint, and we plan to release the training pipeline and weights. Stay tuned!
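
For reference, a minimal sketch of the sliding window attention component mentioned above (this is not the training code; the window size and tensor shapes are hypothetical, not the values from these runs):

```python
import torch

def sliding_window_attention(q, k, v, window: int = 64):
    # Causal local attention: each position attends only to itself and the
    # previous `window - 1` tokens, replacing the short conv1d for local mixing.
    B, L, D = q.shape
    scores = torch.einsum('bid,bjd->bij', q, k) / D ** 0.5
    i = torch.arange(L).unsqueeze(-1)   # query positions
    j = torch.arange(L).unsqueeze(0)    # key positions
    mask = (j <= i) & (j > i - window)
    scores = scores.masked_fill(~mask, float('-inf'))
    return torch.einsum('bij,bjd->bid', scores.softmax(dim=-1), v)
```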
