Is there any torch-library-based Differential Transformer code? #1640

Open
DevKiHyun opened this issue Oct 16, 2024 · 3 comments

Comments

@DevKiHyun

Hi,

I'm looking through the Differential Transformer paper and code, and I found that the GitHub version is based on flash attention and rotary embeddings.

I wonder whether there is any plan to upload a simple Transformer example using Diff attention, together with example arguments (e.g., how to adjust num_heads relative to the original Transformer's, or how to use other positional embeddings).

Thanks
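
For anyone looking for a dependency-free starting point in the meantime, below is a minimal plain-PyTorch sketch of differential attention, written from Eq. (1)–(2) of the paper rather than taken from the official repo (which uses flash attention and rotary embeddings). The module and argument names are illustrative, and it assumes PyTorch >= 2.4 for `nn.RMSNorm`:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def lambda_init_fn(depth):
    # Depth-dependent initialization from the paper:
    # lambda_init = 0.8 - 0.6 * exp(-0.3 * depth), with depth the 0-indexed layer.
    return 0.8 - 0.6 * math.exp(-0.3 * depth)


class MultiheadDiffAttn(nn.Module):
    """Differential attention sketch in plain PyTorch (no flash-attn, no rotary).

    Each head computes two softmax attention maps from split Q/K halves and
    subtracts them: DiffAttn = (softmax(Q1 K1^T) - lambda * softmax(Q2 K2^T)) V.
    """

    def __init__(self, embed_dim, num_heads, depth):
        super().__init__()
        # num_heads is HALF the baseline head count: each Diff head uses two
        # head_dim-sized Q/K halves and a 2*head_dim-sized V, so the parameter
        # count matches the baseline Transformer layer.
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads // 2
        self.scaling = self.head_dim ** -0.5

        self.q_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.out_proj = nn.Linear(embed_dim, embed_dim, bias=False)

        # Reparameterized scalar: lambda = exp(lq1.lk1) - exp(lq2.lk2) + lambda_init.
        self.lambda_init = lambda_init_fn(depth)
        self.lambda_q1 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))
        self.lambda_k1 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))
        self.lambda_q2 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))
        self.lambda_k2 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))

        # Headwise RMSNorm over each head's 2*head_dim output (PyTorch >= 2.4).
        self.subln = nn.RMSNorm(2 * self.head_dim, eps=1e-5)

    def forward(self, x, attn_mask=None):
        bsz, seq_len, embed_dim = x.shape

        # Q/K become 2*num_heads maps of head_dim; V stays num_heads of 2*head_dim.
        q = self.q_proj(x).view(bsz, seq_len, 2 * self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(bsz, seq_len, 2 * self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(bsz, seq_len, self.num_heads, 2 * self.head_dim).transpose(1, 2)

        attn = torch.matmul(q, k.transpose(-1, -2)) * self.scaling  # (bsz, 2h, n, n)
        if attn_mask is not None:
            attn = attn + attn_mask  # additive mask, e.g. -inf above the diagonal
        attn = F.softmax(attn, dim=-1)

        lambda_full = (
            torch.exp(torch.sum(self.lambda_q1 * self.lambda_k1))
            - torch.exp(torch.sum(self.lambda_q2 * self.lambda_k2))
            + self.lambda_init
        )

        # Pair up the 2h maps and take their difference per head.
        attn = attn.view(bsz, self.num_heads, 2, seq_len, seq_len)
        attn = attn[:, :, 0] - lambda_full * attn[:, :, 1]

        out = torch.matmul(attn, v)                     # (bsz, h, n, 2*head_dim)
        out = self.subln(out) * (1 - self.lambda_init)  # headwise norm + rescale
        out = out.transpose(1, 2).reshape(bsz, seq_len, embed_dim)
        return self.out_proj(out)
```

For example, `MultiheadDiffAttn(embed_dim=512, num_heads=4, depth=0)` would stand in for a baseline 8-head layer. Any positional scheme (rotary or otherwise) would be applied to q and k before the matmul.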

@AnticPan

I found several implementations on GitHub by searching for "Differential Transformer", and I'm looking for an implementation with a static kv_cache and torch.compile for faster inference.
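
For reference, here is a minimal sketch of the static-cache idea (pre-allocated, fixed-shape tensors so a torch.compile'd decode step never sees dynamic shapes). The class and method names are hypothetical, not from this repo:

```python
import torch


class StaticKVCache:
    """Pre-allocated KV cache: fixed tensor shapes avoid the dynamic-shape
    recompilations torch.compile would otherwise trigger during decoding."""

    def __init__(self, bsz, num_heads, max_seq_len, head_dim,
                 dtype=torch.float16, device="cuda"):
        shape = (bsz, num_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)

    def update(self, pos, k_new, v_new):
        # pos: 1-D LongTensor of positions being written (length = new tokens).
        # In-place writes keep the cache tensors' shapes static across steps;
        # attention must then mask out the not-yet-written positions.
        self.k.index_copy_(2, pos, k_new)
        self.v.index_copy_(2, pos, v_new)
        return self.k, self.v
```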

@DevKiHyun (Author)

> I found several implementations on GitHub by searching for "Differential Transformer", and I'm looking for an implementation with a static kv_cache and torch.compile for faster inference.

Hi @AnticPan,

Could you share your findings?

Thanks.

@YTianZHU (Contributor)

Hi @DevKiHyun,
You can refer to Section 3.1 and Appendix D of our paper for the detailed configurations of our models. You can also directly use the configs of open-source LLMs and change their model code to turn them into the Diff architecture.
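
As a concrete illustration of that pointer (the numbers below are made up, not taken from the paper's tables): Section 3.1 matches parameter counts by halving the head count, since each Diff head consumes two head_dim-sized Q/K halves and a 2*head_dim-sized V.

```python
# Illustrative only: turning a baseline attention config into a Diff config.
# Values are hypothetical; the real configs are in Section 3.1 / Appendix D.
baseline = {"d_model": 2048, "num_heads": 16, "head_dim": 128}

diff_cfg = {
    "d_model": baseline["d_model"],           # model width unchanged
    "num_heads": baseline["num_heads"] // 2,  # 16 -> 8 differential heads
    "head_dim": baseline["head_dim"],         # each head splits Q/K into two 128-d maps
}
```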
