Is there any torch-library-based Differential Transformer code? #1640

Open
DevKiHyun opened this issue Oct 16, 2024 · 3 comments

Comments

@DevKiHyun

Hi,

I'm looking through the Differential Transformer paper and code, and I found that the GitHub version is based on flash attention and rotary embeddings.

I wonder whether there is any plan to upload a simple Transformer example using Diff attention, together with example arguments (e.g., how to adjust num_heads relative to the original Transformer's, or how to use other positional embeddings).

Thanks
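
For anyone looking for a dependency-free starting point in the meantime, below is a minimal plain-PyTorch sketch of differential attention, written from Eq. (1)–(2) of the paper rather than taken from the official repo (which uses flash attention and rotary embeddings). The module and argument names are illustrative, and it assumes PyTorch >= 2.4 for `nn.RMSNorm`:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def lambda_init_fn(depth):
    # Depth-dependent initialization from the paper:
    # lambda_init = 0.8 - 0.6 * exp(-0.3 * depth), with depth the 0-indexed layer.
    return 0.8 - 0.6 * math.exp(-0.3 * depth)


class MultiheadDiffAttn(nn.Module):
    """Differential attention sketch in plain PyTorch (no flash-attn, no rotary).

    Each head computes two softmax attention maps from split Q/K halves and
    subtracts them: DiffAttn = (softmax(Q1 K1^T) - lambda * softmax(Q2 K2^T)) V.
    """

    def __init__(self, embed_dim, num_heads, depth):
        super().__init__()
        # num_heads is HALF the baseline head count: each Diff head uses two
        # head_dim-sized Q/K halves and a 2*head_dim-sized V, so the parameter
        # count matches the baseline Transformer layer.
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads // 2
        self.scaling = self.head_dim ** -0.5

        self.q_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.out_proj = nn.Linear(embed_dim, embed_dim, bias=False)

        # Reparameterized scalar: lambda = exp(lq1.lk1) - exp(lq2.lk2) + lambda_init.
        self.lambda_init = lambda_init_fn(depth)
        self.lambda_q1 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))
        self.lambda_k1 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))
        self.lambda_q2 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))
        self.lambda_k2 = nn.Parameter(torch.zeros(self.head_dim).normal_(std=0.1))

        # Headwise RMSNorm over each head's 2*head_dim output (PyTorch >= 2.4).
        self.subln = nn.RMSNorm(2 * self.head_dim, eps=1e-5)

    def forward(self, x, attn_mask=None):
        bsz, seq_len, embed_dim = x.shape

        # Q/K become 2*num_heads maps of head_dim; V stays num_heads of 2*head_dim.
        q = self.q_proj(x).view(bsz, seq_len, 2 * self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(bsz, seq_len, 2 * self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(bsz, seq_len, self.num_heads, 2 * self.head_dim).transpose(1, 2)

        attn = torch.matmul(q, k.transpose(-1, -2)) * self.scaling  # (bsz, 2h, n, n)
        if attn_mask is not None:
            attn = attn + attn_mask  # additive mask, e.g. -inf above the diagonal
        attn = F.softmax(attn, dim=-1)

        lambda_full = (
            torch.exp(torch.sum(self.lambda_q1 * self.lambda_k1))
            - torch.exp(torch.sum(self.lambda_q2 * self.lambda_k2))
            + self.lambda_init
        )

        # Pair up the 2h maps and take their difference per head.
        attn = attn.view(bsz, self.num_heads, 2, seq_len, seq_len)
        attn = attn[:, :, 0] - lambda_full * attn[:, :, 1]

        out = torch.matmul(attn, v)                     # (bsz, h, n, 2*head_dim)
        out = self.subln(out) * (1 - self.lambda_init)  # headwise norm + rescale
        out = out.transpose(1, 2).reshape(bsz, seq_len, embed_dim)
        return self.out_proj(out)
```

For example, `MultiheadDiffAttn(embed_dim=512, num_heads=4, depth=0)` would stand in for a baseline 8-head layer. Any positional scheme (rotary or otherwise) would be applied to q and k before the matmul.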

@AnticPan

I found several implementations on GitHub by searching for "Differential Transformer", and I'm looking for an implementation with a static kv_cache and torch.compile for faster inference.
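
For reference, here is a minimal sketch of the static-cache idea (pre-allocated, fixed-shape tensors so a torch.compile'd decode step never sees dynamic shapes). The class and method names are hypothetical, not from this repo:

```python
import torch


class StaticKVCache:
    """Pre-allocated KV cache: fixed tensor shapes avoid the dynamic-shape
    recompilations torch.compile would otherwise trigger during decoding."""

    def __init__(self, bsz, num_heads, max_seq_len, head_dim,
                 dtype=torch.float16, device="cuda"):
        shape = (bsz, num_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)

    def update(self, pos, k_new, v_new):
        # pos: 1-D LongTensor of positions being written (length = new tokens).
        # In-place writes keep the cache tensors' shapes static across steps;
        # attention must then mask out the not-yet-written positions.
        self.k.index_copy_(2, pos, k_new)
        self.v.index_copy_(2, pos, v_new)
        return self.k, self.v
```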

@DevKiHyun (Author)

> I found several implementations on GitHub by searching for "Differential Transformer", and I'm looking for an implementation with a static kv_cache and torch.compile for faster inference.

Hi @AnticPan,

Could you share your findings?

Thanks.

@YTianZHU (Contributor)

Hi @DevKiHyun,
You can refer to Section 3.1 and Appendix D of our paper for the detailed configurations of our models. You can also directly use the configs of open-source LLMs and change their model code to turn them into the Diff architecture.
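
As a concrete illustration of that pointer (the numbers below are made up, not taken from the paper's tables): Section 3.1 matches parameter counts by halving the head count, since each Diff head consumes two head_dim-sized Q/K halves and a 2*head_dim-sized V.

```python
# Illustrative only: turning a baseline attention config into a Diff config.
# Values are hypothetical; the real configs are in Section 3.1 / Appendix D.
baseline = {"d_model": 2048, "num_heads": 16, "head_dim": 128}

diff_cfg = {
    "d_model": baseline["d_model"],           # model width unchanged
    "num_heads": baseline["num_heads"] // 2,  # 16 -> 8 differential heads
    "head_dim": baseline["head_dim"],         # each head splits Q/K into two 128-d maps
}
```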
