Linear Attention Mechanism #2150
Comments
Please feel free to open a PR, @parmarsuraj99. Thank you!
/cc @saberkun @tanzhenyu @dynamicwebpaige Do you have any internal plan for this?
Looking at the PyTorch implementation, what's the difference from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/layers/dense_attention.py?
In the PyTorch implementation, the authors have implemented many variants of attention: https://github.com/idiap/fast-transformers/tree/master/fast_transformers/attention. The one referenced above is this specific one. The major difference is the calculation of the values: instead of a softmax, they introduce a kernel feature map (here). The major change is focused only on this part:
addons/tensorflow_addons/layers/multihead_attention.py, lines 223 to 239 in d466cb8
Can we implement something like a callable attention calculation after computing the linear projections of the inputs?
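For context, here is a minimal sketch of that values computation, assuming the `elu(x) + 1` feature map used in fast-transformers; the function name, tensor names, and shapes below are illustrative and are not taken from the addons layer.

```python
import tensorflow as tf


def linear_attention(query, key, value, eps=1e-6):
    """Sketch of linear attention (Katharopoulos et al., 2020).

    Replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V), where
    phi(x) = elu(x) + 1, so the cost is linear in sequence length.
    Expected shapes (illustrative): [batch, seq_len, num_heads, head_dim].
    """
    # Kernel feature map applied to queries and keys instead of a softmax.
    q = tf.nn.elu(query) + 1.0
    k = tf.nn.elu(key) + 1.0

    # Contract keys and values over the sequence axis first:
    # result has shape [batch, heads, head_dim, value_dim].
    kv = tf.einsum("bshd,bshm->bhdm", k, value)

    # Normalizer: phi(Q_i) . sum_j phi(K_j), shape [batch, seq, heads].
    z = 1.0 / (tf.einsum("bshd,bhd->bsh", q, tf.reduce_sum(k, axis=1)) + eps)

    # Weighted values: [batch, seq, heads, value_dim].
    return tf.einsum("bshd,bhdm,bsh->bshm", q, kv, z)
```

The key point is that the N x N attention matrix is never materialized; only d x d summaries of the keys and values are kept.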
This seems to be something we could ask the user to subclass?
@parmarsuraj99 Thanks! A bit more concrete idea of the subclassing approach would help.
Thanks! This is exactly what I was referring to for the implementation.
Given the feedback from the TF team, if you would like to submit and maintain a PR for subclassing the Keras MHA, that would be okay to proceed with.
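For reference, a rough sketch of what the subclassing approach could look like. It relies on the private `_compute_attention` hook of `tf.keras.layers.MultiHeadAttention` (its name and signature may differ across TF versions) and reuses the illustrative `linear_attention` function sketched above; masking is ignored for brevity.

```python
import tensorflow as tf


class LinearMultiHeadAttention(tf.keras.layers.MultiHeadAttention):
    """Keras MHA subclass that swaps softmax attention for linear attention.

    Only the attention computation is overridden; the query/key/value
    projections and the output projection are inherited unchanged.
    """

    def _compute_attention(self, query, key, value,
                           attention_mask=None, training=None):
        # `_compute_attention` receives the already-projected tensors of
        # shape [batch, seq, num_heads, head_dim]. NOTE: this is a private
        # Keras method, so treat this override as a sketch only.
        attention_output = linear_attention(query, key, value)
        # Linear attention never materializes the N x N score matrix,
        # so there are no attention scores to return.
        return attention_output, None
```

Usage would then mirror the stock layer, e.g. `LinearMultiHeadAttention(num_heads=8, key_dim=64)(x, x)`.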
@seanpmorgan I'd like to contribute to this too. @parmarsuraj99, let me know if you'd like to work on it together?
@abhishek-niranjan Sure, I'd really love to collaborate.
How is this going?
TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision. Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA.
Describe the feature and the current behavior/state.
Are we going to add LinearAttention? If yes, I can start working on it.
Relevant information
Which API type would this fall under (layer, metric, optimizer, etc.)? Layer
Who will benefit from this feature? Anyone building Transformer blocks, which become faster with O(N) complexity compared to standard softmax dot-product attention (see the rough operation-count sketch below).
Any other info.
Paper's website
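As a rough illustration of the O(N) claim (operation counts only, not a benchmark; the numbers below are made up for the example):

```python
# Per-head cost, ignoring constants:
#   softmax attention: Q K^T and scores @ V        -> ~2 * N^2 * d operations
#   linear attention:  phi(K)^T V and phi(Q) @ KV  -> ~2 * N * d^2 operations
N, d = 4096, 64                  # sequence length, head dimension (illustrative)
softmax_ops = 2 * N * N * d      # ~2.1e9
linear_ops = 2 * N * d * d       # ~3.4e7
print(softmax_ops / linear_ops)  # N / d = 64.0, i.e. ~64x fewer operations here
```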