Question about the definitions in block sparse attention #519

Open
szhengac opened this issue Nov 10, 2020 · 4 comments
Comments

@szhengac
Contributor

Hi, I have a question regarding block sparse attention.

If I understand the API description correctly, block is the block size (i.e., the number of tokens in a block), while num_local_blocks is the number of blocks in a local window (#tokens_per_window = block * num_local_blocks). So no matter which value (unidirectional or bidirectional) I choose for attention, will the tokens within a block attend to each other?

@arashashari
Contributor

arashashari commented Nov 10, 2020

Yes, that is correct. Tokens within a block always attend to each other, whether the attention is unidirectional or bidirectional. The difference shows up at the level of the local window: in the unidirectional case, tokens in a block attend only to tokens in their own block and in the blocks before it in the same local window, while in the bidirectional case all tokens in the local window attend to each other, no matter which block they are in.
Please let us know if your question is answered and we will close the issue.
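
To make the two cases concrete, here is a minimal sketch of the block-level layout described above. It is an illustration written for this thread, not the library's implementation; the function names `local_block_layout` and `expand_to_tokens` are hypothetical, and only the local-window part of the pattern is modeled.

```python
def local_block_layout(num_blocks, num_local_blocks, attention="bidirectional"):
    """Return a num_blocks x num_blocks 0/1 matrix.

    layout[i][j] == 1 means query block i attends to key block j.
    Hypothetical helper: models only the local-window pattern.
    """
    if attention not in ("unidirectional", "bidirectional"):
        raise ValueError("attention must be 'unidirectional' or 'bidirectional'")
    layout = [[0] * num_blocks for _ in range(num_blocks)]
    # Walk over local windows of num_local_blocks consecutive blocks.
    for start in range(0, num_blocks, num_local_blocks):
        end = min(start + num_local_blocks, num_blocks)
        for i in range(start, end):
            # Unidirectional: block i sees itself and earlier blocks in the window.
            # Bidirectional: block i sees every block in the window.
            last = i + 1 if attention == "unidirectional" else end
            for j in range(start, last):
                layout[i][j] = 1
    return layout


def expand_to_tokens(layout, block):
    """Expand a block-level layout to a token-level S x S 0/1 matrix."""
    seq_len = len(layout) * block
    return [
        [layout[q // block][k // block] for k in range(seq_len)]
        for q in range(seq_len)
    ]


uni = local_block_layout(4, 2, "unidirectional")
bi = local_block_layout(4, 2, "bidirectional")
tokens = expand_to_tokens(uni, block=2)
# Even in the unidirectional layout, token 0 attends to token 1,
# because both tokens sit in the same block.
print(tokens[0][1])
```

Because the layout is defined per block, the diagonal entries are always 1, which is why tokens inside one block attend to each other in both modes.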

@szhengac
Contributor Author

szhengac commented Nov 10, 2020 via email

@arashashari
Contributor

You can use the attention mask to neutralize it; in such cases, the attention mask has dimensions [leading dimensions, S, S], where S is the sequence length.
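
As a rough sketch of this suggestion, the snippet below builds an S x S mask and combines it with a block pattern. It assumes that "it" refers to tokens attending to later tokens inside their own block, which the thread does not state explicitly. The helper names and the 1 = keep, 0 = masked convention are assumptions for illustration, and the leading dimensions are omitted.

```python
def causal_mask(seq_len):
    """S x S mask: query position q may attend to key position k only if k <= q."""
    return [[1 if k <= q else 0 for k in range(seq_len)] for q in range(seq_len)]


def block_diagonal_pattern(seq_len, block):
    """S x S pattern in which tokens attend only within their own block."""
    return [
        [1 if q // block == k // block else 0 for k in range(seq_len)]
        for q in range(seq_len)
    ]


def combine(pattern, mask):
    """Keep an entry only where both the sparse pattern and the mask allow it."""
    return [
        [p & m for p, m in zip(pattern_row, mask_row)]
        for pattern_row, mask_row in zip(pattern, mask)
    ]


S, block = 4, 2
effective = combine(block_diagonal_pattern(S, block), causal_mask(S))
for row in effective:
    print(row)
```

With S = 4 and block = 2, the first token of each block no longer attends to the second one, while the second token still attends to the first.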

@szhengac
Contributor Author

I see. That is indeed one option.
