
About args use_tvm #24

Open
htg17 opened this issue Feb 20, 2023 · 3 comments

Comments


htg17 commented Feb 20, 2023

In long_range_main.py, the use_tvm arg defaults to False, and the sample scripts never set it. But if this arg is False, it seems that pyramidal attention is not used anywhere in the model, even though it is the main contribution of the paper.

So should this arg be set to True when I want to use pyramidal attention to save computation cost?


Zhazhan commented Feb 20, 2023

We provide two implementations of pyramidal attention: a naive version and a TVM version. The naive version does not reduce the time and space complexity. Because the TVM version may require users to compile TVM themselves, we set use_tvm=False by default to make our results easier to reproduce.

If you want to use the TVM implementation without compiling TVM yourself, set use_tvm=True and make sure that (1) the operating system is Ubuntu and (2) the CUDA version is 11.1. Otherwise, you can compile TVM 0.8.0 according to the official guide: https://tvm.apache.org/docs/.

If compiling feels like too much trouble, you can instead use a pre-built TVM Docker image from https://tvm.apache.org/docs/install/docker.html#docker-source. Then delete the files under 'pyraformer/lib' and run the code again.
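As a quick sanity check of the two constraints above before setting use_tvm=True, something like the following can be run first; this snippet is illustrative and not part of the repository:

```python
# Illustrative environment check (not from the Pyraformer repo): per the note
# above, the pre-compiled kernels expect Ubuntu and CUDA 11.1.
import platform
import torch

print(platform.system())   # expect 'Linux' (Ubuntu)
print(torch.version.cuda)  # expect '11.1' to match the pre-built TVM kernels
```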


htg17 commented Feb 20, 2023

Thanks for answering. I just wonder whether the naive option still uses pyramidal attention.

If use_tvm=False, the MultiHeadAttention module in SubLayers.py is used for self-attention. But it seems that MultiHeadAttention is just vanilla attention.


Zhazhan commented Feb 20, 2023

The naive implementation realizes pyramidal attention by applying an attention mask to the attention score matrix. The 'MultiHeadAttention' module is indeed vanilla attention; the difference lies in the 'Encoder' module. Please refer to lines 19-22 and 51-54 of pyraformer/Pyraformer_LR.py.
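To make this concrete, below is a minimal sketch of how a pyramidal attention pattern can be expressed as a boolean mask over a vanilla attention score matrix. The two-scale node layout and the window_size/stride parameters here are illustrative assumptions, not the repository's actual mask construction; see the lines referenced above for the real one:

```python
# Minimal sketch of pyramidal attention via masking (NOT the repo's exact code).
# Layout assumption: seq_len fine-scale nodes followed by seq_len // stride
# coarse-scale nodes. True in the mask = attention blocked.
import torch

def two_scale_pyramid_mask(seq_len: int, window_size: int, stride: int) -> torch.Tensor:
    n_coarse = seq_len // stride
    total = seq_len + n_coarse
    mask = torch.ones(total, total, dtype=torch.bool)

    # Intra-scale links: each node attends to neighbors within window_size
    # at its own scale (self included).
    for i in range(seq_len):
        mask[i, max(0, i - window_size):min(seq_len, i + window_size + 1)] = False
    for j in range(n_coarse):
        lo = seq_len + max(0, j - window_size)
        hi = seq_len + min(n_coarse, j + window_size + 1)
        mask[seq_len + j, lo:hi] = False

    # Inter-scale links: each coarse node and its `stride` children see each other.
    for j in range(n_coarse):
        children = slice(j * stride, (j + 1) * stride)
        mask[seq_len + j, children] = False
        mask[children, seq_len + j] = False
    return mask

def masked_vanilla_attention(q, k, v, mask):
    # Vanilla scaled dot-product attention with the pyramid mask applied,
    # mirroring the "mask the score matrix" idea described above.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(mask, float('-inf'))
    return torch.softmax(scores, dim=-1) @ v

# Usage: 16 fine nodes + 4 coarse nodes, hidden size 8.
m = two_scale_pyramid_mask(seq_len=16, window_size=1, stride=4)
x = torch.randn(20, 8)
out = masked_vanilla_attention(x, x, x, m)
```

Under this scheme each node attends only to its intra-scale neighbors and its parent/children, which is the pyramid pattern; the full score matrix is still materialized, which is why the naive version does not reduce time or space complexity.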
