

hidden_dim constraint in transformer cuda kernel #491

Open
szhengac opened this issue Oct 28, 2020 · 5 comments

@szhengac
Contributor

I found that there is a constraint on the dimensionality when we use the transformer CUDA kernel: https://github.com/microsoft/DeepSpeed/blob/d720fdb6857f4b71d922ca1e8efbe5271b5fb7c2/csrc/transformer/normalize_kernels.cu#L232-L250

I wonder what the reason behind it is. Is there any plan to support arbitrary dimensionality? Or, if I want to use hidden_dim=4096 or 8192, what do I need to do to make it work? Thanks.
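For context, the linked dispatch code selects a fixed launch configuration per hidden size rather than handling arbitrary dimensions, which is why only an enumerated set of values works. A minimal Python sketch of the kind of pre-flight check this implies (the `SUPPORTED_HIDDEN_DIMS` set and `check_hidden_dim` helper are illustrative placeholders I made up, not the actual values or API from `normalize_kernels.cu`):

```python
# Illustrative only: the fused kernel dispatch hard-codes a launch
# configuration per hidden size, so dimensions outside an enumerated
# set are rejected. This set is a placeholder, not the real list.
SUPPORTED_HIDDEN_DIMS = {768, 1024, 1536, 2048, 3072}

def check_hidden_dim(hidden_dim: int) -> bool:
    """Return True if the (assumed) fused kernel supports hidden_dim."""
    return hidden_dim in SUPPORTED_HIDDEN_DIMS

if __name__ == "__main__":
    for dim in (1024, 4096, 8192):
        print(dim, "supported" if check_hidden_dim(dim) else "unsupported")
```

Running a check like this before constructing the transformer config would surface the constraint early instead of failing inside the kernel launch.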

@RezaYazdaniAminabadi
Contributor

Hi @szhengac

Thanks for pointing this out. We are currently working on supporting arbitrary dimensions. There will be a code update soon to add this feature. Please stay tuned! :-)

Thanks.
Reza

@RezaYazdaniAminabadi
Contributor

Hi @szhengac

Sorry for the delay!
I have modified the transformer kernel so that it can support different hidden dimensions. The code is in a separate branch of DeepSpeed: https://github.com/microsoft/DeepSpeed/tree/transformer-kernel/support-arbitrary-hidden.
I have already tested this for various hidden dimensions from 128 to 8192 (https://github.com/microsoft/DeepSpeed/blob/transformer-kernel/support-arbitrary-hidden/tests/unit/test_cuda_forward.py#L216-L244).

Could you please try this and let me know if it works for your training environment?

Thank you,
Reza

@szhengac
Contributor Author

szhengac commented Dec 4, 2020

@RezaYazdaniAminabadi, thanks for the good work! I will try it when I get some free time.

@szhengac
Contributor Author

szhengac commented Dec 8, 2020

Hi @RezaYazdaniAminabadi, we recently came across another issue when using the DeepSpeed transformer kernel, as shown in #589.

@RezaYazdaniAminabadi
Contributor

Hi @szhengac

Thanks for trying out the new kernels.
I see the issue. Could you please try PR #587 and see if it solves the problem?
Thanks
Reza
