Backward pass #6
Comments
@Coluding Not sure I understand. Could you elaborate a bit? The forward pass is implemented here, and the backward pass can be done automatically with PyTorch. Are you thinking there is a more efficient way to perform the backward pass? In that case, it could make sense to implement it manually here, too. Or maybe some other reason that I'm overlooking?
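For context on "done automatically with PyTorch": any forward pass composed of differentiable ops gets its backward from autograd, so no hand-written backward kernel is needed to train. Below is a minimal sketch of that pattern, using `torch.nn.functional.scaled_dot_product_attention` as a stand-in forward; in practice you would call the dilated attention forward from this repo instead (its exact module name and call signature are not shown here).

```python
# Minimal sketch: autograd derives gradients for any forward built from
# differentiable ops, so no hand-written backward pass is required.
# F.scaled_dot_product_attention is only a stand-in forward here; swap in
# this repo's dilated attention forward in real use.
import torch
import torch.nn.functional as F

b, h, n, d = 1, 8, 2048, 64
q = torch.randn(b, h, n, d, requires_grad=True)
k = torch.randn(b, h, n, d, requires_grad=True)
v = torch.randn(b, h, n, d, requires_grad=True)

out = F.scaled_dot_product_attention(q, k, v)  # forward pass only
loss = out.pow(2).mean()                       # dummy scalar loss
loss.backward()                                # autograd builds the backward

print(q.grad.shape)  # torch.Size([1, 8, 2048, 64]): gradients reach the inputs
```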
Hi @fkodom! I was just wondering if you have tested the backward pass and what implications it has for memory and sequence length. Best regards!
@Coluding Yes, the backward pass works and scales roughly the same as
^^ In that script, I dynamically choose the batch size so that the total number of tokens is constant for all sequence lengths. So the runtime is roughly constant when the forward/backward pass scales linearly with sequence length. I haven't explicitly checked memory profiling, but AFAIK it should scale the same as
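As a rough illustration of the constant-token-budget benchmarking described above (a sketch only; the actual benchmark script may differ), the batch size is chosen as `total_tokens // seq_len`, so every measurement processes the same number of tokens. With an attention mechanism that is linear in sequence length, the measured forward+backward time should then stay roughly flat; the quadratic stand-in used below would instead grow with seq_len.

```python
# Hedged sketch of a constant-token-budget timing loop; values and the
# stand-in attention are illustrative, not the repo's actual benchmark.
import time
import torch
import torch.nn.functional as F

TOTAL_TOKENS = 4096  # fixed token budget per measurement (illustrative value)
h, d = 4, 64

for seq_len in (512, 1024, 2048, 4096):
    batch_size = TOTAL_TOKENS // seq_len  # fewer sequences as they get longer
    q = torch.randn(batch_size, h, seq_len, d, requires_grad=True)
    k = torch.randn(batch_size, h, seq_len, d, requires_grad=True)
    v = torch.randn(batch_size, h, seq_len, d, requires_grad=True)

    start = time.perf_counter()
    out = F.scaled_dot_product_attention(q, k, v)  # stand-in forward
    out.mean().backward()                          # include the backward pass
    elapsed = time.perf_counter() - start

    # With a linearly scaling attention, this time stays roughly flat across
    # seq_len; the quadratic stand-in above will instead grow with seq_len.
    print(f"seq_len={seq_len:5d} batch={batch_size:2d} time={elapsed:.3f}s")
```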
Hi!
First of all, thanks for the great implementation. I think it is awesome and I like it a lot.
I was wondering if you have also implemented a backward pass for the model somewhere, since you only show the forward pass in this repo (please correct me if I am wrong).
I am asking because I want to train a reversible dilated encoder model from scratch, and your code seems very well suited for the attention mechanism.
Thanks in advance and kind regards!