a mistake in jupyter #1
Because the HW2 forum is not open to online students, I am presenting here on GitHub some parts of the hw2 code that I think are wrong or ambiguous.
I passed almost all the tests, except the ones described below.
I ran into precision issues while finishing BatchNorm1d in Q2 and SGD in Q3.
For SGD in Q3, test_optim_sgd_weight_decay_1 and test_optim_sgd_momentum_1 passed, so I don't think there is a logic error in the implementation. For the failed test case, it's hard to debug the root cause; I guessed that it might be a precision issue.
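One way to tell a genuine precision issue from a logic error is to print your result next to the test's reference value and look at the size of the gap. A small hypothetical sketch (the reference value and array below are placeholders, not taken from the real test):

import numpy as np

# actual_output: stand-in for the result your optimizer/module produces
actual_output = np.random.randn(4, 4).astype(np.float32)
reference_value = 3.207106  # placeholder reference, not from test_nn_and_optim.py
gap = abs(np.linalg.norm(actual_output) - reference_value)
print("norm gap:", gap)  # a gap around 1e-6..1e-4 suggests precision; anything larger suggests a logic error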
Thank you. For BatchNorm1d, I did not make the same mistake, but I just found that the test precision was adjusted 4 days ago, and my test passes with the adjusted precision. hw2/tests/test_nn_and_optim.py Line 771 in 80ed86b
Let me try your resolution for SGD. I used the equation below to implement sgd_momentum_weight_decay_1.
Referring to the PyTorch source code is a very useful method. It also took me a long time to debug the homework; I think that's because the HW is becoming more and more complex.
I passed all the SGD tests with the following equations. These equations are different from their counterparts in the slides, so I think the Jupyter notebook needs to provide more hints.
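A plausible form of such equations, assuming weight decay is folded into the gradient inside the momentum update (an assumption, not necessarily the exact equations posted):

u_{t+1} = \beta u_t + (1 - \beta)\left(\nabla_\theta f(\theta_t) + \lambda \theta_t\right)
\theta_{t+1} = \theta_t - \alpha u_{t+1}

with learning rate \alpha, momentum \beta, and weight decay \lambda.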
Yes, your equations pass all the SGD test cases.
Haha, I don't think it's caused by numerical precision. I met some problems that looked like numerical precision issues, but they were all caused by coding errors. BTW, I failed test_optim_adam_z_memory_check_1 and the DropOut tests, but I passed all of the other tests.
test_optim_adam_z_memory_check_1 could be ignored, because you might create fewer tensors. You could add grad into the computation graph to pass it, for example by using grad instead of grad.data. For DropOut, if the module is in evaluation mode, you should not drop out the input. Can you share your implementations of BatchNorm1d and LayerNorm1d? I will share my implementation below and delete it later.
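A minimal sketch of the DropOut behavior described above, assuming a needle-style module with a training flag and the init.randb helper (a sketch under those assumptions, not anyone's submitted solution):

import needle.init as init
import needle.nn as nn

class Dropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # Evaluation mode: pass the input through unchanged.
        if not self.training:
            return x
        # Training mode: zero entries with probability p and rescale the
        # survivors by 1/(1-p) so the expected activation is unchanged.
        mask = init.randb(*x.shape, p=1 - self.p)
        return x * mask / (1 - self.p)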
I passed DropOut now. Here's my implementation of BN and LN; I'll delete it 2 hours later: (deleted)
You could delete them. I think our implementations are the same. Let me check my other code, -_-.
Just clone the repo locally, write another script, copy the test function code, set a breakpoint in your IDE, and you can debug it. For example, create a debug.py script and copy the function:
# debug.py
def learn_model_1d(feature_size, nclasses, _model, optimizer, epochs=1, **kwargs):
    ...

out = learn_model_1d(64, 16, lambda z: nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16)), ndl.optim.Adam, lr=0.001)
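For reference, a rough self-contained version of such a debug script. It assumes the standard needle layout with the package under ./python, and the training-loop body is only an approximation of the hw2 test helper, not a verbatim copy:

# debug.py -- run it under an IDE debugger or with `python -m pdb debug.py`
import sys
sys.path.append("./python")  # assumes the needle package lives in ./python

import numpy as np
import needle as ndl
import needle.nn as nn

def learn_model_1d(feature_size, nclasses, _model, optimizer, epochs=1, **kwargs):
    # Approximation of the test helper: train a small model on random data.
    np.random.seed(42)
    model = _model([])  # the factory lambda ignores its argument
    X = np.random.randn(1024, feature_size).astype(np.float32)
    y = np.random.randint(nclasses, size=(1024,)).astype(np.uint8)
    loss_fn = nn.SoftmaxLoss()
    opt = optimizer(model.parameters(), **kwargs)
    batch = 32
    for _ in range(epochs):
        for i in range(0, X.shape[0], batch):
            opt.reset_grad()
            Xb = ndl.Tensor(X[i:i + batch], dtype="float32")
            yb = ndl.Tensor(y[i:i + batch])
            loss = loss_fn(model(Xb), yb)
            loss.backward()
            opt.step()  # set a breakpoint here to inspect parameters and gradients
    return loss.numpy()

out = learn_model_1d(64, 16,
                     lambda z: nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16)),
                     ndl.optim.Adam, lr=0.001)
print(out)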
Edit: I've figured out my problem, sorry for the ping!
@weixliu sorry to ping you on an old issue. I am also facing this problem now; do you remember what the root cause of the problem was?
The bias should be initialized to zero.
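A minimal illustration of that kind of initialization, assuming a needle-style module and the init.ones / init.zeros helpers (the class here is hypothetical, since the thread does not say which layer's bias was at fault):

import needle.init as init
import needle.nn as nn
from needle.nn import Parameter

class ScaleShift(nn.Module):
    # Hypothetical affine layer: the learnable scale starts at one and the
    # learnable bias starts at zero, so the layer is initially an identity map.
    def __init__(self, dim):
        super().__init__()
        self.weight = Parameter(init.ones(dim, requires_grad=True))
        self.bias = Parameter(init.zeros(dim, requires_grad=True))  # bias initialized to zero

    def forward(self, x):
        return x * self.weight.broadcast_to(x.shape) + self.bias.broadcast_to(x.shape)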