Random projector? #1
Hi Yoshitomo! Thank you :) and yes, that must have been an oversight when re-writing the code and putting it in this repo. Unfortunately, I only put time into checking the
I have fixed the repo now. The line of interest is here:
It is a bit slow training with just a single V100, so I figured I would give an update on the progress. Both runs are about 40% done and, as expected, the trainable projector leads to much better performance. I will upload the complete and final logs and also the checkpoints when they are both done.

Frozen projector:

```
2024-02-08 09:28:26,407 INFO torchdistill.misc.log Epoch: [40] [4900/5005] eta: 0:00:36 lr: 0.010000000000000002 img/s: 749.388497121768 loss: 1.6882 (1.6121) time: 0.3448 data: 0.0004 max mem: 9402
```

Trainable projector:

```
2024-02-08 08:57:20,154 INFO torchdistill.misc.log Epoch: [39] [4750/5005] eta: 0:01:27 lr: 0.010000000000000002 img/s: 746.3973317873493 loss: -0.3715 (-0.4433) time: 0.3440 data: 0.0003 max mem: 9402
```
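To make the distinction concrete, here is a minimal sketch of a loss-side projector that is either frozen at its random initialisation or trained because its parameters are handed to the optimizer. The names and the plain MSE placeholder are illustrative assumptions, not the repo's actual SRD loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillationLossSketch(nn.Module):
    """Toy loss module with a projector on the student embedding (illustrative only)."""
    def __init__(self, student_dim=512, teacher_dim=2048, trainable_projector=True):
        super().__init__()
        # Hypothetical projector mapping student embeddings to the teacher's dimension.
        self.embed = nn.Linear(student_dim, teacher_dim)
        if not trainable_projector:
            # Frozen variant: the projector keeps its random initialisation.
            for p in self.embed.parameters():
                p.requires_grad = False

    def forward(self, student_feat, teacher_feat):
        # MSE is a stand-in here, not the actual SRD objective.
        return F.mse_loss(self.embed(student_feat), teacher_feat)

student = nn.Linear(32, 512)  # stand-in for the student backbone
criterion = DistillationLossSketch(trainable_projector=True)
# The projector is only updated if its parameters are also passed to the optimizer:
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(criterion.parameters()), lr=0.1)
```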
That sounds great, thanks! I'll just do a small loop through some hyperparameters first to get the best results. A proper implementation in torchdistill would be really great, thank you! I'll let you know when I have the results and code to share.
@yoshitomo-matsubara I have just pushed now and it should all be good. I have also put the logs and model checkpoints in the README.md.
@roymiles
@yoshitomo-matsubara If it is not too much work, it would be really nice if you could implement this in torchdistill for me. That would be really helpful, thanks! 🙂
No problem, I can do it for you. What name would you pick for your method? If you don't have any preference, I would use
Thanks! I think I'd prefer
Similarly, I need the name in other places as well, e.g., https://github.com/yoshitomo-matsubara/torchdistill/tree/main/configs/sample/ilsvrc2012
For the wrapper class, perhaps
What about
ah yea that sounds good 👍
Hi @roymiles I added your method as SRD to the torchdistill repo. Can you fork the current torchdistill repo and use this config to reproduce the number? Once you confirm the reproducibility, keep the log file and checkpoint file and submit a PR with the yaml file + README.md at
Thanks so much for doing this! I'll give this a go sometime this/next week once I have a few GPUs free.
@roymiles no problem! Let me know then
Hi @roymiles How's the experiment going?
I am really sorry for the late reply. I had some issues before with training, though it seems switching to DataParallel (as you suggested) has fixed it much more cleanly :D I then got a bit bogged down with other work and personal events, but I have since started the run and it seems to be training well. I'll have the results in the next few days.
I finished the run, but the results were a bit lower than I expected. Though I have just realised that this may be due to having a projector on the teacher side, i.e.
Hi @roymiles
Hopefully this is the final update before I push the log, checkpoint file, and yaml. I was getting poor results because my runs were automatically loading the optimiser and checkpoints from my previous run with the same
Hi @roymiles Does
This is what I found when trying to debug. The optimiser ended up starting at a much lower lr (in fact, the final lr) than the config specified. It turned out to happen after this line: https://github.com/yoshitomo-matsubara/torchdistill/blob/3799847d0e24b89d22801f75f36f0d075906f928/examples/torchvision/image_classification.py#L132 This was with an empty
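For anyone hitting the same thing, here is a sketch of the failure mode under assumed file and key names (not torchdistill's actual checkpoint logic): if a checkpoint path left over from an earlier run is picked up automatically, restoring the optimizer state silently replaces the configured learning rate with the old run's final one.

```python
import os
import torch
from torchvision.models import resnet18

ckpt_path = "resnet18_srd.pt"  # hypothetical path reused across runs

model = resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

if os.path.exists(ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    # Only restore the optimizer when genuinely resuming the same run;
    # otherwise its stale state (including the decayed lr) overrides the config.
    if ckpt.get("resume_optimizer", False):
        optimizer.load_state_dict(ckpt["optimizer"])

print(optimizer.param_groups[0]["lr"])  # should be 0.1 for a fresh run
```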
Ah, src_ckpt should be used there. I will update the scripts soon. Thanks for pointing it out!
By the way, you're also welcome to advertise your BMVC'22 and AAAI'24 papers at
Hi @roymiles It looks like the previous config did not use the normalized representations for computing the loss.
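For reference, a minimal sketch of the difference (assumed shapes, and a plain MSE stand-in rather than the actual SRD/torchdistill loss): L2-normalising both representations before computing the loss, so its scale does not depend on feature magnitude.

```python
import torch
import torch.nn.functional as F

def normalized_feature_loss(student_feat, teacher_feat):
    # Unit-normalise along the channel dimension before comparing.
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)
    return F.mse_loss(s, t)

loss = normalized_feature_loss(torch.randn(8, 2048), torch.randn(8, 2048))
```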
Never mind, I reran the experiment with the fixed official config.
Sorry for the late reply, and ah, that's a complete oversight on my part. It is really great that you spotted this, and even better that you got better results 😂! I have only just seen your previous post now, but I would definitely like to put a link/description to these papers on the project page. Thanks so much for this :D I will add a "show and tell" discussion this week.
Hi @roymiles
Congratulations on the paper acceptance!
For the ImageNet experiment, self.embed in the OurDistillationLoss class looks like a random projector for ResNet-18's embeddings, and it seems not to be updated since it's not included in the optimizer. Is this intentional? If so, why is it required?
https://github.com/roymiles/Simple-Recipe-Distillation/blob/main/imagenet/torchdistill/losses/single.py#L140
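A quick, generic way to check the situation being asked about, i.e. whether a loss-side submodule's parameters are actually covered by the optimizer. Names here are illustrative, not the repo's code:

```python
import torch
import torch.nn as nn

def params_in_optimizer(module, optimizer):
    # True only if every parameter of `module` appears in some optimizer param group.
    opt_params = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in opt_params for p in module.parameters())

student = nn.Linear(10, 512)       # stand-in for the student model
projector = nn.Linear(512, 2048)   # e.g. a loss-side projector like self.embed
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

# False here: the projector would stay at its random initialisation.
print(params_in_optimizer(projector, optimizer))
```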