How to improve detr #12
Hi @zhangzhen119,
For the DETR code, I have shared the repository with you. It might include some code from my previous project (HOI compositional learning), and I have not cleaned it up. The BatchFormer-v2 code is provided in https://github.com/zhihou7/detr/blob/master/models/transformer.py. I will release a clean DETR implementation when I have time. If you have further questions, feel free to ask. |
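For readers following along, below is a minimal sketch of the batch-dimension attention that BatchFormerV2 adds inside a DETR-style transformer. The class name, shapes, and defaults are illustrative assumptions, not the exact module in the linked transformer.py:

```python
import torch
import torch.nn as nn


class BatchFormerV2Layer(nn.Module):
    """Illustrative sketch of the BatchFormerV2 idea for DETR-style features.

    DETR encoder features have shape (HW, B, C); here attention is applied
    along the batch dimension B, independently for each spatial token.
    Module and argument names are assumptions, not the exact code in
    zhihou7/detr/models/transformer.py.
    """

    def __init__(self, d_model: int = 256, nhead: int = 8):
        super().__init__()
        # A standard transformer encoder layer, reused to attend across samples.
        self.batch_attn = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=d_model)

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        # src: (HW, B, C). Transposing makes the batch dim B the "sequence"
        # dim of the encoder layer, so attention mixes information between
        # the B samples at each spatial location.
        out = self.batch_attn(src.transpose(0, 1)).transpose(0, 1)
        return out
```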
Thank you very much for your help and congratulations on the work you have done on this |
My pleasure |
I set bf to 3; that is the BatchFormerV2. That is because I use |
I am still fairly new to reading code, so I asked some relatively simple questions. I really appreciate your prompt and effective reply. I will run experiments and keep learning based on your suggestions, and I wish you even greater achievements |
Thanks. It is mainly because my code is too messy. |
Excuse me, I used your BatchFormerV2 in a transformer structure similar to DETR, with the parameters set according to your suggestion, but in the end there was only about a 0.1 improvement. Is this improvement reasonable? Due to equipment constraints, I set the batch size to 4 and did not use the optimal batch size of 24 mentioned in your article. Is the small batch size the main reason for the small improvement? If I can only use a batch size of 4, are there any other possible solutions? Sorry to trouble you |
Hi, how many epochs do you train the network for? Could you provide the logs? Also, do you run the experiments on a single GPU with batch size 4, or on 4 GPUs with batch size 4 each? Here is the baseline log (https://drive.google.com/file/d/1PrLn5SOeSbpW-UvjJgVCTLmOSkerIRRd/view?usp=sharing) and here is the BatchFormer log (https://drive.google.com/file/d/1t60MSkNCv5eLOo-2TR0fYfbYTPQpcU_E/view?usp=sharing). The two logs are from training with batch size 16 on 8 GPUs. I do not implement multi-GPU distributed training for BatchFormer, so the result depends on the batch size on a single GPU. |
Sorry, my log files were not saved. I ran on a single GPU with batch size 4 and planned to train for 80 epochs, but the performance started to drop around epoch 17, so I stopped; my device is a 3060 Ti. I have also adjusted other related parameters, but basically nothing changed, so I suspect the improvement is not achieved because the batch size is relatively small. |
Do you mean the performance drops after 17 epochs? Do you use shared prediction modules? I mean the siamese stream.
|
Yes, my model also started to drop at the 17th epoch without BatchFormer, so I think this is normal. I used what you shared with me in DETR and then modified it. Sorry, I didn't notice the use of shared modules, but when I looked at the code you shared with me, I found that I was not using BatchFormer only in the training phase. That may be what causes this problem, so I am going to use it only in the training phase and try again |
If you do not share the other modules in the network, you will suffer a performance drop when you do not use BatchFormer in the test phase. I copy the batch into the BatchFormerV2 stream, and then feed both the original feature batch and the feature batch processed by BatchFormerV2 into the subsequent (shared) modules. |
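To make that two-stream arrangement concrete, here is a hedged sketch assuming batch-first features and a generic shared head; the function and variable names are illustrative and the actual wiring in the shared repository may differ:

```python
import torch


def two_stream_forward(features, batchformer, shared_head, training=True):
    """Sketch of the shared-module (siamese-stream) setup described above.

    During training, both the original batch and the BatchFormer-processed
    batch go through the *same* downstream modules (shared_head), so the head
    does not become dependent on BatchFormer. At test time BatchFormer is
    simply skipped without a train/test mismatch. All names are illustrative.
    """
    if not training:
        return shared_head(features)

    bf_features = batchformer(features)               # batch-attended copy
    both = torch.cat([features, bf_features], dim=0)  # keep the plain stream too
    # Note: ground-truth targets must be duplicated to match the doubled batch.
    return shared_head(both)
```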
OK, thank you, I'll try again. Sorry for the inconvenience |
Could you please share your improved code for BatchFormer on DETR? I want to learn about the improvement for DETR.