Training according to the training scripts, but I cannot get the results of the paper #23
Comments
Dear author, I trained according to train_ms_detr_300.sh. Could you please tell me the training settings, environment, etc. in detail? I would be very grateful to you.
Hi, the train_ms_detr_300.sh script is trained on 8 * V100 GPUs. Did you modify the batch size per GPU? I have not run the model on a single GPU, so I do not know whether changing the global batch size will affect the final results. Here is the training log of MS-DETR run with train_ms_detr_300.sh; your loss, grad_norm, and first-epoch results look abnormal. Could you provide more information about your training script, or run the Deformable-DETR baseline without MS-DETR to verify the influence of single-process training?
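For reference, a minimal sketch of the Deformable-DETR baseline run suggested above, on a single GPU without the MS-DETR flags. It assumes the standard Deformable-DETR arguments --batch_size and --lr (the defaults are typically 2 images per GPU and 2e-4 for a 16-image global batch) and applies the common linear learning-rate scaling rule; the exact values are illustrative, not taken from this repository:
# Hypothetical single-process baseline run (no run_dist_launch.sh, no MS-DETR flags).
# --batch_size and --lr are standard Deformable-DETR arguments; the learning rate
# below is scaled linearly for a 2-image global batch (2e-4 * 2/16 = 2.5e-5).
python -u main.py \
    --output_dir $EXP_DIR \
    --with_box_refine \
    --two_stage \
    --dim_feedforward 2048 \
    --epochs 12 \
    --lr_drop 11 \
    --coco_path=$coco_path \
    --num_queries 300 \
    --batch_size 2 \
    --lr 2.5e-5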
Hi, I really appreciate your reply! I trained the model with the script posted below, and then the strange results shown above appeared. I do not know why, because I configured everything according to the training log of the paper; apart from using only a single GPU, all other parameter settings are the same. Thanks for the reminder! I have only one RTX 4090, so I will run the model on a single RTX 4090 to verify the influence of single-process training. Again, thank you very much!
The training script is as follows:
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python -u main.py \
    --output_dir $EXP_DIR \
    --with_box_refine \
    --two_stage \
    --dim_feedforward 2048 \
    --epochs 12 \
    --lr_drop 11 \
    --coco_path=$coco_path \
    --num_queries 300 \
    --use_ms_detr \
    --use_aux_ffn \
    --cls_loss_coef 1 \
    --o2m_cls_loss_coef 2
Supplement: I trained on one RTX 4090, without distributed training.
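Note that GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 in the posted script would typically spawn eight training processes. A sketch of a genuine single-process launch with the same arguments, assuming main.py can be run directly without the distributed launcher (as in Deformable-DETR):
# Sketch: same arguments as the posted script, launched as a single process
# so that only the one RTX 4090 is used and no distributed setup is involved.
python -u main.py \
    --output_dir $EXP_DIR \
    --with_box_refine \
    --two_stage \
    --dim_feedforward 2048 \
    --epochs 12 \
    --lr_drop 11 \
    --coco_path=$coco_path \
    --num_queries 300 \
    --use_ms_detr \
    --use_aux_ffn \
    --cls_loss_coef 1 \
    --o2m_cls_loss_coef 2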
The training log and the results from the training run are attached:
log.txt
traing_model_results.txt