train PNet is so slow #61

Open
tzhang2014 opened this issue Jan 31, 2018 · 10 comments

Comments

@tzhang2014

tzhang2014 commented Jan 31, 2018

When I run python example/train_P_net.py --gpus 0 on my GTX 1070, training is very slow:
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-Accuracy=0.697969
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-LogLoss=0.617246
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-BBOX_MSE=0.103584
Can you help me? Is something wrong, and where is the mistake? Thanks.

@xiaoxiongli

You need to put your data on an SSD.
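A rough way to check whether reading the training data is the bottleneck is to time a plain sequential read of the files. This is a minimal Python sketch, not part of the project or MXNet; `read_throughput` and `files_under` are hypothetical helpers, and they measure raw file reads only, not image decoding or augmentation.

```python
import time
from pathlib import Path


def files_under(root):
    """Return all regular files below `root`, in a stable (sorted) order."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def read_throughput(paths, chunk_size=1 << 20):
    """Read every file in `paths` once and return (bytes_read, seconds, MB/s)."""
    total = 0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            # Read in fixed-size chunks so large files do not need to fit in memory.
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s
```

Usage: `total, secs, rate = read_throughput(files_under("path/to/training/data"))`, where the path is a placeholder for wherever your training images live. A second run over the same files may be served from the operating system's page cache, so the first (cold) run is the one that reflects the disk's speed.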

@tzhang2014
Author

tzhang2014 commented Feb 5, 2018

@xiaoxiongli Thank you. How long does training take on your PC, and what is its configuration? Thanks.

@linsoncvw

@tzhang2014 I have the same problem. How did you fix it?

INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-Accuracy=0.697195
INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-LogLoss=0.614800
INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-BBOX_MSE=0.106309

@linsoncvw

Only the first epoch is slow; the later ones are very fast.

@Qidian213

You can change MXNet's environment variables to speed up training, e.g. with the commands export MXNET_GPU_WORKER_NTHREADS=4 (default = 2) and export MXNET_GPU_COPY_NTHREADS=4 (default = 1). After I did this, everything got better.

e.g. i7-7700, GTX 1060:
INFO:root:Epoch[0] Batch [3780] Speed: 8343.78 samples/sec Accuracy=0.898810 LogLoss=0.270442 BBOX_MSE=0.015827
INFO:root:Epoch[0] Batch [3800] Speed: 9112.26 samples/sec Accuracy=0.891901 LogLoss=0.282063 BBOX_MSE=0.015802
INFO:root:Epoch[0] Batch [3820] Speed: 10172.07 samples/sec Accuracy=0.883745 LogLoss=0.303172 BBOX_MSE=0.015691
INFO:root:Epoch[0] Batch [3840] Speed: 10388.03 samples/sec Accuracy=0.878459 LogLoss=0.288958 BBOX_MSE=0.015310
INFO:root:Epoch[0] Batch [3860] Speed: 9720.13 samples/sec Accuracy=0.885983 LogLoss=0.310603 BBOX_MSE=0.015680
INFO:root:Epoch[0] Batch [3880] Speed: 9980.33 samples/sec Accuracy=0.879565 LogLoss=0.300225 BBOX_MSE=0.016198
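The export commands above only affect processes started from that shell afterwards. If you prefer to launch training from Python, one option is to pass the variables in the child process's environment so MXNet sees them from the moment it starts. This is a minimal sketch: `mxnet_env` and `run_train_pnet` are hypothetical helpers (not part of the repo or MXNet), and the values 4 and 4 are simply the ones suggested in the comment above, not tuned settings.

```python
import os
import subprocess
import sys


def mxnet_env(worker_threads=4, copy_threads=4, base=None):
    """Return a copy of `base` (default: os.environ) with the two MXNet
    GPU engine thread variables from the comment above set."""
    env = dict(os.environ if base is None else base)
    env["MXNET_GPU_WORKER_NTHREADS"] = str(worker_threads)
    env["MXNET_GPU_COPY_NTHREADS"] = str(copy_threads)
    return env


def run_train_pnet(gpus="0", script="example/train_P_net.py", **thread_counts):
    """Hypothetical launcher: run the P-Net training script in a child process
    whose environment already contains the thread variables."""
    cmd = [sys.executable, script, "--gpus", gpus]
    return subprocess.run(cmd, env=mxnet_env(**thread_counts), check=True)
```

Run from the repository root, `run_train_pnet("0")` does roughly what the two export commands followed by python example/train_P_net.py --gpus 0 do, without changing the environment of the calling shell.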

@tzhang2014
Author

@linsoncvw After the first epoch the speed is very fast. I don't understand why.
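To check in your own run whether only the first epoch is slow, you can average the `Speed:` values per epoch from the training log. A minimal sketch, assuming the log lines look like the ones quoted in this thread; the first logs here repeat the same Speed value once per metric, so lines are de-duplicated by (epoch, batch). `speed_by_epoch` is a hypothetical helper.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines like:
# "INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-Accuracy=0.697969"
SPEED_RE = re.compile(
    r"Epoch\[(\d+)\]\s+Batch\s*\[(\d+)\]\s+Speed:\s*([\d.]+)\s+samples/sec"
)


def speed_by_epoch(lines):
    """Return {epoch: mean samples/sec} over the Speed lines in `lines`."""
    seen = {}
    for line in lines:
        m = SPEED_RE.search(line)
        if m:
            epoch, batch, speed = int(m.group(1)), int(m.group(2)), float(m.group(3))
            # One entry per (epoch, batch), even if the line is repeated per metric.
            seen[(epoch, batch)] = speed
    per_epoch = defaultdict(list)
    for (epoch, _batch), speed in seen.items():
        per_epoch[epoch].append(speed)
    return {epoch: mean(speeds) for epoch, speeds in sorted(per_epoch.items())}
```

Usage: `with open("train.log") as f: print(speed_by_epoch(f))`, where `train.log` is a placeholder for wherever you saved the training output.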

@geoffzhang

Did you get "Cannot find argument 'out_grad'" when running train_P_net.py?

@EmiPark

EmiPark commented Jul 3, 2018

@geoffzhang I had the same problem. Did you fix it?

@zuoqing1988

@geoffzhang @EmiPark Delete every 'out_grad=True' in core\symbol.py.
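If you want to apply that edit mechanically, a minimal Python sketch is to strip the keyword argument with regular expressions and keep a backup of the original file. `strip_out_grad` and `patch_file` are hypothetical helpers; they only automate the deletion suggested above and say nothing about whether removing the argument changes training.

```python
import re
from pathlib import Path

# `out_grad=True` preceded by a comma, e.g. "use_ignore=True, out_grad=True"
_AFTER_COMMA = re.compile(r"\s*,\s*out_grad\s*=\s*True\b")
# `out_grad=True` as the first (or only) keyword, plus its trailing comma if any
_LEADING = re.compile(r"\bout_grad\s*=\s*True\b\s*,?\s*")
_ANY = re.compile(r"\bout_grad\s*=\s*True\b")


def strip_out_grad(source: str) -> str:
    """Remove every `out_grad=True` keyword argument from Python source text."""
    source = _AFTER_COMMA.sub("", source)
    return _LEADING.sub("", source)


def patch_file(path) -> int:
    """Rewrite `path` without `out_grad=True`, saving the original as `<name>.bak`.

    Returns the number of occurrences removed (0 means the file is untouched).
    """
    path = Path(path)
    original = path.read_text()
    count = len(_ANY.findall(original))
    if count:
        path.with_name(path.name + ".bak").write_text(original)
        path.write_text(strip_out_grad(original))
    return count
```

Run from the repository root, e.g. `patch_file(Path("core") / "symbol.py")`; the `.bak` copy lets you restore the original if the change turns out to matter.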

@cuiyong127

Quoting @zuoqing1988: "@geoffzhang @EmiPark delete all 'out_grad=True' in core\symbol.py"
Does deleting "out_grad = True" have any impact on training?

8 participants