>>> vcjobacc
[September 26, 2019, 10:12am]
Hello everyone!
I have access to a server with a V100 GPU, so I tried to train a model there with a batch size of 32 for training and 16 for evaluation. Unfortunately, the GPU is not being used at 100%: on average the load is 30-40%, occasionally it rises to 85-90% for a short time, then drops back down to 14%. My question is: if I increase the batch size to, say, 48 (with it set to 32 it uses 11 GB of GPU RAM), will the GPU load go up (so training runs faster) while the final model quality is not hurt?
After the tests on a single V100 I want to do distributed training. Should I just use distributed.py instead of train.py and provide the same config as for single-GPU training? Should I keep the same batch size?
Thanks a lot!
[This is an archived TTS discussion thread from discourse.mozilla.org/t/distributed-training-optimal-parameters]
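On the "same batch size" question, a minimal sketch may help. This is not Mozilla TTS code and not necessarily how its distributed.py is wired internally; it only illustrates the usual PyTorch DistributedDataParallel pattern, where the configured batch size is consumed per process/GPU, so the effective global batch becomes batch_size × number of GPUs. The worker function, dummy model, and dummy data below are made up for illustration.

```python
# Sketch: per-GPU batching with DistributedDataParallel (DDP).
# Assumption: one spawned process per GPU; `run_worker` and the dummy
# Linear model/data are hypothetical, not part of Mozilla TTS.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler


def run_worker(rank: int, world_size: int, per_gpu_batch_size: int) -> None:
    # Each process drives exactly one GPU.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Dummy data/model just to show the wiring; the sampler gives each
    # rank a disjoint shard of the dataset.
    dataset = TensorDataset(torch.randn(1024, 80), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=per_gpu_batch_size, sampler=sampler)

    model = DDP(torch.nn.Linear(80, 1).cuda(rank), device_ids=[rank])
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)

    for x, y in loader:
        optim.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x.cuda(rank)), y.cuda(rank))
        loss.backward()  # gradients are all-reduced across ranks here
        optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    # batch_size=32 per GPU -> effective global batch = 32 * world_size.
    mp.spawn(run_worker, args=(world_size, 32), nprocs=world_size)
```

Under this pattern, keeping batch_size at 32 on, say, 4 GPUs gives an effective batch of 128, which changes the optimization dynamics (and may warrant adjusting the learning rate), so it is worth checking how the specific distributed.py interprets the config before reusing it unchanged.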