NEP training not running on multiple GPU nodes #731
-
Hi GPUMD team, I was trying to obtain a scaling plot comparing training speed with an increasing number of GPUs. I got good scaling within a single GPU node (i.e., across 1, 2, and 4 GPUs), but when I submitted a job on 2 GPU nodes (i.e., 8 GPUs), the training still used only 1 node (4 GPUs). Is there a way to train a NEP model across 2 different nodes? Thank you so much!
-
Thanks for the question. There is currently only support for single-node parallelization. Extending to multiple nodes would require MPI, which we have not implemented yet.
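In practice, this means the best approach for now is to request all the GPUs of a single node for the `nep` executable. A minimal Slurm sketch is below; the job name, GPU count, and walltime are placeholders for illustration, and the exact resource flags depend on your cluster's configuration:

```bash
#!/bin/bash
# Hypothetical Slurm script (values are placeholders, adapt to your cluster).
#SBATCH --job-name=nep-train
#SBATCH --nodes=1              # NEP currently parallelizes within one node only
#SBATCH --gres=gpu:4           # request all GPUs on that node
#SBATCH --time=24:00:00

# The nep executable reads nep.in and train.xyz from the working directory
# and (per the scaling described in the question) uses the GPUs visible on
# the node; no MPI launcher such as srun/mpirun is needed.
./nep
```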