-
Hello, I am wondering about the same thing.
-
So building PyTorch from source does produce significantly improved performance with OpenMP parallelization compared to the general-release CPU-only wheel. However, MPI parallelization does not appear to be present. Is nequip training MPI-parallelized at all? Thank you.
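For reference, a minimal sketch (assuming a reasonably recent PyTorch; the exact output format may vary) of how one can inspect which threading backends a given PyTorch build ships with before deciding whether a from-source rebuild is needed:

```python
import torch

# Inspect how this PyTorch build handles intra-op parallelism
# (reports the OpenMP / MKL / thread-pool configuration it was compiled with).
print(torch.__config__.parallel_info())

# Number of threads currently used for intra-op parallelism on CPU.
print("intra-op threads:", torch.get_num_threads())
```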
-
I can confirm that there is no MPI parallelization on CPUs; training runs as a single instance only. The best performance comes from properly setting the intra-op and inter-op parallelism, the OMP thread count, etc., as sketched below.
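A minimal sketch of the kind of tuning meant here; the thread counts are placeholders and should be matched to your core count and job layout:

```python
import os

# OMP_NUM_THREADS must be set before PyTorch initializes its thread pools,
# typically in the job script; shown here only for illustration.
os.environ.setdefault("OMP_NUM_THREADS", "16")

import torch

# Intra-op parallelism: threads used within a single operator (OpenMP/MKL).
torch.set_num_threads(16)

# Inter-op parallelism: threads used to run independent operators concurrently.
# Must be called before any inter-op parallel work has started.
torch.set_num_interop_threads(2)
```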
-
Multi-CPU training with MPI is not available in our code; we primarily target GPUs and strongly recommend that general users use them. As mentioned above, we inherit thread-based CPU parallelism from PyTorch's use of OpenMP / MKL, which seems to require a from-source build to enable. MPI multi-node parallelization of model training is not available. Model inference for Allegro can be parallelized over CPU MPI ranks in LAMMPS, exactly the same way as over multiple GPUs. The closest thing to MPI we have is the prototype of multi-GPU training available on a separate development branch.
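To see concretely what a given PyTorch install offers for distributed training (independent of nequip itself), you can query the available `torch.distributed` backends. Stock wheels generally include Gloo and NCCL where applicable, while the MPI backend is only compiled in when PyTorch is built from source against an MPI library. A small sketch:

```python
import torch.distributed as dist

# Whether this PyTorch build has torch.distributed support at all.
print("distributed available:", dist.is_available())

if dist.is_available():
    # The MPI backend requires building PyTorch from source against MPI.
    print("MPI backend:", dist.is_mpi_available())
    print("Gloo backend:", dist.is_gloo_available())
    print("NCCL backend:", dist.is_nccl_available())
```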
-
Hello,
I hope you are doing well. Apologies if this is a bit of a simple question, but I was trying to use mpirun when training with nequip. My hope is to reduce the training time by using multiple MPI ranks. Does this require building torch from source and building it against MPI, etc.?
Thank you very much.