CPU Parallelization #197
-
Hi, I don't have access to GPUs, so I have been trying to train a model on CPUs with OpenMP parallelization. I have tried different values for the `OMP_NUM_THREADS` and `MKL_NUM_THREADS` environment variables, as recommended in some PyTorch OpenMP discussion threads, but every combination still seems to take over an hour (sometimes even two) per epoch. Thanks.
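For reference, a minimal sketch of how those environment variables are typically set from inside a Python script (the value `6` is just a placeholder; tune it to your physical core count):

```python
import os

# OMP_NUM_THREADS / MKL_NUM_THREADS are read by the OpenMP and MKL
# runtimes when they initialize, so they must be set before importing
# torch (or numpy). The value "6" is a placeholder, not a recommendation.
os.environ["OMP_NUM_THREADS"] = "6"
os.environ["MKL_NUM_THREADS"] = "6"

# import torch  # only import after the variables are in place
```

Exporting the same variables in the shell before launching training is equivalent.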
Replies: 5 comments 3 replies
-
Hi @atulcthakur,

(A side note: we recommend starting from ….)

Are you also using …?

One thing you can do to get a bit of a sense of timing is to run …. When you run with ….

I previously tried this at one point on a pretty powerful AMD CPU, and I think I had to set some environment variable to convince MKL to run efficiently (see https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AMD-Ryzen-and-Threadripper-CPU-s-Effectively-for-Python-Numpy-And-Other-Applications-1637/); this could be relevant if you are on AMD.

Overall, CPU performance for training is not something we've really looked at, because it is a fairly unusual use case, both for us and for PyTorch itself.
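For anyone on AMD who follows that link: the workaround it describes is the undocumented `MKL_DEBUG_CPU_TYPE` variable (removed in MKL 2020.1 and later), which forces MKL onto its fast AVX2 code path on non-Intel CPUs. A sketch:

```python
import os

# Workaround from the Puget Systems article linked above: older MKL
# builds (before 2020.1) pick a slow code path on AMD CPUs, and this
# undocumented variable forces the fast AVX2 path instead.
# It must be set before MKL is loaded (i.e. before importing torch/numpy).
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"
```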
-
Hi @Linux-cpp-lisp, thanks for your response. I am training with ….

When I run with ….

As far as the AMD SIMD throttling thing goes, I'm working on an Intel platform, so it is unlikely to help me, though I am inclined to give it a shot anyway.

As a follow-up, I wanted to ask whether 8000 frames is too much training data for NequIP. My training data is pretty diverse (sampled from AIMD simulations over a broad range of temperatures), so I can scale it down to 2000 frames or even fewer.
-
Re num threads, I believe that past a certain point MKL will just ignore you if you are asking for more threads than makes sense. I'm guessing your system has 6 physical cores/12 logical cores? If you are maxing out CPU usage there is nothing to gain by going higher... and I suspect that MKL is also just ignoring you when you go lower too 😄
No gain expected then...
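To check the physical/logical split being guessed at here, a quick stdlib sketch (`os.cpu_count()` reports logical CPUs; the affinity call, where available, shows how many CPUs the process may actually use):

```python
import os

logical = os.cpu_count()  # logical (hyperthreaded) CPU count
print(f"logical CPUs: {logical}")

# On Linux, the scheduler affinity mask shows how many CPUs this
# process is actually allowed to run on (can be fewer under cgroups).
if hasattr(os, "sched_getaffinity"):
    print(f"usable CPUs: {len(os.sched_getaffinity(0))}")
```

Physical core count is not exposed by the stdlib; `psutil.cpu_count(logical=False)` or `lscpu` can report it.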
cc @simonbatzner, but can you also provide a little more detail? This will depend strongly on the chemical, reactive, etc. complexity of your system. The amount of data needed for a monometallic bulk will be very different than for some chemically complex diffusive system, for example.
-
@atulcthakur re num threads, I completely forgot, but there is a PyTorch option as well: …. As well as ….
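The option names above were lost; my guess (an assumption, not confirmed by the thread) is that the PyTorch knobs meant are `torch.set_num_threads` and `torch.set_num_interop_threads`:

```python
import torch

# Assumed to be the options meant above (the exact names were lost in
# the thread): intra-op threads parallelize within a single operator,
# inter-op threads run independent operators concurrently.
# set_num_interop_threads must be called before any parallel work starts,
# so do this at the very top of the training script.
torch.set_num_interop_threads(2)
torch.set_num_threads(6)

print(torch.get_num_threads())          # 6
print(torch.get_num_interop_threads())  # 2
```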
-
Latest discussion on CPUs can be found here: #303 (comment)