Time consumption before training starts #182
-
Hello!

--------- Update 2022-07-01 -----------
-
Hi @ssrokyz, the very first time you train NequIP, it parses the entire dataset and builds the neighbor lists. For subsequent runs it then re-loads that processed dataset, so you only have to do this once. Depending on the size of the dataset and of the individual structures, however, this can take some time. How large are the structures?
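For a sense of where that time goes, here is a minimal sketch of the per-frame work such preprocessing involves, using ASE's own neighbor-list builder. The trajectory filename and the 4.0 Å cutoff are placeholders, and this is not NequIP's actual code path:

```python
# Sketch: build a neighbor list for every frame of an ASE trajectory,
# roughly the work NequIP's first-run preprocessing has to do.
# "dataset.traj" and the 4.0 Å cutoff are hypothetical placeholders.
import time
from ase.io import read
from ase.neighborlist import neighbor_list

frames = read("dataset.traj", index=":")  # load all structures

start = time.perf_counter()
for atoms in frames:
    # "ij" requests only the pair indices; NequIP also stores edge data.
    i, j = neighbor_list("ij", atoms, 4.0)
print(f"{len(frames)} frames processed in {time.perf_counter() - start:.1f} s")
```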
-
Is it possible to parallelize the data parsing step?
-
@Linux-cpp-lisp I'm using ASE version 3.22.1, which is the latest in the conda repository. My dataset has 60,000 structures and is in ASE trajectory format. I don't think this is a memory issue, since I got the same error with a reduced dataset of 600 structures. (Below: the error with 600 structures.)
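For reference, one way to produce such a reduced test set from an ASE trajectory; the filenames below are hypothetical:

```python
# Write a 600-frame subset of a 60,000-frame trajectory to test
# whether the failure depends on dataset size (hypothetical filenames).
from ase.io import read, write

frames = read("full_dataset.traj", index="::100")  # every 100th frame
write("reduced_dataset.traj", frames)
print(f"wrote {len(frames)} frames")
```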
-
@Linux-cpp-lisp With a large NEQUIP_NUM_TASKS (32 and 64 both tested and failed), it gives the error message below.
With a smaller NEQUIP_NUM_TASKS (8 or 2), it also fails, but with a different error message, also shown below.
Both errors look like memory issues.
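A minimal sketch of one way to set the worker count when launching training, assuming the standard `nequip-train` entry point; the config path is a hypothetical placeholder. Fewer workers should lower peak memory at the cost of slower preprocessing:

```python
# Set the number of preprocessing worker tasks before launching
# nequip-train; "configs/my_config.yaml" is a hypothetical path.
import os
import subprocess

os.environ["NEQUIP_NUM_TASKS"] = "2"  # fewer workers -> lower peak memory
subprocess.run(["nequip-train", "configs/my_config.yaml"], check=True)
```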
-
@Linux-cpp-lisp One more thing: the training process seems odd. Below are the last few dozen lines of the log file.
I also found that multiple worker processes were still running during training, even after the data parsing had finished.
One last thing,
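One way to check for such leftover worker processes from outside the run, assuming `psutil` is installed (it is not part of NequIP), is a sketch like this:

```python
# List processes whose command line mentions nequip, to spot workers
# that outlive the data-parsing phase (hypothetical diagnostic).
import psutil

for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = proc.info["cmdline"] or []
    if any("nequip" in part for part in cmdline):
        print(proc.info["pid"], " ".join(cmdline))
```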
-
With a 64-core CPU, it took only 4 minutes before the training process started.
-
@Linux-cpp-lisp
-
Hi @ssrokyz