Questions about training data preparation #377
Unanswered
turbosonics asked this question in Q&A
Replies: 0 comments
Hi all,
I have some questions about training data preparation. These questions may sound silly, but I couldn't find anything useful about them in the NequIP papers I have read so far.
I want to train NequIP for a condensed-phase simple cubic crystal in which a light element diffuses. I hope NequIP will capture the diffusivity, density, and modulus correctly.
To that end, I ran VASP DFT calculations that fall into four groups, as follows:
1. Under/over-representation.
I combined the DFT results from all four groups into a single file for NequIP training. The calculations were run with VASP, and I converted the output to extxyz. The data from DFT-1 and DFT-2 greatly outnumber the data from DFT-3 and DFT-4: ~68k frames vs. 49 frames.
My understanding of NequIP may not be good enough yet, so I may be wrong, but I am worried that the trend in the DFT-3 and DFT-4 data will be overwhelmed by DFT-1 and DFT-2 because of the difference in the number of frames.
Is this a valid worry? Or is it something I don't need to worry about, because NequIP will handle the data regardless of calculation type (AIMD, NEB, geometry optimization) and number of frames?
If the worry is valid, are there any hyperparameters I can tweak to resolve the problem?
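To put rough numbers on the imbalance, here is a minimal sketch using the approximate frame counts quoted above. The group labels, the inverse-frequency weighting, and the repeat factor are purely illustrative assumptions on my side; I am not claiming that NequIP exposes any of them as an option.

```python
# Approximate frame counts taken from the post (~68k vs 49).
frames = {"DFT-1+DFT-2": 68_000, "DFT-3+DFT-4": 49}

total = sum(frames.values())
fractions = {name: n / total for name, n in frames.items()}

# Hypothetical balancing scheme (not a NequIP setting): weight each frame
# inversely to its group size so that both groups contribute equally overall.
per_frame_weight = {name: total / (len(frames) * n) for name, n in frames.items()}

# How many times the small group would have to be repeated to match the large one.
repeat_factor = frames["DFT-1+DFT-2"] / frames["DFT-3+DFT-4"]

for name in frames:
    print(f"{name}: {fractions[name]:.4%} of frames, "
          f"per-frame weight {per_frame_weight[name]:.3f}")
print(f"repeat factor for the small group: {repeat_factor:.0f}x")
```

With these counts the small group is well under 1% of all frames, which is the scale of imbalance I am asking about.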
2. NPT AIMD for NequIP training reference data?
This question is related to question 1. If ~40 equation-of-state frames (= 40 geometry optimizations) are not enough, would it be better to run ~40 ps of NPT AIMD at very high pressure to get the volume-energy relation from ~20k frames? I want the trained parameters to be prepared for expansion and compression events so that mechanical properties are described better. My loss function setup is energy 1 (PerAtomMSELoss), stress 1, and force 1.
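For reference, this is how I understand a loss with those three weights to combine. It is only a sketch of my understanding, assuming each term is a mean squared error and that the per-atom energy term divides each frame's energy by its atom count before comparing; the function names and the toy numbers are hypothetical and this is not NequIP's actual code.

```python
def mse(pred, ref):
    """Mean squared error over two equal-length flat sequences."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def combined_loss(e_pred, e_ref, n_atoms, f_pred, f_ref, s_pred, s_ref,
                  w_energy=1.0, w_force=1.0, w_stress=1.0):
    """Weighted sum of per-atom energy, force, and stress MSE terms."""
    # Per-atom energy: normalize each frame's total energy by its atom count.
    e_pred_pa = [e / n for e, n in zip(e_pred, n_atoms)]
    e_ref_pa = [e / n for e, n in zip(e_ref, n_atoms)]
    return (w_energy * mse(e_pred_pa, e_ref_pa)
            + w_force * mse(f_pred, f_ref)
            + w_stress * mse(s_pred, s_ref))


# Toy example: one 2-atom frame with flattened force and stress components.
loss = combined_loss(
    e_pred=[10.0], e_ref=[8.0], n_atoms=[2],
    f_pred=[0.1, 0.0, -0.1], f_ref=[0.0, 0.0, 0.0],
    s_pred=[0.0, 0.0], s_ref=[0.0, 0.0],
)
print(loss)
```

With all weights set to 1, the relative size of the three terms is set entirely by the units and magnitudes of the errors themselves.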
3. Input data format preference (if any).
Instead of using the OUTCAR files directly, I convert them to extxyz using ase. I find extxyz much easier to process than raw OUTCAR files. Is this the recommended approach for NequIP? Or does NequIP prefer reading the raw VASP output (= OUTCAR file) for some reason I am not aware of?
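The conversion I mean looks roughly like the sketch below. It assumes ASE is installed; the function name and the file paths are placeholders, not my actual script.

```python
def convert_outcar_to_extxyz(outcar_path, extxyz_path):
    """Read every ionic step from a VASP OUTCAR and write it as extxyz.

    Returns the number of frames written.
    """
    # Imported lazily so this file can be loaded even where ASE is missing.
    from ase.io import read, write

    # index=":" returns a list with all ionic steps, not only the last one.
    frames = read(outcar_path, index=":", format="vasp-out")
    write(extxyz_path, frames, format="extxyz")
    return len(frames)


# Example call (placeholder paths):
# n_frames = convert_outcar_to_extxyz("OUTCAR", "dataset.extxyz")
```

I then concatenate the per-calculation extxyz files into the single training file mentioned in question 1.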