Huge bumps in learning curves #74
-
Dear developers,

I have a problem while training the MLP: I get large bumps in the MAE of the forces. I know it is not unusual to see some bumps during training, but usually the error returns to its pre-bump value quite fast. That does not happen here: it takes a few hours, and in some cases a new bump occurs before the old optimal error is reached again. See for example the figure below. I know I could change the learning rate or the batch size, or restart the training from the optimal value before the bump, but I was wondering whether this is something you have seen while training MLPs, or whether you know what might cause it.

I ask because I am using almost the default settings of the "full.yaml" you provided (I only changed r_max to 6.0), so I would expect the settings to be quite good already. However, I got this strange behavior for two different systems (CsPbI3 and FAPbI3) and two different training set sizes (300 and 15000 structures). In the zip file you can find the full.yaml file and the training logs belonging to the figure.

Kind regards,
Tom
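For concreteness, the changed setting relative to the provided defaults would look something like this in the YAML config (a sketch: `r_max` is the only value taken from the description above; the `learning_rate` and `batch_size` values shown are assumed defaults, not confirmed ones):

```yaml
# Sketch of the relevant part of full.yaml as described above.
# Only r_max was changed from the provided defaults; the other
# values are assumptions about what those defaults might be.
r_max: 6.0            # cutoff radius, changed from the default
learning_rate: 0.005  # assumed default; see the replies below
batch_size: 5         # assumed default
```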
-
In general I believe this is a sign that your learning rate is too aggressive, but @simonbatzner should be able to confirm this. Side note: in general we recommend starting with …
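If the learning rate is indeed the culprit, a quick experiment would be to lower it and/or increase the batch size in the config, since both damp gradient noise. A minimal sketch (the values here are illustrative assumptions, not an official recommendation):

```yaml
# Illustrative: a smaller learning rate and/or a larger batch size
# should both smooth the learning curve; exact values are assumptions.
learning_rate: 0.001   # e.g. down from 0.005
batch_size: 10         # e.g. up from 5
```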
-
Forwarding some comments:
Out of curiosity, what units is this in?
-
Thanks for the tips. These results are in ASE units, so eV/Angstrom. EDIT: it will take a week before I know whether this resolves the issue.
-
Hi @tbraeckevelt, sorry for the late reply; a few additional notes to add to Alby's great answer: …
-
Thanks for the answer. I tested your suggestions: learning rate (LR) 0.005 and 0.001, and batch size (BS) 10 and 15. I came to the same conclusions as you did.
I will do some more tests on other systems, but for now I will use LR=0.005 as the default.
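In config terms, the settled choice would simply be (a sketch; all other keys left at their previous values):

```yaml
# Final choice after the LR/BS tests described above (sketch).
learning_rate: 0.005
```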