I am a pharmacology chemistry student deeply interested in the applications of generative AI, specifically REINVENT4. Although I am relatively new to deep learning, I have been actively engaging with REINVENT4, deploying and running both the provided demos and experiments tailored to my own projects. During my experimentation, particularly in the reinforcement learning phase, I have observed that the loss scores reported during model evaluation consistently increase. This seems counterintuitive, as I understand that lower loss scores typically indicate better model performance. Given this, I am concerned that I may be misinterpreting the nature or the expected behaviour of the loss score in this specific context. Could you please provide some guidance on how to interpret these results? Any detailed explanations or resources you could point me to would be immensely helpful as I aim to deepen my understanding of these processes. Thank you very much for your time and assistance.
Best regards,
Jack Hu
Hi,
many thanks for your interest in REINVENT and welcome to the community!
The loss function is documented in our paper; see eqs. 5 and 6 and the discussion following eq. 7. Please note carefully that the equations there are written in terms of the log-likelihood, while the TensorBoard output logs the negative log-likelihood (NLL), which, confusingly, is a positive number.
Practically speaking, what you want to achieve in reinforcement learning (RL) is that the new model (the agent) generates high-scoring SMILES with low NLL. This will, somewhat naturally, mean that the very same SMILES will probably not score well with the prior, which is used as a reference or "regularizer", and will have high NLL under it. Eq. 6 computes the distance between the two models (NB the prior network is fixed) and adds the scaled score, which you would want to move towards 1 on a 0->1 scale, as the reward; this distance will therefore have to increase in order for the agent to learn to generate better-scoring SMILES.
Also, keep in mind that the sample is drawn anew from the agent in every step, i.e. every new batch will have new SMILES (although there will also be duplicates). That is what you are eventually after.
Many thanks,
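To make the sign conventions above concrete, here is a minimal sketch of this kind of update written in terms of the NLLs that TensorBoard reports, assuming an agent/prior pair and a scoring function scaled to 0->1. The names `agent`, `prior`, `scoring_function` and `sigma` are hypothetical placeholders rather than the actual REINVENT4 API; the loss shown is the usual squared distance between the agent likelihood and the score-augmented prior likelihood, not a verbatim copy of eq. 6.

```python
# Minimal sketch of a REINVENT-style RL step, written in terms of NLLs.
# `agent`, `prior` and `scoring_function` are illustrative placeholders,
# not the actual REINVENT4 API.
import torch

def rl_step(agent, prior, scoring_function, batch_size=128, sigma=128.0):
    # A fresh batch of SMILES is sampled from the agent at every step,
    # so each step's NLLs refer to different molecules.
    smiles, agent_nll = agent.sample(batch_size)   # NLLs are positive numbers

    # The prior is frozen and only serves as a reference ("regularizer").
    with torch.no_grad():
        prior_nll = prior.likelihood(smiles)

    # The scoring function returns values scaled to the range 0..1.
    score = torch.as_tensor(scoring_function(smiles))

    # Augmented NLL: the prior NLL lowered in proportion to the score,
    # i.e. log P_aug = log P_prior + sigma * score, written with NLLs.
    augmented_nll = prior_nll - sigma * score

    # Squared distance between the agent and the augmented likelihood;
    # minimising this (optimizer step omitted) pulls the agent towards
    # molecules the prior still finds reasonable *and* that score highly.
    loss = torch.pow(augmented_nll - agent_nll, 2).mean()
    return loss, agent_nll, prior_nll, score
```

In this formulation, as the agent improves, its own NLL for high-scoring SMILES falls while the prior's NLL for those same SMILES tends to rise, so the NLL curves shown in TensorBoard can grow even though the run is behaving as intended.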