This repository hosts projects developed by the Liverpool Materials team for the LLMs for Materials and Chemistry hackathon.
Deep learning models have demonstrated remarkable capabilities in predicting material properties, owing to their specific inductive biases. Nevertheless, the limited size of chemical datasets presents substantial challenges for leveraging these architectures effectively.
Here, we experiment with a hybrid approach that aims to integrate the architectural biases of deep learning with the abstract knowledge provided by LLMs. Specifically, we use Roost (https://www.nature.com/articles/s41467-020-19964-7), an attention-based graph neural network for material property prediction that uses only the stoichiometry of the underlying materials. We aggregate the material representations created by Roost with context information provided by the last layer of MatBert/MatSciBert (https://www.sciencedirect.com/science/article/pii/S2666389922000733 , https://www.nature.com/articles/s41524-022-00784-w).
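As a rough illustration of the aggregation step, the sketch below combines a composition vector with a context vector either by element-wise sum or by concatenation, mirroring the none, sum and concat values of model.agg_type used in the run commands in this README. The function name aggregate, the plain-list toy vectors, and the assumption that both inputs already share the same length for the sum are ours for illustration and are not taken from the repository code.

```python
from typing import List


def aggregate(roost_vec: List[float], llm_vec: List[float], agg_type: str = "sum") -> List[float]:
    """Combine a material vector with an LLM context vector (hypothetical helper)."""
    if agg_type == "none":
        # Baseline: ignore the LLM context entirely.
        return list(roost_vec)
    if agg_type == "sum":
        # Element-wise sum only makes sense for vectors of equal length.
        if len(roost_vec) != len(llm_vec):
            raise ValueError("sum aggregation needs vectors of equal length")
        return [r + c for r, c in zip(roost_vec, llm_vec)]
    if agg_type == "concat":
        # Concatenation keeps both vectors and grows the feature dimension.
        return list(roost_vec) + list(llm_vec)
    raise ValueError(f"unknown agg_type: {agg_type!r}")


material = [0.5, -1.0, 2.0]  # toy stand-in for a Roost representation
context = [1.5, 1.0, 0.0]    # toy stand-in for a last-layer LLM embedding

summed = aggregate(material, context, "sum")
joined = aggregate(material, context, "concat")
```

Summation keeps the feature dimension fixed, while concatenation doubles it here, so any layer consuming the aggregated vector would need to be sized accordingly.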
Here, you can find a concise overview of our project.
In this repository, we release Example 2 from the report, as we believe it is the most interesting one.
First, download the Lithium-ion conductors dataset from http://pcwww.liv.ac.uk/~msd30/lmds/LiIonDatabase.html. Then use llmroost_v2/assets/liion_preprocess.ipynb to preprocess the dataset and store it in the llmroost_v2/datasets folder (name the file LiIon_roomtemp_family.xlsx). Finally, use llmroost_v2/assets/lookup_table_liion.ipynb to create a lookup table via ElM2D, which is used in later steps.
Create a new conda environment using env.yml via:
conda env create -f llmroost_v2/env.yml
Rename the .env.template file in llmroost_v2/ to .env and fill in the corresponding directory paths.
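Before launching a run, it can help to confirm that the files named in the setup steps are in place. The following is a minimal sketch that only checks for the paths mentioned in this README; the helper name missing_inputs and the idea of running it from the repository root are our assumptions, and it does not inspect the contents of .env.

```python
from pathlib import Path
from typing import List

# Paths taken from the setup steps in this README, relative to the repository root.
EXPECTED = (
    "llmroost_v2/datasets/LiIon_roomtemp_family.xlsx",
    "llmroost_v2/env.yml",
    "llmroost_v2/.env",
)


def missing_inputs(repo_root: str = ".") -> List[str]:
    """Return the expected files that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    # Print one line per missing file; no output means everything was found.
    for rel in missing_inputs():
        print(f"missing: {rel}")
```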
To run the baseline Roost model:
python llmroost_v2/run.py +model=roost ++model.agg_type=none
To run LLMRoost(MatBert):
python llmroost_v2/run.py +model=llmroost ++model.agg_type=sum,concat
You can use MatSciBert instead by changing the corresponding LLM name in llmroost_v2/conf/model/llmroost.yaml from matbert to matscibert in defaults.