A recommendation system for MOFs leveraging only PXRDs and precursors

AI4ChemS/XRayPro

License: MIT Open in Streamlit

XRayPro

A recommendation system for MOFs leveraging only PXRDs and precursors. For further details, please refer to our paper.

Usage

A finetuning demo is provided in src/finetuning.ipynb, in which the model is finetuned to predict methane uptake at high pressure (HP) on the ARABG database. For pretraining, refer to src/pretrain.py. For making predictions with loaded weights, refer to src/predictions.ipynb. To run finetuning.ipynb and predictions.ipynb on your machine, first download the required data listed in the Data Availability section. For preprocessing experimental PXRDs, see src/experimental.ipynb.

Web Application

A Streamlit application for XRayPro is available here. To run the application locally on your machine, please visit the web repository for an installation guide.

Hardware Requirements

We strongly encourage running this package on a GPU (for pretraining, finetuning and evaluating data points). The software was developed and tested on an NVIDIA RTX4090, so a GPU of comparable performance is recommended. As a guide to runtime, finetuning on 4000 data points (CoRE-MOF) for 100 epochs takes around 5-7 minutes, whereas finetuning on a larger database such as BW20K (20K entries) for 30 epochs takes around 20 minutes. Runtimes of the provided demonstrations (on an NVIDIA RTX4090):

  1. predictions.ipynb takes around 20 seconds to fully run (demo done on 100 data points).
  2. finetuning.ipynb takes around 5 minutes to fully run.
  3. experimental.ipynb runs nearly instantaneously.
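The finetuning runtimes quoted in this section can be converted into a rough per-epoch cost, which is handy when budgeting a run with a different number of epochs. The sketch below is plain arithmetic on those quoted figures; the helper name seconds_per_epoch is hypothetical, and the results only apply to hardware comparable to the NVIDIA RTX4090 mentioned above.

```python
def seconds_per_epoch(total_minutes: float, epochs: int) -> float:
    """Convert a total finetuning time into an average time per epoch."""
    return total_minutes * 60.0 / epochs


# CoRE-MOF (4000 data points): 5-7 minutes for 100 epochs.
core_mof_low = seconds_per_epoch(5, 100)
core_mof_high = seconds_per_epoch(7, 100)

# BW20K (20K entries): around 20 minutes for 30 epochs.
bw20k = seconds_per_epoch(20, 30)

if __name__ == "__main__":
    print(f"CoRE-MOF: {core_mof_low:.1f}-{core_mof_high:.1f} s per epoch")
    print(f"BW20K:    {bw20k:.1f} s per epoch")
```

These are averages only; actual time per epoch will vary with batch size, data loading and the GPU in use.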

Software Requirements

OS Requirements

This package should work on Linux and Windows. It has been tested on Ubuntu 22.04.4 LTS.

Installation

Python 3.11.9 is recommended for this package. A GPU is strongly recommended for pretraining, finetuning and evaluating the model (especially across many MOFs); run torch.cuda.is_available() in your Python environment or notebook to check whether it can access your GPU (if you have one). To install the full package, follow these steps (assuming you have access to conda):

git clone https://github.com/AI4ChemS/XRayPro.git
conda create -n xraypro python=3.11.9
conda activate xraypro

cd path/to/xraypro
pip install -r requirements.txt

In a fresh environment, installation takes around 2-4 minutes.
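Once installation has finished, the GPU check mentioned above can be wrapped in a small helper. This is a minimal sketch rather than part of the package: the function name pick_device is hypothetical, and it falls back to the CPU when PyTorch is not importable or no GPU is visible.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and can see a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not importable in this environment; fall back to CPU.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this reports "cpu" on a machine with a GPU, the installed PyTorch build or the GPU driver is the first thing to check.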

Data Availability

If you are interested in reproducing the results in our paper, please download the data available in the following repository: https://zenodo.org/records/14908210

Main

XRayPro is a multimodal model that takes the powder X-ray diffraction (PXRD) pattern and chemical precursors of a metal-organic framework (MOF) as input to predict a wide variety of its properties, and recommends the most feasible applications for each MOF. The tool is motivated by accelerating material discovery. The workflow is as follows: a transformer encodes and embeds the input chemical precursors (in the form of the SMILES of the organic linker and the metal node), while a convolutional neural network (CNN) embeds the PXRD pattern, before regression is performed. Furthermore, the model is pretrained with self-supervised learning (Barlow Twins) against a crystal graph convolutional neural network (CGCNN), which not only improves data efficiency in low-data regimes but also provides more context about the local environment of the MOF. The pretrained weights are loaded into XRayPro and can be finetuned for any task.
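To make the self-supervised step more concrete, here is a dependency-free sketch of the Barlow Twins objective in its generic form. It is not the code in src/pretrain.py. We assume z_a holds a batch of embeddings from the XRayPro branch and z_b the embeddings of the same MOFs from the CGCNN branch; the function names and the off_diag_weight value are illustrative assumptions.

```python
import math


def standardize_columns(z):
    """Scale each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            # Small epsilon avoids division by zero for constant dimensions.
            out[i][j] = (col[i] - mean) / (std + 1e-12)
    return out


def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Generic Barlow Twins loss between two batches of embeddings.

    z_a, z_b: lists of equal-length embedding vectors (one per sample).
    off_diag_weight: assumed trade-off weight for the redundancy term.
    """
    a, b = standardize_columns(z_a), standardize_columns(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Entry (i, j) of the cross-correlation matrix between the branches.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # pull matching dimensions together
            else:
                loss += off_diag_weight * c_ij ** 2  # decorrelate the rest
    return loss


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    print(barlow_twins_loss(z, z))
```

The loss is minimized when the cross-correlation matrix of the two branches approaches the identity, that is, when the two views of the same MOF agree dimension by dimension while different dimensions carry non-redundant information.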

Methods

Does this work on any PXRD pattern?

We have evaluated our finetuned model on entries from the Cambridge Structural Database (CSD). Because the model was finetuned on CoRE-MOF entries, in which bound and unbound solvents are removed from the pores, we assessed its robustness on the counterpart entries in which those solvents are retained when computing the simulated PXRD pattern. This was tested across three classes (missing hydrogen atoms, bound solvents and unbound solvents), showing that the model is robust. Furthermore, our model is also robust on experimental PXRD patterns, as evaluated on CAU-28, Yb-UiO66 (thank you Ashlee Howarth and team for this data!) and pyrene-based MOFs.

(Figure: CSDAssessment)
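Experimental patterns usually arrive on an instrument-specific 2θ grid and intensity scale, so some preprocessing is needed before they can be compared with simulated patterns. The sketch below shows one common approach, linear interpolation onto a fixed grid followed by min-max scaling. It is an assumption-laden illustration and not the procedure in src/experimental.ipynb: the function name, the grid and the zero-padding outside the measured range are all hypothetical choices.

```python
from bisect import bisect_right


def resample_and_normalize(two_theta, intensity, grid):
    """Interpolate a pattern onto `grid`, then scale intensities to [0, 1].

    two_theta: measured angles, strictly increasing.
    intensity: measured intensities, same length as two_theta.
    grid: target angles (hypothetical; choose to match the model input).
    Grid points outside the measured range are set to 0 (an assumption).
    """
    resampled = []
    for x in grid:
        if x < two_theta[0] or x > two_theta[-1]:
            resampled.append(0.0)
            continue
        hi = bisect_right(two_theta, x)
        if hi == len(two_theta):
            # x coincides with the last measured angle.
            resampled.append(float(intensity[-1]))
            continue
        lo = hi - 1
        frac = (x - two_theta[lo]) / (two_theta[hi] - two_theta[lo])
        resampled.append(intensity[lo] + frac * (intensity[hi] - intensity[lo]))

    low, high = min(resampled), max(resampled)
    if high == low:
        # Flat pattern: nothing to scale.
        return [0.0] * len(resampled)
    return [(v - low) / (high - low) for v in resampled]


if __name__ == "__main__":
    angles = [5.0, 10.0, 15.0]
    counts = [0.0, 200.0, 100.0]
    print(resample_and_normalize(angles, counts, [5.0, 7.5, 10.0, 12.5, 15.0, 20.0]))
```

Real experimental data may also need background subtraction or smoothing, which this sketch does not attempt.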

Benchmark models

Three benchmark models were considered: a descriptor-based ML model (which accepts geometric descriptors and chemistry RACs), CGCNN (which accepts crystal structures) and MOFormer (which accepts MOFids as inputs). Our model outperforms MOFormer and CGCNN on geometric properties such as uptake at HP, mainly due to the context that PXRD patterns provide, and competes well on chemistry-reliant and electronic properties such as band gap. While the descriptor-based ML model generally outperforms XRayPro, its descriptors require crystal structures, which are quite challenging to obtain, whereas the PXRD pattern and chemical precursors are readily available, which ultimately accelerates material discovery. Furthermore, XRayPro competes reasonably well with descriptor-based ML models on geometric properties, which makes this trade-off viable.

The panel on the right shows why we consider multimodality rather than simply using one input. The PXRD and chemical precursors complement each other: the PXRD captures the global structure/environment of the MOF, whereas the chemical precursors describe the metal and organic chemistry. When these two representations are combined, our model is well-rounded and competes with structural models such as CGCNN.

(Figure: Figure2_v3)

Citation

If you use our work, please cite us using the BibTeX entry below.

@article{khan2024connecting,
  title = {Connecting metal-organic framework synthesis to applications with a self-supervised multimodal model},
  author = {Khan, Sartaaj Takrim and Moosavi, Seyed Mohamad},
  year = {2024},
  journal = {ChemRxiv},
  doi = {10.26434/chemrxiv-2024-mq9b4},
  url = {https://chemrxiv.org/engage/chemrxiv/article-details/671a9d9783f22e42140f2df6},
  note = {Preprint, not peer-reviewed}
}

Privacy when using web application

Our web application does NOT store any data entered into the input fields (there is no external database for this).

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
