This code generates structure-based feature vectors from Robocrystallographer text descriptions
of crystal structures using word embeddings. It uses mat2vec to learn these embeddings
from Robocrystallographer descriptions for all the elements in the periodic table.
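mat2vec is a word2vec-style model. In the skip-gram variant, an embedding is learned by predicting the words that appear near each word in the training text. As a rough illustration only, the sketch below shows how (center, context) training pairs could be formed from one description sentence. The whitespace tokenizer, the window value, the example sentence and the function names are assumptions for illustration. This is not mat2vec's own text processing.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Split a description into word tokens (hypothetical, simplified tokenizer)."""
    tokens = []
    for raw in text.split():
        # Strip sentence punctuation from both ends of each whitespace-separated word.
        token = raw.strip(".,;:()")
        if token:
            tokens.append(token)
    return tokens


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = tokenize("Na is bonded to six Cl atoms.")
pairs = skipgram_pairs(tokens, window=1)
print(pairs[:3])  # [('Na', 'is'), ('is', 'Na'), ('is', 'bonded')]
```

A real model would then learn one vector per token from many such pairs, so element symbols that occur in similar structural contexts end up with similar vectors.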
- Installation
- Reproduction of publication results
- Publications/How to cite
- Maintainers
Do any one of the following:
- Clone this repository to a directory of your choice on your computer.
- Download an archive of this repository and extract it to a directory of your choice on your computer.
Then:
- Make sure pip is installed.
- Navigate to the root folder of this repository and run
    pip install --ignore-installed -r requirements.txt
  Note: If you are using a conda environment and any packages fail to compile during this step, you may need to install those packages separately first with conda install package_name.
- You are ready to go!
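If you want to confirm that the installation step worked, a small check like the sketch below compares the package names listed in requirements.txt against the installed distributions, using the standard library's importlib.metadata. The helper names are hypothetical, and the parser only handles simple lines such as name==version (or other comparison operators). It is not a full requirements-file parser.

```python
import os
from importlib import metadata
from typing import Iterable, List


def parse_requirements(lines: Iterable[str]) -> List[str]:
    """Extract bare package names from simple requirements lines (hypothetical helper)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r / -e
            continue
        # Cut the name at the first version specifier, extra, marker or space.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", ";", "[", " "):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Run from the repository root; does nothing if requirements.txt is not present.
if os.path.exists("requirements.txt"):
    with open("requirements.txt") as fh:
        print("Missing:", missing_packages(parse_requirements(fh)) or "none")
```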
- To get Robocrystallographer descriptions of stable materials from the Materials Project, navigate to the get_embeddings directory and run mpid_to_robo_descriptions.py.
- You can then train mat2vec on the downloaded corpus of Robocrystallographer descriptions to generate word embeddings for all the elements of the periodic table. The get_embeddings.py script in the get_embeddings directory contains directions for this.
- To download the Matbench dataset, navigate to the training directory and run get_matbench_data.py.
- To train ML models on the Matbench tasks, with the data featurized using our word-embedding vectors, run train_models.py from the training directory. We used the composition-based feature vector (CBFV) technique for featurization. Keep the .csv file containing the word embeddings in the training/cbfv/cbfv/element_properties directory.
- To plot the figures shown in the paper, run get_plots.py, also in the training directory. This saves the figures in the results directory.
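The composition-based featurization step can be illustrated with a simplified sketch: each element's word-embedding vector is looked up in a CSV table, and the vectors are averaged with each element weighted by its atomic fraction in the formula. The CSV layout here (a header row, then the element symbol in the first column and one column per embedding dimension), the toy two-dimensional vectors, and all function names are assumptions for illustration. The sketch shows the idea of a fraction-weighted average, one common composition-based feature, and is not the repository's CBFV implementation.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, List


def parse_formula(formula: str) -> Dict[str, float]:
    """Count atoms in a flat formula such as "Fe2O3" (no parentheses or hydrates)."""
    counts: Dict[str, float] = defaultdict(float)
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] += float(amount) if amount else 1.0
    return dict(counts)


def load_embeddings(csv_text: str) -> Dict[str, List[float]]:
    """Read element vectors, assuming column 0 is the symbol and the rest are dimensions."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {row[0]: [float(x) for x in row[1:]] for row in reader if row}


def weighted_average(formula: str, embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the element vectors, weighting each by its atomic fraction."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    dim = len(next(iter(embeddings.values())))
    features = [0.0] * dim
    for symbol, amount in counts.items():
        fraction = amount / total
        for k, value in enumerate(embeddings[symbol]):
            features[k] += fraction * value
    return features


# Toy 2-dimensional vectors; real embeddings have many more dimensions.
table = load_embeddings("element,d0,d1\nFe,1.0,0.0\nO,0.0,1.0\n")
print(weighted_average("Fe2O3", table))  # [0.4, 0.6]
```

Weighting by atomic fraction keeps the feature length fixed regardless of how many elements a compound contains, which is what lets one model handle every Matbench composition.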
hasan-sayeed (main maintainer)