epochxero/matsci-nlp-cleaner

Set up

  1. Make sure you have Python 3.6 and the pip module installed. We recommend using conda environments.
  2. Navigate to the root folder of this repository (the same folder that contains this README file) and run pip install -r requirements.txt. Note: If you are using a conda env and any packages fail to compile during this step, you may need to first install those packages separately with conda install package_name.
  3. Wait for all the requirements to be downloaded and installed.
  4. Run python setup.py install to install this module. This will also download the Word2vec model files. If the download fails, manually download the model, word embeddings and output embeddings and put them in mat2vec/training/models.
  5. Finalize your chemdataextractor installation by running cde data download. (You may need to restart your virtual environment for the cde command-line interface to be found.)
  6. You are ready to go!
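
After completing the steps above, a quick sanity check can confirm that the packages are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the import names mat2vec and chemdataextractor are assumed to match the packages installed in steps 4 and 5.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report the interpreter version so it can be compared with the one named in step 1.
    print("Python version:", sys.version.split()[0])

    # Assumed import names for the packages installed during set up.
    missing = missing_modules(["mat2vec", "chemdataextractor"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("mat2vec and chemdataextractor are importable.")
```

If a module is reported as not importable, re-running the corresponding set-up step inside the active environment is the first thing to try.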

Processing

Example Python usage:

from mat2vec.processing import MaterialsTextProcessor
text_processor = MaterialsTextProcessor()
text_processor.process("LiCoO2 is a battery cathode material.")

(['CoLiO2', 'is', 'a', 'battery', 'cathode', 'material', '.'], [('LiCoO2', 'CoLiO2')])

For the various methods and options, see the docstrings in the code.
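
When processing more than one sentence, it can be convenient to collect the token lists and the formula mappings in one place. The sketch below assumes only what the example above shows: that process returns a pair of (tokens, list of (original, normalized) formula tuples). The helper process_many is hypothetical and not part of mat2vec.

```python
def process_many(processor, texts):
    """Run processor.process on each text and gather the results.

    `processor` is any object with a `process(text)` method that returns
    (tokens, [(original_formula, normalized_formula), ...]), as in the
    example above. Returns (list of token lists, dict original -> normalized).
    """
    all_tokens = []
    formula_map = {}
    for text in texts:
        tokens, materials = processor.process(text)
        all_tokens.append(tokens)
        # Later occurrences of the same original formula overwrite earlier ones.
        for original, normalized in materials:
            formula_map[original] = normalized
    return all_tokens, formula_map


# Usage with mat2vec (requires the set-up steps to have been completed):
# from mat2vec.processing import MaterialsTextProcessor
# tokens, formulas = process_many(
#     MaterialsTextProcessor(),
#     ["LiCoO2 is a battery cathode material."],
# )
```

Passing the processor in as an argument keeps the helper independent of mat2vec itself, so it can be tried out with a stand-in object before the models are downloaded.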

About

Preprocessing for MatSciNLP project
