GitHub - epochxero/matsci-nlp-cleaner: Preprocessing for MatSciNLP project

Set up

1. Make sure you have Python 3.6 and the pip module installed. We recommend using conda environments.
2. Navigate to the root folder of this repository (the folder that contains this README file) and run pip install -r requirements.txt. Note: if you are using a conda environment and any packages fail to compile during this step, you may need to install those packages separately first with conda install package_name.
3. Wait for all the requirements to be downloaded and installed.
4. Run python setup.py install to install this module. This also downloads the Word2vec model files. If the download fails, manually download the model, word embeddings and output embeddings and put them in mat2vec/training/models.
5. Finalize your chemdataextractor installation by running cde data download. You may need to restart your virtual environment for the cde command-line interface to be found.
6. You are ready to go!
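
If the automatic model download is in doubt, a quick sanity check along the following lines may help. This is only a sketch: the function name check_setup is hypothetical and not part of this repository, and because the exact model file names are not listed here, it only checks that mat2vec/training/models exists and contains at least one file.

```python
import sys
from pathlib import Path


def check_setup(repo_root=".", min_python=(3, 6)):
    """Return a list of problems found; an empty list means nothing obvious is wrong.

    Hypothetical helper, not part of mat2vec. It does not validate the
    model files themselves, only that something was placed in the folder.
    """
    problems = []

    # The set-up notes ask for Python 3.6.
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d or newer expected, found %d.%d"
            % (min_python + tuple(sys.version_info[:2]))
        )

    # Folder named in the set-up notes for manually downloaded model files.
    models_dir = Path(repo_root) / "mat2vec" / "training" / "models"
    if not models_dir.is_dir():
        problems.append("missing directory: %s" % models_dir)
    elif not any(p.is_file() for p in models_dir.iterdir()):
        problems.append("no files found in %s" % models_dir)

    return problems
```

Calling check_setup() from the repository root and printing the returned list should show whether the model folder still needs to be filled by hand.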

Processing

Example Python usage:

from mat2vec.processing import MaterialsTextProcessor
text_processor = MaterialsTextProcessor()
text_processor.process("LiCoO2 is a battery cathode material.")

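The call returns a tuple: the list of processed tokens, followed by a list of (original, normalized) pairs for each material mention that was found:

(['CoLiO2', 'is', 'a', 'battery', 'cathode', 'material', '.'], [('LiCoO2', 'CoLiO2')])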

For the various methods and options see the docstrings in the code.
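
When processing more than one sentence, it can be handy to gather the normalized formulas in one place. The sketch below assumes only the return shape shown in the example above (tokens plus a list of (original, normalized) pairs). The helper collect_material_mentions is hypothetical and not part of mat2vec; it takes the processing function as an argument so it can be tried without the models installed.

```python
def collect_material_mentions(sentences, process):
    """Run `process` on each sentence and group original spellings by normalized formula.

    `process` is assumed to behave like MaterialsTextProcessor().process in the
    example above: it returns (tokens, [(original, normalized), ...]).
    Hypothetical helper, not part of mat2vec.
    """
    tokenized = []
    mentions = {}
    for sentence in sentences:
        tokens, materials = process(sentence)
        tokenized.append(tokens)
        for original, normalized in materials:
            # Several spellings may map to the same normalized formula.
            mentions.setdefault(normalized, set()).add(original)
    return tokenized, mentions
```

With the package installed, this could be called as collect_material_mentions(sentences, MaterialsTextProcessor().process), where sentences is a list of strings.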

