This repo contains implementations of both a Maximum Entropy tagger and a Bi-LSTM + CRF tagger for the part-of-speech (POS) tagging task. To run the experiments, follow the installation and run instructions below.
The data comes from an Indonesian tagged corpus that can be downloaded from https://github.com/famrashel/idn-tagged-corpus. In this repo, the data has already been split into training and testing sets, both of which are ready to use.
All experiments were run with Python 3.6. Install the following dependencies:
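As a rough illustration of how tagged data of this kind can be read into Python, here is a minimal sketch. It assumes one token per line in the form `word<TAB>tag`, with a blank line between sentences; the actual layout of the files in this repo may differ, so check them first. The function name `read_tagged_sentences` is hypothetical.

```python
def read_tagged_sentences(lines):
    """Group 'word<TAB>tag' lines into sentences.

    Assumption: one token per line, blank line between sentences.
    Returns a list of sentences, each a list of (word, tag) tuples.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag = line.split("\t")
        current.append((word, tag))
    # Keep the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences
```

The function accepts any iterable of lines, so an open file object can be passed directly.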
- pytorch==0.4.0
- nltk==3.2.5
- git+https://github.com/pytorch/text#egg=torchtext
- git+https://github.com/pytorch/ignite.git#egg=ignite
- numpy==1.14.3
- scikit-learn==0.19.1
- pytorch-crf==0.5.0
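To confirm that the installed versions match the pinned ones, a small check like the following can help. This is only a sketch: the helper names `installed_version` and `check_pins` are hypothetical, and the check relies on each module exposing a `__version__` attribute. Note that the import names differ from the package names for some entries (PyTorch is imported as `torch`, scikit-learn as `sklearn`). The packages installed from git are left out because they have no pinned version.

```python
import importlib

# Import name -> pinned version from the dependency list above.
PINNED = {
    "torch": "0.4.0",
    "nltk": "3.2.5",
    "numpy": "1.14.3",
    "sklearn": "0.19.1",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins, lookup=installed_version):
    """Return {module: (expected, found)} for every version mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print("{}: expected {}, found {}".format(name, expected, found))
```

An empty result from `check_pins(PINNED)` means every listed module reports the pinned version.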
For macOS users, follow these steps to install megam, which is required to run the MaxEnt algorithm:
- brew tap brewsci/science
- brew install megam
For Linux users, follow the instructions at https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software#megam-mega-model-optimization-package
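After installing megam, it can be useful to check that the binary is visible on your `PATH`. The sketch below does this with the standard library only. The list of binary names mirrors what NLTK is understood to search for and should be treated as an assumption; the helper name `find_megam` is hypothetical.

```python
import shutil

# Candidate binary names (assumed to match the names NLTK looks for).
MEGAM_NAMES = ("megam.opt", "megam", "megam_686", "megam_i686.opt")


def find_megam(names=MEGAM_NAMES, which=shutil.which):
    """Return the path of the first megam binary found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_megam() or "megam not found on PATH")
```

If the binary lives somewhere outside `PATH`, NLTK can be pointed at it explicitly with `nltk.classify.megam.config_megam`, passing the full path to the binary.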
To run the Maximum Entropy tagger:
- Go to the maxent directory
- Run
python pos_tagger.py
- To see the available options for feature_function_fn, run
python pos_tagger.py --help
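For context, a feature function for a MaxEnt POS tagger typically maps a word and its neighbours to a dictionary of features. The sketch below shows one possible shape. It is not one of the feature functions shipped in this repo; the names `word_features` and `to_training_pairs` and the chosen features are hypothetical.

```python
def word_features(sentence, index):
    """Build a feature dict for the word at `index` in a list of words."""
    word = sentence[index]
    return {
        "word": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "is_digit": word.isdigit(),
        # Sentence-boundary markers stand in for missing neighbours.
        "prev_word": sentence[index - 1].lower() if index > 0 else "<s>",
        "next_word": (
            sentence[index + 1].lower() if index < len(sentence) - 1 else "</s>"
        ),
    }


def to_training_pairs(tagged_sentence):
    """Turn [(word, tag), ...] into [(feature_dict, tag), ...]."""
    words = [word for word, _ in tagged_sentence]
    return [
        (word_features(words, i), tag)
        for i, (_, tag) in enumerate(tagged_sentence)
    ]
```

A list of `(feature_dict, tag)` pairs like this is the input format that `nltk.classify.MaxentClassifier.train` accepts, with `algorithm="megam"` selecting the megam optimizer.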
To run the Bi-LSTM + CRF tagger:
- Go to the deep-learning directory
- Run
python pos_tagger.py
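For readers unfamiliar with the CRF layer that sits on top of the Bi-LSTM, the sketch below shows Viterbi decoding in plain Python, the standard way a linear-chain CRF picks the highest-scoring tag sequence. It is an illustration only and not the code in this repo, which relies on pytorch-crf. The name `viterbi_decode` and its parameters are hypothetical: `emissions` stands for the per-token tag scores a Bi-LSTM would produce, and `transitions[i][j]` is the score of moving from tag `i` to tag `j`.

```python
def viterbi_decode(emissions, transitions, start, end):
    """Return (best_path, best_score) for a linear-chain CRF.

    emissions:   [seq_len][num_tags] per-token tag scores
    transitions: [num_tags][num_tags] score of tag i followed by tag j
    start, end:  [num_tags] scores for starting / ending in each tag
    """
    num_tags = len(start)
    # score[j] = best score of any path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j.
            best_prev = max(
                range(num_tags), key=lambda i: score[i] + transitions[i][j]
            )
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    score = [score[j] + end[j] for j in range(num_tags)]
    best_last = max(range(num_tags), key=lambda j: score[j])
    # Walk the backpointers from the last token to the first.
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]
```

The returned path holds tag indices, which would be mapped back to tag names through the tag vocabulary.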