This repository provides the code to run Word Sense Disambiguation (WSD) using the Rotating Spheres Model.
- resources: preprocessing of the training corpora SemCor and PWNGC and of all evaluation datasets (Section 5.2.1 in the manuscript).
- statistics: statistics on the datasets.
- embedding_space: construction of the embedding space using MPTT and geometric training (Section 4.2 in the manuscript).
- data: all processed data. This folder is very large; it includes tracker files that indicate in which script each dataset is generated and where it is reused. Please download all data files from https://drive.google.com/drive/folders/1mPJ127-CfexqijTGRNQj1RR4Z7Fp7pTE?usp=sharing
- encoder: the encoder was trained and tested on Google Colab. To run the code, please refer to the 'Encoder' section of this README.
- Download the requirements.txt file and install the dependencies listed in it.
- If you want to use the previously created data, download the processed 'data' folder from https://drive.google.com/drive/folders/1mPJ127-CfexqijTGRNQj1RR4Z7Fp7pTE?usp=sharing
- If you want to start from the beginning (embedding WordNet into the embedding space), do the following:
  - Transform WordNet into a graph: run <wn2graph.py> to parse each POS group.
  - Run <mptt.py> on each graph.
  - Run _dataset.py to generate a separate dataframe for each POS group.
  - Merge all files together to produce the 'Spatial WordNet'.
- Run the experiments as described in the 'Encoder' section below.
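
To give an idea of what the graph numbering step involves, here is a minimal sketch, assuming MPTT stands for modified preorder tree traversal and that each POS graph can be treated as a tree given as (parent, child) edges. The names `mptt_number`, `is_descendant` and `TOY_EDGES` are hypothetical and are not taken from <mptt.py>; the toy hierarchy is invented for illustration and is not real WordNet data.

```python
from collections import defaultdict

# Toy hypernym tree as (parent, child) edges; invented for illustration only.
TOY_EDGES = [
    ("entity", "animal"),
    ("entity", "plant"),
    ("animal", "dog"),
    ("animal", "cat"),
]


def mptt_number(edges, root):
    """Return {node: (left, right)} visit numbers for a tree.

    Assumes the edges form a tree rooted at `root` (every node has one parent).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    counter = 1
    left = {root: counter}
    bounds = {}
    # Each stack entry holds a node and an iterator over its remaining children,
    # so the traversal is iterative and does not hit the recursion limit.
    stack = [(root, iter(children[root]))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, None)
        counter += 1
        if child is None:
            # All children visited: close the interval of this node.
            bounds[node] = (left[node], counter)
            stack.pop()
        else:
            # First visit of the child: open its interval.
            left[child] = counter
            stack.append((child, iter(children[child])))
    return bounds


def is_descendant(bounds, node, ancestor):
    """True if the interval of `node` lies strictly inside that of `ancestor`."""
    n_left, n_right = bounds[node]
    a_left, a_right = bounds[ancestor]
    return a_left < n_left and n_right < a_right


toy_bounds = mptt_number(TOY_EDGES, "entity")
```

With this numbering, checking whether one node lies below another reduces to an interval comparison, for example `is_descendant(toy_bounds, "dog", "animal")`. The actual scripts in this repository may number the nodes differently.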
To reproduce the training procedure:
- Download the 'ColabNotebooks' folder to your Google Drive: https://drive.google.com/drive/folders/1YpEY4RsNJEdwYSGfNM7Gib8A9RWqHF-t?usp=sharing
- Open the .ipynb notebook.
- Run the cells.
- Check the output in the results and annotations files.