TF-IDF index construction, query split computation, named entity disambiguation / named entity linking
- Calculate TF-IDF weights for the tokens and entities in the documents.
- Split queries into different combinations of tokens and entities.
- Calculate the query score for each combination and select the one with the highest score.
- See the included Jupyter notebook for the specification of part 1 (the TF-IDF and query-split steps above).
- Use XGBoost with feature selection to build a model for Named Entity Disambiguation, applying TF-IDF and other NLP methods.
- Train the model with Train.py and evaluate its accuracy with test.py.
- See the included Jupyter notebook for the specification of part 2 (named entity disambiguation).
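
The TF-IDF and query-split steps can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names (`build_index`, `weight`, `query_splits`, `best_split`), the input layout, and the weighting formulas (`1 + ln(tf)` and `1 + ln(N / (1 + df))`) are all assumptions, and the notebook specification may define them differently. For simplicity, an entity counts as usable whenever all of its words occur in the query, and each query word can be used only once.

```python
import math
from collections import Counter
from itertools import combinations


def build_index(docs):
    """Build term counts and IDF tables for tokens and entities.

    docs: {doc_id: {"tokens": [...], "entities": [...]}}  (assumed layout)
    """
    n_docs = len(docs)
    tf, idf = {}, {}
    for kind in ("tokens", "entities"):
        counts = {doc_id: Counter(doc[kind]) for doc_id, doc in docs.items()}
        # Document frequency: number of documents containing each term.
        df = Counter(term for c in counts.values() for term in c)
        tf[kind] = counts
        # Assumed IDF formula: 1 + ln(N / (1 + df)).
        idf[kind] = {t: 1.0 + math.log(n_docs / (1.0 + d)) for t, d in df.items()}
    return tf, idf


def weight(tf, idf, kind, term, doc_id):
    """TF-IDF weight of one term in one document (0.0 if absent)."""
    count = tf[kind][doc_id].get(term, 0)
    if count == 0:
        return 0.0
    # Assumed TF formula: 1 + ln(tf).
    return (1.0 + math.log(count)) * idf[kind][term]


def query_splits(query, known_entities):
    """Enumerate (entities, leftover_tokens) splits of a query."""
    words = Counter(query.split())
    # Keep only entities whose words are all available in the query.
    candidates = [e for e in known_entities if not Counter(e.split()) - words]
    splits = []
    for r in range(len(candidates) + 1):
        for chosen in combinations(candidates, r):
            used = Counter()
            for entity in chosen:
                used.update(entity.split())
            if used - words:  # a word is used more often than the query has it
                continue
            leftover = list((words - used).elements())
            splits.append((list(chosen), leftover))
    return splits


def best_split(query, known_entities, tf, idf, doc_id):
    """Return the split with the highest summed TF-IDF score for a document."""
    def score(split):
        entities, tokens = split
        return (sum(weight(tf, idf, "entities", e, doc_id) for e in entities)
                + sum(weight(tf, idf, "tokens", t, doc_id) for t in tokens))
    return max(query_splits(query, known_entities), key=score)


# Hypothetical toy data, for illustration only.
docs = {
    1: {"tokens": ["reports", "today"], "entities": ["new york times"]},
    2: {"tokens": ["york", "city"], "entities": ["new york"]},
    3: {"tokens": ["weather"], "entities": []},
}
tf, idf = build_index(docs)
entities = ["new york", "new york times"]
print(best_split("new york times", entities, tf, idf, doc_id=1))
# prints (['new york times'], [])
```

Enumerating every subset of candidate entities is exponential in the number of candidates, which is usually acceptable for short queries but would need pruning for long ones.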