- Computes embeddings of Wikipedia texts.
- Based on UBI-AGML-NLP Embeddings, Hugging Face and BERT.
- Code repository
- Processed data at FTP server
- embeddings-wikipedia.ipynb (nbviewer): Jupyter notebook for computation of embeddings
- cosine-similarity-wikipedia-a.ipynb (nbviewer 6e2d52b): Cosine similarity of means of embeddings
- data_access.ipynb (nbviewer): Access of pre-computed data (Jupyter version of data_access.py)
- cosine-similarity-wikipedia-b.ipynb (nbviewer 50edd35): Comparison of cosine-similarity and difference-value, data investigation
- cosine-similarity-wikipedia-c.ipynb (nbviewer): Further cosine-similarity tests
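
For readers unfamiliar with the "cosine similarity of means of embeddings" comparison named in the notebook list, the following is a minimal sketch of the idea, not the code from the notebooks. It assumes each text has already been turned into a fixed-length embedding vector. The function names `mean_vector` and `cosine_similarity` and the toy 3-dimensional data are hypothetical and chosen for illustration only. It uses only the Python standard library; the notebooks themselves may well use NumPy or another array library instead.

```python
import math


def mean_vector(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for two groups of text embeddings (hypothetical data).
group_a = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]
group_b = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]

# Average each group first, then compare the two mean vectors.
mean_a = mean_vector(group_a)
mean_b = mean_vector(group_b)
similarity = cosine_similarity(mean_a, mean_b)
print(similarity)
```

Averaging first reduces each group of texts to a single vector, so the comparison costs one cosine computation per pair of groups rather than one per pair of texts.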
Data Science Group (DICE) at Paderborn University
Machine Learning Group / CoR-Lab at Bielefeld University
This work has been supported by the German Federal Ministry of Education and Research (BMBF) within the project EML4U under the grant no 01IS19080B.