EML4U/ExplainingDriftTextEmbeddings

Explaining Drift in Text Data with Document Embeddings

This repository provides a software pipeline for explaining drift between two sets of documents using document embeddings.

First experiments indicate that BERT document embeddings outperform Doc2Vec document embeddings.
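To make the idea of drift between two document sets concrete, here is a minimal sketch in plain Python. It is an illustration only and not the method implemented in this repository: the function names (`centroid`, `cosine_distance`, `drift_report`, `toy_embeddings`) are hypothetical, and small random vectors stand in for real BERT or Doc2Vec embeddings. It scores drift as the cosine distance between the centroids of the two sets and lists the documents of the second set that lie farthest from the first set's centroid, as a crude starting point for an explanation.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def drift_report(reference, current, top_k=3):
    """Return (drift score, indices of the top_k most atypical current docs).

    The score is the cosine distance between the two centroids. The indices
    point at the documents in `current` farthest from the reference centroid.
    """
    ref_center = centroid(reference)
    cur_center = centroid(current)
    score = cosine_distance(ref_center, cur_center)
    ranked = sorted(
        range(len(current)),
        key=lambda i: cosine_distance(current[i], ref_center),
        reverse=True,
    )
    return score, ranked[:top_k]


def toy_embeddings(n_docs, center, rng, noise=0.1):
    """Hypothetical stand-in for document embeddings: noisy copies of `center`."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n_docs)]


rng = random.Random(0)
reference = toy_embeddings(50, [1.0, 0.0, 0.0, 0.0], rng)
no_drift = toy_embeddings(50, [1.0, 0.0, 0.0, 0.0], rng)
drifted = toy_embeddings(50, [0.0, 1.0, 0.0, 0.0], rng)

score_same, _ = drift_report(reference, no_drift)
score_drift, top_docs = drift_report(reference, drifted)
print(f"score without drift: {score_same:.4f}")
print(f"score with drift:    {score_drift:.4f}")
print(f"most atypical documents: {top_docs}")
```

A centroid comparison only detects a shift in the average embedding. Changes in spread or in individual clusters would need a different test, so treat the score as a rough first signal.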

Documentation

Developer Information

  • Goal: Reusable, complete and documented code (good for developers, reviewers, everyone)
  • If you add new classes, please provide minimal code examples, put them into the doc directory and add a link in the Documentation section above.
  • Directories
    • doc: Documentation (e.g. how to read data)
    • experiments: Jupyter notebooks (e.g. combine class instances into a process generating explanations)
    • transformation: Classes for data transformation (e.g. create embeddings, reduce dimensions)
    • access: Classes for data access (e.g. read or split embeddings)
    • explanations: Classes for the explanation process (e.g. handling ml models, generate explanations)
    • scripts: Small sets of commands (e.g. to synchronize repositories)
  • Naming conventions: follow PEP 8 - Style Guide for Python Code

Acknowledgments

This work has been supported by the German Federal Ministry of Education and Research (BMBF) within the project EML4U under grant no. 01IS19080B.
