This repository provides a software pipeline for explaining drift between two sets of documents using embeddings.
First experiments indicate that BERT document embeddings outperform Doc2Vec document embeddings.
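The core idea above can be illustrated with a minimal sketch: two document sets represented as embedding matrices, and a simple drift signal between them. The array shapes and the centroid-distance measure below are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

# Toy stand-ins for real document embeddings (e.g. from BERT).
rng = np.random.default_rng(0)
embeddings_a = rng.normal(0.0, 1.0, size=(100, 8))   # document set A
embeddings_b = rng.normal(0.5, 1.0, size=(100, 8))   # document set B, shifted mean

# One simple drift signal: the distance between the corpus centroids.
centroid_a = embeddings_a.mean(axis=0)
centroid_b = embeddings_b.mean(axis=0)
drift = float(np.linalg.norm(centroid_a - centroid_b))
print(drift > 0.0)
```

A real pipeline would replace the random arrays with embeddings produced by the classes in `transformation` and go beyond a single scalar to explain *what* drifted.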
- How to configure file storage and the default directory to read data
- Amazon movie reviews
- Data overview
- How to read with Amazon Pickle_Reader and access texts, embeddings, and metadata
- How to read with Amazon Pickle_Splitter and get equally split items
- Data is currently stored on Google Drive
- How to store interim results
- How to reduce dimensions
- How to create word clouds
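For the dimension-reduction step listed above, a hedged sketch using scikit-learn's PCA is shown below. The input shape (BERT-sized 768-dimensional vectors) and the choice of PCA are assumptions for illustration; the repository's own `transformation` classes may use a different method.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder embeddings; in practice these would be read via the access classes.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 768))  # 200 documents, 768-dim vectors

# Reduce to 2 dimensions, e.g. for plotting or word-cloud positioning.
pca = PCA(n_components=2)
reduced = pca.fit_transform(embeddings)
print(reduced.shape)  # (200, 2)
```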
- Goal: Reusable, complete and documented code (good for developers, reviewers, everyone)
- If you add new classes, please provide minimal code examples, put them into the `doc` directory, and add a link above.
- Directories:
  - `doc`: Documentation (e.g. how to read data)
  - `experiments`: Jupyter notebooks (e.g. combine class instances into a process generating explanations)
  - `transformation`: Classes for data transformation (e.g. create embeddings, reduce dimensions)
  - `access`: Classes for data access (e.g. read or split embeddings)
  - `explanations`: Classes for the explanation process (e.g. handling ML models, generating explanations)
  - `scripts`: Small sets of commands (e.g. to synchronize repositories)
- How to name your code: PEP 8 - Style Guide for Python Code
This work has been supported by the German Federal Ministry of Education and Research (BMBF) within the project EML4U under grant no. 01IS19080B.