AMORE (Amazon Movie Reviews) is a collection of document-based benchmark datasets for comparing drift explanation approaches. Each benchmark dataset consists of two sets of unlabeled texts. Each text is either a positive (highly rated) or a negative (poorly rated) movie review. The goal of a drift explanation approach is to detect drift between the two sets and to explain why it was detected.
- Set A: 10,000 negative reviews (from 2000 to 2004).
- Set B: 10,000 reviews (from 2005), 90% negative and 10% positive.
- Task: Detect the texts in B which contain drift and explain the drift.
- Note: Drift is mainly based on 3 to 4 words that occur only in the positive reviews.
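The note above suggests a very simple baseline, sketched here only for illustration: collect the words that appear in Set B but never in Set A, then flag the texts in B that contain any of them. This is not one of the benchmarked drift explanation approaches. The function names and the tokenization rule are assumptions. On real reviews, many incidental words will also be unique to B, so a frequency threshold would likely be needed.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumption: lowercase alphabetic tokens are good enough for a sketch.
TOKEN = re.compile(r"[a-z']+")


def tokens(text: str) -> Set[str]:
    """Return the set of lowercase word tokens in a text."""
    return set(TOKEN.findall(text.lower()))


def vocabulary(texts: Iterable[str]) -> Set[str]:
    """Return every token that occurs in at least one text."""
    vocab: Set[str] = set()
    for text in texts:
        vocab |= tokens(text)
    return vocab


def flag_drifted(set_a: List[str], set_b: List[str]) -> List[Tuple[int, List[str]]]:
    """Flag texts in B containing words never seen in A.

    Returns (index in B, sorted list of unseen words) pairs; the word
    list serves as a crude explanation of why the text was flagged.
    """
    new_words = vocabulary(set_b) - vocabulary(set_a)
    flagged = []
    for index, text in enumerate(set_b):
        hits = tokens(text) & new_words
        if hits:
            flagged.append((index, sorted(hits)))
    return flagged
```

Calling `flag_drifted(set_a, set_b)` with two lists of review texts returns the indices of the suspicious texts in B together with the words that triggered each flag.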
- For individual results, there are related data files and Python classes to read them.
- The data format is JSON; the files are compressed using gzip.
- The line numbers of the original raw file are used as identifiers.
- File: AMORE-TextDuplicates.json.gz
- Format: [number1, number2, ...]
- Example: [1, 5615911]
- Size: 1,239,822 entries
- Code:
- File: AMORE-NumbersYearsStars.json.gz
- Format: [number, year, stars]
- Example: [1, 2007, 3]
- Size: 7,911,684 entries
- Code:
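As a rough sketch of how such files could be read without the project's own classes, the snippet below uses only the Python standard library. It assumes that each file holds a single JSON array whose entries follow the format listed above; if the files are instead stored one entry per line, the reader would need to parse line by line. The function names `read_gzip_json` and `index_years_stars` are hypothetical and not part of the project.

```python
import gzip
import json
from typing import Any, Dict, List, Tuple


def read_gzip_json(path: str) -> Any:
    """Load a gzip-compressed JSON file and return the parsed content.

    Assumption: the whole file is one JSON document.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


def index_years_stars(entries: List[List[int]]) -> Dict[int, Tuple[int, int]]:
    """Map each line-number identifier to its (year, stars) pair.

    Assumption: every entry has the form [number, year, stars].
    """
    return {number: (year, stars) for number, year, stars in entries}
```

For example, `index_years_stars(read_gzip_json("AMORE-NumbersYearsStars.json.gz"))` would give a dictionary keyed by the line-number identifier, provided the file layout matches the assumption above.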
This work has been supported by the German Federal Ministry of Education and Research (BMBF) within the project EML4U under grant no. 01IS19080B.