Dataset Preparation

Download the Yelp train.csv (1.21GB) and PubMed train.csv (117MB) from this link, or run:

cd aug-pe
bash scripts/download_data.sh # download yelp train.csv and pubmed train.csv

Dataset Description:

  • Yelp: Processed Yelp dataset from (Yue et al. 2023) with 1.9M reviews for training, 5000 for validation, and 5000 for testing.
  • OpenReview: Crawled and processed ICLR 2023 reviews from the OpenReview website, with 8396 reviews for training, 2798 for validation, and 2798 for testing.
  • PubMed: Abstracts of medical papers in PubMed from 2023/08/01 to 2023/08/07 crawled by (Yu et al. 2023), with 75316 abstracts for training, 14423 for validation, and 4453 for testing.
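
After downloading, it can be useful to confirm that a train.csv file has roughly the number of rows listed above. The following is a minimal sketch, not part of the repository: the helper names, the file path in the usage section, and the assumption that each CSV starts with a header row are all hypothetical. It uses Python's csv module rather than a plain line count because review and abstract text may contain quoted newlines.

```python
import csv
from pathlib import Path

# Training-split sizes as stated in the dataset description above.
# Yelp is given only approximately (1.9M), so compare it with a tolerance.
EXPECTED_TRAIN_ROWS = {
    "yelp": 1_900_000,
    "openreview": 8396,
    "pubmed": 75316,
}


def count_csv_rows(path, has_header=True):
    """Count data rows in a CSV file, treating quoted multi-line fields as one row."""
    # Long reviews or abstracts can exceed the default per-field limit.
    csv.field_size_limit(2**31 - 1)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            # Assumption: the first row is a header and is not a data row.
            next(reader, None)
        return sum(1 for _ in reader)


def matches_expected(n_rows, expected, rel_tol=0.0):
    """Return True if n_rows is within rel_tol (as a fraction) of expected."""
    return abs(n_rows - expected) <= rel_tol * expected


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the download script put the file.
    path = Path("data/pubmed/train.csv")
    if path.exists():
        n = count_csv_rows(path)
        ok = matches_expected(n, EXPECTED_TRAIN_ROWS["pubmed"])
        print(f"{path}: {n} rows, matches description: {ok}")
    else:
        print(f"{path} not found; run the download script first")
```

For Yelp, pass a tolerance such as `rel_tol=0.05`, since the description gives only a rounded count.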