Download the Yelp train.csv (1.21 GB) and PubMed train.csv (117 MB) from this link or execute:
cd aug-pe
bash scripts/download_data.sh # download yelp train.csv and pubmed train.csv
- Yelp: Processed Yelp dataset from (Yue et al. 2023) with 1.9M reviews for training, 5000 for validation, and 5000 for testing.
- OpenReview: Crawled and processed ICLR 2023 reviews from the OpenReview website, with 8396 reviews for training, 2798 for validation, and 2798 for testing.
- PubMed: Abstracts of medical papers in PubMed from 2023/08/01 to 2023/08/07 crawled by (Yu et al. 2023), with 75316 abstracts for training, 14423 for validation, and 4453 for testing.
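
After downloading, it can be useful to check that a train.csv holds the number of records listed above. The snippet below is a minimal sketch, not part of the repository: `count_csv_rows` is a hypothetical helper, the path `data/pubmed/train.csv` is an assumed location, and the file is assumed to start with a header row. It uses the `csv` module rather than counting lines because reviews and abstracts may contain quoted newlines.

```python
import csv
from pathlib import Path

# Training-set sizes taken from the dataset list above. Yelp is stated only
# as "1.9M", so it is left out of the exact comparison.
EXPECTED_TRAIN_ROWS = {"openreview": 8396, "pubmed": 75316}


def count_csv_rows(path, has_header=True):
    """Count records in a CSV file, treating quoted multi-line fields as one record."""
    # Long abstracts or reviews can exceed the default per-field size limit.
    csv.field_size_limit(10**9)
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    # Assumption: the first record is a header row unless has_header=False.
    return max(n - 1, 0) if has_header else n


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the download script saved the file.
    path = Path("data/pubmed/train.csv")
    if path.exists():
        rows = count_csv_rows(path)
        print(f"{path}: {rows} rows (expected {EXPECTED_TRAIN_ROWS['pubmed']})")
```

If the file has no header row, pass `has_header=False` so the first record is counted as well.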