diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1ee8d0e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,186 @@
+# WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
+
+This repository contains the dataset from the paper "[WikiAsp: A Dataset for Multi-domain Aspect-based Summarization](http://arxiv.org/abs/2011.07832)".
+
+WikiAsp is a multi-domain, aspect-based summarization dataset in the encyclopedic domain.
+In this task, models are asked to summarize *cited reference documents* of a Wikipedia article into aspect-based summaries.
+Each of the 20 domains includes 10 domain-specific pre-defined aspects.
+
+
+
+## Dataset
+
+### Download
+
+WikiAsp is available as 20 zipped archives, each of which corresponds to a domain.
+**More than 28GB of storage space** is necessary to download and store all the domains (unzipped).
+The following command downloads and extracts all of them:
+
+```sh
+./scripts/download_and_extract_all.sh /path/to/save_directory
+```
+Alternatively, one can individually download an archive for each domain from the table below.
+
+
+
+### Format
+
+Each domain includes three files `{train,valid,test}.jsonl`, and each line represents one instance in JSON format.
+Each instance has the following structure:
+
+```json
+{
+ "exid": "train-1-1",
+ "input": [
+ "tokenized and uncased sentence_1 from document_1",
+ "tokenized and uncased sentence_2 from document_1",
+ "...",
+ "tokenized and uncased sentence_i from document_j",
+ "..."
+ ],
+ "targets": [
+ ["a_1", "tokenized and uncased aspect-based summary for a_1"],
+ ["a_2", "tokenized and uncased aspect-based summary for a_2"],
+ "..."
+ ]
+}
+```
+where,
+* exid: `str`
+* input: `List[str]`
+* targets: `List[Tuple[str,str]]`
+
+Here, `input` holds the cited references as a list of sentences tokenized with NLTK.
+The `targets` key points to a list of aspect-based summaries, where each element is a pair of a) the target aspect and b) the aspect-based summary.
+
+Inheriting from the base [corpus](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wikisum), this dataset exhibits the following characteristics:
+
+* Cited references are composed of multiple documents, but the document boundaries are lost, so the references are expressed simply as a list of sentences.
+* Sentences in the cited references (`input`) are tokenized using NLTK.
+* The number of target summaries for each instance varies.
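+
+As a rough illustration of the format above, the following Python sketch reads one split file and yields each instance's ID, input sentences, and (aspect, summary) pairs. The helper name `read_wikiasp` is hypothetical and not part of this repository; only the `exid`, `input`, and `targets` keys come from the format described here.
+
+```python
+import json
+from typing import Iterator, List, Tuple
+
+
+def read_wikiasp(path: str) -> Iterator[Tuple[str, List[str], List[Tuple[str, str]]]]:
+    """Yield (exid, input sentences, [(aspect, summary), ...]) per line of a split file."""
+    with open(path, encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue  # tolerate blank lines (an assumption; the files may not contain any)
+            instance = json.loads(line)
+            # Each element of "targets" is an [aspect, summary] pair.
+            targets = [(aspect, summary) for aspect, summary in instance["targets"]]
+            yield instance["exid"], instance["input"], targets
+```
+
+A call such as `read_wikiasp("/path/to/save_directory/Album/train.jsonl")` assumes that the split files sit directly inside each extracted domain directory, which should be checked against the actual archive layout.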
+
+
+## Citation
+If you use the dataset, please consider citing:
+```
+@article{hayashi2020wikiasp,
+ author = {Hayashi, Hiroaki and Budania, Prashant and Wang, Peng and Ackerson, Chris and Neervannan, Raj and Neubig, Graham},
+ title = {WikiAsp: A Dataset for Multi-domain Aspect-based Summarization},
+ journal = {arXiv preprint arXiv:2011.07832},
+ year = {2020},
+}
+```
+
+## LICENSE
+
+
+This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
diff --git a/scripts/download_and_extract_all.sh b/scripts/download_and_extract_all.sh
new file mode 100755
index 0000000..b1fbb51
--- /dev/null
+++ b/scripts/download_and_extract_all.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+info() {
+ printf "\r [\033[00;34mINFO\033[0m] %s\n" "$1"
+}
+
+fail() {
+ printf "\r\033[2K [\033[0;31mFAIL\033[0m] %s\n" "$1"
+ echo ''
+ exit 1
+}
+
+main() {
+ DEST=${1:-wikiasp}
+ PREFIX="https://github.com/rooa/summarization/releases/v1.0"
+
+ mkdir -p "$DEST"
+ info "Saving to $DEST"
+
+ DOMAINS=(Album Animal Artist Building Company EducationalInstitution Event Film Group
+ HistoricPlace Infrastructure MeanOfTransportation OfficeHolder Plant Single
+ SoccerPlayer Software TelevisionShow Town WrittenWork)
+
+ for DOM in "${DOMAINS[@]}"; do
+ TEMP_TARGET="wikiasp_temp_downloaded_${DOM}.tar.bz2"
+ if ! wget -O "${TEMP_TARGET}" "$PREFIX/${DOM}.tar.bz2"; then
+ rm -f "${TEMP_TARGET}"  # wget -O creates the file even when the download fails
+ fail "Could not download ${DOM}."
+ fi
+ info "Extracting $DOM data..."
+ tar xjvf "${TEMP_TARGET}"
+ mv "${DOM}" "$DEST"
+ rm -f "${TEMP_TARGET}"
+ done
+
+ info "All downloads and extraction are done at $DEST."
+}
+
+main "$@"
diff --git a/wikiasp_task.jpg b/wikiasp_task.jpg
new file mode 100644
index 0000000..5e76248
Binary files /dev/null and b/wikiasp_task.jpg differ