Updated citation details
ginic committed Feb 9, 2024
1 parent beeb4e4 commit 48817ff
Showing 1 changed file with 13 additions and 1 deletion.
README.md: 13 additions & 1 deletion
@@ -65,7 +65,19 @@ Run `python app.py --config config.json` to start the application on port 8050,
The committed `config.json` is configured to load the best model for each month over a year of data, from April 2021 through March 2022. To pull the models, run `dvc pull community2vec_models`, assuming you have access to the `s3://ihopmeag` bucket on AWS. See more details on DVC above.
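
A minimal end-to-end sketch combining the commands above, assuming DVC is installed and your AWS credentials grant read access to `s3://ihopmeag`:
```
# Fetch the pretrained community2vec models tracked by DVC
dvc pull community2vec_models
# Launch the dashboard using the committed config; it serves on port 8050
python app.py --config config.json
```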

# Citation
If you use this code, please cite [Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit](https://arxiv.org/abs/2309.14259) as
```
@misc{partridge2023livestreams,
  title={Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit},
  author={Virginia Partridge and Jasmine Mangat and Rebecca Curran and Ryan McGrady and Ethan Zuckerman},
  year={2023},
  eprint={2309.14259},
  archivePrefix={arXiv},
  primaryClass={cs.SI}
}
```
The paper was also accepted at [WebSci24](https://websci24.org); updated citation details are forthcoming.


# Known Issues
- Spark can't read the original zst-compressed files from Pushshift because their zstd window log is larger than 27, and I didn't know how to change the Spark/Hadoop settings to fix this (see the note in the [zstd man page](https://manpages.debian.org/unstable/zstd/zstd.1.en.html) and [Stackoverflow: Read zst to pandas](https://stackoverflow.com/questions/61067762/how-to-extract-zst-files-into-a-pandas-dataframe)). Moreover, reading large .zst files in Spark is limited by memory: if there isn't enough, the dataframe is silently filled with `null`. The workaround is to re-compress the file as bzip2 before running `ihop.import_data.py` (a sketch follows below). This takes a long time, but is simple on the command line, and `scripts/export_c2v.sh` is provided as a wrapper for the import.
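
A minimal sketch of the re-compression workaround above; the filename is a placeholder, and `--long=31` raises zstd's decompression window limit, which the large-window Pushshift archives may require:
```
# Decompress the Pushshift dump to stdout and re-compress it as bzip2 so Spark can read it
zstd -d --long=31 --stdout RC_2021-04.zst | bzip2 --stdout > RC_2021-04.bz2
```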