Updated citation details
ginic committed Feb 9, 2024
1 parent beeb4e4 commit 48817ff
Showing 1 changed file with 13 additions and 1 deletion.
README.md: 13 additions & 1 deletion
@@ -65,7 +65,19 @@ Run `python app.py --config config.json` to start the application on port 8050,
The committed `config.json` is configured to load the best model for each month over a year of data, from April 2021 through March 2022. To pull the models, run `dvc pull community2vec_models`, assuming you have access to the `s3://ihopmeag` bucket on AWS. See more details on DVC above.
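
A minimal end-to-end sketch combining the commands above, assuming DVC is installed and your AWS credentials grant read access to `s3://ihopmeag`:
```
# Fetch the pretrained community2vec models tracked by DVC
dvc pull community2vec_models
# Launch the dashboard using the committed config; it serves on port 8050
python app.py --config config.json
```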

# Citation
If you use this code, please cite [Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit](https://arxiv.org/abs/2309.14259) as
```
@misc{partridge2023livestreams,
  title={Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit},
  author={Virginia Partridge and Jasmine Mangat and Rebecca Curran and Ryan McGrady and Ethan Zuckerman},
  year={2023},
  eprint={2309.14259},
  archivePrefix={arXiv},
  primaryClass={cs.SI}
}
```
The paper was also accepted at [WebSci24](https://websci24.org); updated citation details are forthcoming.


# Known Issues
- Spark can't read the original zst-compressed files from Pushshift because their zstd window log is larger than 27, and I didn't know how to change the Spark/Hadoop settings to fix this (see the note in the [zstd man page](https://manpages.debian.org/unstable/zstd/zstd.1.en.html) and [Stackoverflow: Read zst to pandas](https://stackoverflow.com/questions/61067762/how-to-extract-zst-files-into-a-pandas-dataframe)). Moreover, reading large .zst files in Spark is limited by memory: if there isn't enough, the dataframe is silently filled with `null`. The workaround is to re-compress the file as bzip2 before running `ihop.import_data.py` (a sketch follows below). This takes a long time, but is simple on the command line, and `scripts/export_c2v.sh` is provided as a wrapper for the import.
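
A minimal sketch of the re-compression workaround above; the filename is a placeholder, and `--long=31` raises zstd's decompression window limit, which the large-window Pushshift archives may require:
```
# Decompress the Pushshift dump to stdout and re-compress it as bzip2 so Spark can read it
zstd -d --long=31 --stdout RC_2021-04.zst | bzip2 --stdout > RC_2021-04.bz2
```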