This project:
- Collects articles about women from the New York Times and Washington Post, 1980-2014
- Categorizes each article by country and region
- Uses Stanford's Named Entity Recognizer to remove proper nouns from article texts
- Uses STM (the structural topic model R package) to analyze topical trends in the corpus over time and across regions
- Compares coverage across regions using word-separating algorithms and other techniques
- Conducts statistical analysis, regressing the number of documents and mean topic distributions on country-level variables (note: the country-level dataset is not included in this repo)
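
As a rough illustration of the proper-noun removal step, the sketch below drops every token that a tagger has marked as part of a named entity. It assumes the tagger output is already available as (token, tag) pairs and that non-entity tokens carry the tag `O`, as Stanford NER's output does. The `strip_entities` helper and the hand-tagged sample sentence are hypothetical and are not taken from this repo's code.

```python
from typing import Iterable, List, Tuple

# Tag assigned to tokens that fall outside any named entity.
OUTSIDE_TAG = "O"


def strip_entities(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only the tokens that the tagger marked as non-entities."""
    return [token for token, tag in tagged_tokens if tag == OUTSIDE_TAG]


# Hypothetical, hand-tagged sentence; real tags would come from the tagger.
tagged = [
    ("Malala", "PERSON"),
    ("spoke", "O"),
    ("about", "O"),
    ("education", "O"),
    ("in", "O"),
    ("Pakistan", "LOCATION"),
]

cleaned = " ".join(strip_entities(tagged))
print(cleaned)
```

Filtering on the tag rather than on capitalization keeps sentence-initial common words while still removing names of people, places, and organizations, so topic estimates are less likely to be driven by country or person names.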