Scripts to collect and post climate change stories.

climatetree/story-scraper

Climate Tree Story Scraper

This repository contains scripts that collect, process, and upload stories about climate change to the stories microservice. It depends on two kinds of CSV input: the files in the "/split_place_name_id_csvs" folder, which determine the places that will be searched, and "strategy_sector_soluion.csv", which contains the 205 climate change solutions that drive the scraper.

Before starting:

This script depends on the following (install before running):

Python 3

External Libraries: google, webpreview, bson, pymongo

pip install google

pip install webpreview

pip install bson

pip install pymongo

Usage -- Collecting Stories:

python climate_tree_scraper.py your_csv.csv

The input CSV must have the header place,id, as in the files supplied in the "/split_place_name_id_csvs" folder.
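As a quick sanity check before a long scraping run, the header requirement above can be verified with a few lines of Python. This is a minimal sketch, not code from this repository: the function name read_places and the sample place names are hypothetical, and only the place,id header comes from the text above.

```python
import csv
import io

# The header required by the scraper, per the README.
EXPECTED_HEADER = ["place", "id"]


def read_places(csv_file):
    """Return (place, id) pairs from an open CSV file, checking the header first.

    Hypothetical helper for illustration; not part of the repository.
    """
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None or [h.strip() for h in header] != EXPECTED_HEADER:
        raise ValueError(f"expected header 'place,id', got {header!r}")
    # Skip rows that do not have both columns.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Made-up sample data standing in for one of the supplied CSV files.
sample = io.StringIO("place,id\nExampleville,1\nSampletown,2\n")
places = read_places(sample)
```

With a real file, the same function could be called as read_places(open("your_csv.csv", newline="")).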

Output files will be named placeid_storynumber.json and written to an output folder that the script creates. Expect about 5 seconds of runtime per story.
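The naming scheme and the runtime figure can be turned into a rough planning aid. The sketch below assumes the place ID and story number are joined with an underscore exactly as written above, with no padding; both function names are hypothetical, and the 5 seconds per story is the approximate figure quoted above, not a measured value.

```python
def output_filename(place_id, story_number):
    """Build a name of the form placeid_storynumber.json (assumed: no zero padding)."""
    return f"{place_id}_{story_number}.json"


def estimated_runtime_seconds(story_count, seconds_per_story=5):
    """Rough runtime estimate using the README's figure of about 5 seconds per story."""
    return story_count * seconds_per_story


name = output_filename(1234, 7)
# 600 stories is an arbitrary illustrative count, not a figure from the repository.
minutes = estimated_runtime_seconds(600) / 60
```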

Usage -- Processing and Posting Stories:

python filter_and_combine_stories.py

This filters out bad stories and combines duplicates, placing the results in the /filtered_stories folder, which is created if it doesn't exist. Once it finishes, run upload_stories.py (see the note about the database connection below).
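To give a feel for what a filter-and-combine pass can look like, here is a minimal sketch. It is not the logic in filter_and_combine_stories.py, whose actual criteria are not described here. It assumes, purely for illustration, that a story is a dict with hypothetical keys url, title, and place_id, that a "bad" story is one missing a URL or title, and that duplicates are stories sharing a URL.

```python
def filter_and_combine(stories):
    """Drop incomplete stories and merge stories that share a URL.

    Illustrative assumptions only: the keys 'url', 'title', and 'place_id'
    and the rules used here are hypothetical, not the repository's.
    """
    combined = {}
    for story in stories:
        url = story.get("url")
        title = story.get("title")
        if not url or not title:
            continue  # treat stories without a URL or title as "bad"
        if url not in combined:
            combined[url] = {"url": url, "title": title, "place_ids": []}
        place_id = story.get("place_id")
        if place_id is not None and place_id not in combined[url]["place_ids"]:
            # A duplicate contributes its place to the existing record.
            combined[url]["place_ids"].append(place_id)
    return list(combined.values())


# Made-up input: two duplicates for different places and one incomplete story.
raw = [
    {"url": "https://example.com/a", "title": "Story A", "place_id": 1},
    {"url": "https://example.com/a", "title": "Story A", "place_id": 2},
    {"url": "", "title": "No link", "place_id": 3},
]
result = filter_and_combine(raw)
```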

python upload_stories.py

This posts each story in the stories.json file produced by filter_and_combine_stories.py.

Input Data:

Each file in the "/split_place_name_id_csvs" folder contains 50 rows and is named place_name_id_n.csv, where n is the file number. Lower numbers correspond to more populous places.
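Because the file number encodes population rank, sorting the file names numerically (not alphabetically) gives the order from most to least populous. A minimal sketch, assuming the names follow the place_name_id_n.csv pattern exactly; the helper name file_number is hypothetical.

```python
import re

# Matches names like place_name_id_12.csv and captures the file number n.
FILE_PATTERN = re.compile(r"^place_name_id_(\d+)\.csv$")


def file_number(filename):
    """Extract n from place_name_id_n.csv, raising ValueError on other names."""
    match = FILE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1))


names = ["place_name_id_10.csv", "place_name_id_2.csv", "place_name_id_1.csv"]
# Numeric sort puts file 2 before file 10, which a plain string sort would not.
ordered = sorted(names, key=file_number)
```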

Database Connection:

The database URL has been removed for security reasons. Before uploading stories, set it at the top of upload_stories.py so the script can connect to your database; otherwise the upload will fail.
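One way to avoid committing the URL again is to read it from an environment variable and fail early with a clear message when it is missing. This is a sketch of an alternative, not how upload_stories.py currently works: the variable name CLIMATE_TREE_DB_URL and the function get_database_url are hypothetical.

```python
import os


def get_database_url(env=os.environ, var_name="CLIMATE_TREE_DB_URL"):
    """Return the database URL from the environment, or raise if it is not set.

    The variable name is a hypothetical choice for this sketch.
    """
    url = env.get(var_name, "").strip()
    if not url:
        raise RuntimeError(
            f"{var_name} is not set; set it to your database URL before uploading"
        )
    return url
```

The returned string could then be assigned where upload_stories.py expects the database URL, keeping the secret out of the source file.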
