Scripts for scraping posts and comments from Reddit, using the
third-party requests package to retrieve content and BeautifulSoup4
to parse data from the retrieved content.
I hang around Reddit a lot, and I realized that it is a rich source of text data for tasks such as sentiment analysis and topic extraction, along with others I haven't read into yet.
To scrape posts from a subreddit:
python post_scrape.py --subreddit='MechanicalKeyboards' 'mk.json'
Arguments:
--subreddit (optional) subreddit to scrape; defaults to the Reddit main feed if not specified
--sort_by (optional) Reddit sort parameter; defaults to 'confidence'
--limit (optional) maximum number of posts to scrape
--verbose (optional) save the exact JSON response Reddit returns for every post (default False)
filename (required) file to write the scraped posts to, e.g. 'mk.json'
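
The argument list above could be declared with argparse roughly as follows. This is only a sketch and not necessarily how post_scrape.py is written: the function name build_parser, the integer type for --limit, and treating --verbose as an on/off flag are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented post_scrape.py arguments."""
    parser = argparse.ArgumentParser(description="Scrape posts from Reddit.")
    # None stands in for "Reddit main feed" when no subreddit is given.
    parser.add_argument("--subreddit", default=None,
                        help="subreddit to scrape")
    parser.add_argument("--sort_by", default="confidence",
                        help="Reddit sort parameter")
    # Assumption: the limit is an integer count of posts.
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of posts to scrape")
    # Assumption: --verbose is a flag that is False unless passed.
    parser.add_argument("--verbose", action="store_true",
                        help="keep the exact JSON Reddit returns for every post")
    parser.add_argument("filename", help="output file")
    return parser
```

Calling build_parser().parse_args(["--subreddit=MechanicalKeyboards", "mk.json"]) yields a namespace with subreddit set to 'MechanicalKeyboards', filename set to 'mk.json', and the remaining options at their defaults.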
To scrape the comments of a single post:
python comment_scrape.py 'https://www.reddit.com/r/MechanicalKeyboards/comments/kz0ayl/first_time_soldering_was_a_success/' 'soldering.json'
Arguments:
--sort_by (optional) comment sort order: top, new, confidence, controversial, old, or qa
url (required) URL of the post whose comments you want to scrape
filename (required) file to write the scraped comments to, e.g. 'soldering.json'
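
One way a --sort_by value might be validated and applied to the post URL is sketched below. It is an illustration only, not the code in comment_scrape.py: the helper name comments_url and the use of a sort query parameter are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

# Sort orders listed in the arguments above.
VALID_SORTS = ("top", "new", "confidence", "controversial", "old", "qa")


def comments_url(post_url, sort_by=None):
    """Hypothetical helper: return post_url with an optional sort query."""
    if sort_by is not None and sort_by not in VALID_SORTS:
        raise ValueError(f"sort_by must be one of {VALID_SORTS}, got {sort_by!r}")
    parts = urlsplit(post_url)
    # Assumption: the sort order is passed as a 'sort' query parameter.
    query = urlencode({"sort": sort_by}) if sort_by else ""
    # Any existing query string or fragment on post_url is dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

For example, comments_url(url, "top") appends ?sort=top to the post URL, while an unknown value such as "best" raises ValueError before any request is made.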