TheGabCode/reddit-scraper

reddit-scraper

Synopsis

Scripts for scraping posts and comments from Reddit, using the third-party requests Python package to retrieve content and Beautiful Soup 4 (beautifulsoup4) to parse data from the retrieved content.

1. Motivation

I hang around Reddit a lot, and I realized that Reddit is a rich source of text data that can be used for a lot of interesting data-related work such as sentiment analysis, topic extraction, and other things I haven't really read into yet.

2. Demo for Post Scraping

python post_scrape.py --subreddit='MechanicalKeyboards' 'mk.json'

Arguments:
--subreddit (optional) subreddit you want to scrape, defaults to the Reddit main feed if not specified
--sort_by (optional) sort parameter in Reddit, defaults to 'confidence'
--limit (optional) limit of posts to scrape
--verbose (optional) returns the exact JSON response Reddit returns for every post (default False)
filename (required)
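
The argument list above can be sketched as an argparse parser. This is only an illustration of the documented interface, not the code in post_scrape.py: the function name build_parser, the int type for --limit, the absence of a default for --limit, and treating --verbose as an on/off flag are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented post_scrape.py arguments."""
    parser = argparse.ArgumentParser(description="Scrape posts from Reddit.")
    # Omitting --subreddit means the Reddit main feed, as described above.
    parser.add_argument("--subreddit", default=None,
                        help="subreddit to scrape (main feed if omitted)")
    parser.add_argument("--sort_by", default="confidence",
                        help="Reddit sort parameter")
    # Assumption: the limit is an integer and unset by default.
    parser.add_argument("--limit", type=int, default=None,
                        help="limit of posts to scrape")
    # Assumption: --verbose is a flag that is False unless passed.
    parser.add_argument("--verbose", action="store_true",
                        help="keep the exact JSON response for every post")
    parser.add_argument("filename", help="required filename argument")
    return parser


# Equivalent to the demo command; the shell strips the quotes before Python sees them.
args = build_parser().parse_args(["--subreddit=MechanicalKeyboards", "mk.json"])
print(args.subreddit, args.sort_by, args.filename)
```

Passing an explicit list to parse_args keeps the sketch runnable outside the command line; a real script would call parse_args() with no arguments to read sys.argv.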

3. Demo for Comment Scraping

python comment_scrape.py 'https://www.reddit.com/r/MechanicalKeyboards/comments/kz0ayl/first_time_soldering_was_a_success/' 'soldering.json'

Arguments:
--sort_by (optional) sort by top, new, confidence, controversial, old, qa
url (required) URL of the post you want to scrape comments from
filename (required)
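
As a rough sketch of how the url and --sort_by arguments might combine into a request, the helper below builds the address of a post's JSON listing. It assumes Reddit's convention of appending .json to a post URL and passing the sort order as a sort query parameter; whether comment_scrape.py does this is not stated here, and the name comments_json_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Sort options listed for --sort_by above.
VALID_SORTS = {"top", "new", "confidence", "controversial", "old", "qa"}


def comments_json_url(post_url: str, sort_by: str = "confidence") -> str:
    """Return the JSON listing URL for a Reddit post (hypothetical helper)."""
    if sort_by not in VALID_SORTS:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    parts = urlsplit(post_url)
    # Drop the trailing slash before adding the .json suffix.
    path = parts.path.rstrip("/") + ".json"
    query = urlencode({"sort": sort_by})
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


url = comments_json_url(
    "https://www.reddit.com/r/MechanicalKeyboards/comments/kz0ayl/"
    "first_time_soldering_was_a_success/",
    sort_by="top",
)
print(url)
```

The resulting URL could then be fetched with requests.get, typically with a descriptive User-Agent header, since Reddit tends to throttle the default one.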
