biglocalnews/scraping-workshop

Scraping Workshop

This is a workshop that teaches how to use Python to create a dataset by scraping a website.

This entails parsing HTML, downloading PDFs, and extracting data from PDFs.
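As a rough illustration of the first step, here is a minimal sketch of finding PDF links in a page of HTML. It uses only the Python standard library, which is an assumption made for the example; the notebook may well use other packages from requirements.txt. The names PdfLinkParser and find_pdf_links, the sample HTML, and the example.com URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of links that point at PDF files (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep links ending in .pdf, ignoring case.
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page's own URL.
            self.pdf_urls.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return every PDF URL linked from the given HTML, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Made-up page content standing in for the real source website.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/2021.pdf">2021 report</a></li>
  <li><a href="about.html">About</a></li>
  <li><a href="reports/2022.PDF">2022 report</a></li>
</ul>
"""

links = find_pdf_links(SAMPLE_HTML, "https://example.com/data/")
print(links)
```

Downloading each URL and extracting data from the PDFs are separate steps that this sketch does not attempt; extraction in particular usually needs a third-party PDF library.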

Installation

Install JupyterLab if you don't already have it (a virtual environment works well). I set this up with Python 3.10. Then install the workshop's dependencies:

pip install -r requirements.txt

You can then start the JupyterLab server by running jupyter-lab.

Running the workshop

Just open the notebook in JupyterLab; it explains everything.

Backups

It's possible the source website will change or disappear entirely, so a copy of it is archived in the bak/web directory. All the PDFs that should be downloaded are in bak/raw. A sample "final product" CSV is also included in the bak/data directory.
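One way to use these backups is to try the live site first and fall back to the local copy when the download fails. The sketch below is only an illustration, not the notebook's code. It assumes that the file in bak/raw has the same name as the PDF being downloaded. The function name fetch_pdf and its parameters are hypothetical.

```python
from pathlib import Path
from urllib.error import URLError
from urllib.request import urlopen

# Backup location named in the README; same-named files are an assumption.
BACKUP_DIR = Path("bak/raw")


def fetch_pdf(url, filename, backup_dir=BACKUP_DIR, timeout=30):
    """Return the PDF's bytes from the live site, or from the backup copy on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except (URLError, OSError, ValueError):
        # Network errors, HTTP errors, and malformed URLs all end up here;
        # read the archived copy instead.
        return (Path(backup_dir) / filename).read_bytes()
```

A call might look like fetch_pdf("https://example.com/reports/2021.pdf", "2021.pdf"), where both the URL and the filename are made up. If the backup file is also missing, the function raises FileNotFoundError rather than hiding the problem.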
