Outline of the topics/resources for each day of class.
Note: assignments will generally be posted in Canvas. Occasionally we'll use GitHub Classroom for code-related work.
- Course Intro Presentation
- Defining "advanced"
- Demystifying Dot Notation
- High-level discussion of dot notation, Python classes, and OOP. This is the last installment of Python basics and builds on the foundation laid in the core PADJ courses during fall and winter quarters. Complete the Demystifying tutorial before Day 2, when there will be an in-class quiz on the material (a short refresher sketch follows this list).
- Complete the Elections OOP assignment by Friday. This assignment is on GitHub Classroom. See Canvas for detailed instructions.
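For a quick warm-up before the quiz, here is a minimal, illustrative sketch (not taken from the tutorial itself) of how dot notation reaches attributes and methods on a class instance:

```python
# Illustrative only -- not part of the official tutorial materials.
class Candidate:
    """A tiny class for demonstrating dot notation."""

    def __init__(self, name, votes):
        self.name = name    # instance attribute, reached as candidate.name
        self.votes = votes

    def add_votes(self, count):
        # Methods are reached the same way: candidate.add_votes(25)
        self.votes += count


candidate = Candidate("Jane Doe", 100)
candidate.add_votes(25)
print(candidate.name, candidate.votes)  # dot notation on the instance
```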
- Housekeeping (course resources, format, AI policy, etc.)
- Discussion/Q&A on Demystifying Dot Notation and the related Elections OOP Coding Challenge (see Canvas assignment for details)
- Quiz on Demystifying tutorial (Canvas)
- Intro to APIs presentation
- Assignments reminder:
- Elections OOP (due Friday)
- Weekly reflection (due Sunday)
- Senate Compromisers (due Monday)
- Hands-on work for remainder of class (student choice):
- Start Quakebot exercise/Senate Compromisers assignment
- Work on Elections OOP Coding Challenge
- Clone the Stanford Data Journalism Notebooks repo, using either VS Code or the command line:
  git clone git@github.com:stanfordjournalism/data-journalism-notebooks.git
- Once the repo is open locally in VS Code, navigate to content/web_scraping/README.ipynb and work through all of the notebooks, in order. There will be a quiz on the material next class. (A small illustrative scraping snippet follows below.)
- NOTE: If you're new to browser developer tools, level up using the Chrome or Firefox materials (just pick one) in content/web_scraping/resources.ipynb
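As a taste of what the web scraping notebooks cover, here is a minimal, hypothetical sketch using requests and BeautifulSoup; the URL and link handling are placeholders, not code from the notebooks:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder page; swap in a real URL from the notebooks when practicing.
url = "https://example.com"
response = requests.get(url)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
# Print the text and href of every link on the page.
for link in soup.find_all("a"):
    print(link.get_text(strip=True), link.get("href"))
```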
- Guest lecture: Katey Rusch, from the UC Berkeley Investigative Reporting Program, on the Community Law Enforcement Accountability Network (CLEAN)
- Overview of clean-scraper code architecture and motivations
- If you're still shaky on classes and functions, review the Data Journalism Notebooks lessons on classes and OOP
- Begin working on clean-scraper; open-source scraper contributions are the weekend assignment
- Dissect the website and craft a scraping strategy. Add your proposed strategy to the GitHub issue for the site
- Once your scraping strategy is approved, begin implementing the code on a fork of the clean-scraper repo, per the Contributor Guidelines
- Homework:
- Guided tour of the clean-scraper code repository, including:
- Code architecture: cli -> runner -> San Diego PD scraper -> cache.download
- Code conventions:
  - scrape_meta stores file artifacts in the cache and produces a JSON metadata file
  - scrape reads the JSON metadata and downloads files (to the cache)
- Scraping at scale, with a paper trail (aka, why the complexity?). A rough sketch of the scrape_meta/scrape pattern follows the Contributor Guidelines list below.
- Contributor Guidelines
- Claim an agency by filing a GitHub Issue
- Dissect your website and add proposed scraping plan to GH Issue
- Start writing your scraper
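For orientation, here is a rough, hypothetical sketch of the two-step scrape_meta/scrape pattern described in the guided tour above. The function names follow the conventions mentioned in class, but the bodies, paths, and metadata fields are placeholders, not the actual clean-scraper implementation:

```python
import json
from pathlib import Path

import requests

CACHE_DIR = Path("cache")  # hypothetical cache location


def scrape_meta(index_url):
    """Step 1: record what's available and write a JSON metadata file (the paper trail)."""
    CACHE_DIR.mkdir(exist_ok=True)
    # A real scraper would parse the agency's site here; this is a placeholder entry.
    metadata = [{"asset_url": index_url, "name": "example_asset.html"}]
    meta_file = CACHE_DIR / "metadata.json"
    meta_file.write_text(json.dumps(metadata, indent=2))
    return meta_file


def scrape(meta_file):
    """Step 2: read the JSON metadata and download each file into the cache."""
    metadata = json.loads(Path(meta_file).read_text())
    for item in metadata:
        response = requests.get(item["asset_url"])
        response.raise_for_status()
        local_path = CACHE_DIR / item["name"]
        local_path.write_bytes(response.content)
        print(f"Downloaded {item['asset_url']} -> {local_path}")


if __name__ == "__main__":
    meta = scrape_meta("https://example.com")
    scrape(meta)
```

Splitting the work this way leaves an auditable record of what was found before anything is downloaded, which is the point of the "paper trail" discussion above.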
- Finalize scraping plans (make sure to file an Issue for your scraper on GitHub)
- Work on scrapers
- Data Viz slides, with several short hands-on exercises built in
- Visualization Curriculum overview (a minimal chart sketch follows below)
- Status check-ins and code help on clean-scraper work
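Assuming the Visualization Curriculum here refers to the Altair/Vega-Lite-based materials, a minimal chart sketch looks like the following (the dataset is made up for illustration only):

```python
import altair as alt
import pandas as pd

# Tiny made-up dataset, for illustration only.
df = pd.DataFrame({
    "agency": ["San Diego PD", "Oakland PD", "Fresno PD"],
    "records": [120, 85, 40],
})

# A basic bar chart: record counts on x, agencies on y, sorted by count.
chart = alt.Chart(df).mark_bar().encode(
    x="records:Q",
    y=alt.Y("agency:N", sort="-x"),
)

chart.save("records_by_agency.html")  # or display the chart inline in a notebook
```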
- Guest visit by Cheryl Phillips to discuss a project in her Watchdog Reporting class.
- Finalizing open-source contributions to clean-scraper
- Hands-on work (pot luck):
- Wrap up Viz Curriculum if you haven't done so
- Work on data viz for your own projects/interests
- Begin the data viz assignment for Watchdog Reporting (see Canvas)
- Wrap up your clean-scraper contribution
- Data Dashboard Design - Guest lecture by Gerald Rich
- Tutorial: Census Population Streamlit Dashboard
- Hands-on work for the sports charities analysis, and closing the loop on clean-scraper
- Design a multipage Streamlit app for the Sports Charity investigation
- Ramp up on Streamlit, as needed
- Begin implementing your Streamlit pages for the Sports Charity project (due next week; see Canvas). A minimal multipage sketch follows below.
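To help you ramp up, here is a minimal, hypothetical sketch of a Streamlit entry point for a multipage app. The file names, page names, and data are placeholders, not the actual project structure:

```python
# app.py -- entry point for a multipage Streamlit app.
# Additional pages live in a pages/ directory, e.g. pages/1_Finances.py
# and pages/2_Filings.py (hypothetical names); Streamlit picks them up automatically.
import pandas as pd
import streamlit as st

st.set_page_config(page_title="Sports Charity Investigation", layout="wide")
st.title("Sports Charity Investigation")

# Placeholder data; the real app would load the project's cleaned dataset.
df = pd.DataFrame({
    "charity": ["Charity A", "Charity B"],
    "revenue": [1_200_000, 450_000],
})

st.dataframe(df)
st.bar_chart(df.set_index("charity"))
```

Run it locally with streamlit run app.py, then add one file per page under pages/ as you flesh out the dashboard.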