
WebScraping MiniProject by Noah Wager

This is a project made by a junior at Cleveland High School. Flowchart at the bottom.

Installation

Make sure you have Python 2.7 installed (any version from 2.7.10 through 2.7.14).

Required libraries: requests, pandas, BeautifulSoup4, and splinter (all pip-installed). The script also uses time, argparse, and ConfigParser, which come with Python 2's standard library.

Install the newest version of ChromeDriver for the correct platform here: https://sites.google.com/a/chromium.org/chromedriver/downloads

(ChromeDriver allows the program to control browsers, like opening windows and going to sites).

Extract the .zip file and put this folder:

ChromeDriver Folder

in your Path environment variable by doing this (Windows):

  1. Open the System Control Panel applet (search for "System" and click the result - it should be the Control Panel).
  2. Select Advanced system settings.
  3. Click the Environment Variables button.
  4. Under System variables, select Path, click Edit, then click Browse.
  5. Find the extracted folder (it's probably in Downloads) and add it to Path (you can type path into the console to check whether it took effect).

Here's a slightly fast gif of that:

How to edit Path
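If you'd rather check from Python instead of the console, here's a minimal sketch that lists the folders currently on Path so you can confirm the ChromeDriver folder is among them (this helper is an illustration, not part of weatherscraper.py):

```python
import os

def dirs_on_path():
    # Split the PATH environment variable into its individual folders
    # so you can scan for the ChromeDriver folder.
    return os.environ.get("PATH", "").split(os.pathsep)

# Example check: any("chromedriver" in d.lower() for d in dirs_on_path())
```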

Next, follow these steps:

  1. Clone or download this repository.
  2. Put the requirements.txt and weatherscraper.py files in the folder with your virtual environment (the folder you execute Python scripts in).
  3. Set up and activate your virtual environment.
  4. Before running the .py file, type pip install -r requirements.txt into the console to install the packages required for scraping.
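For reference, a requirements.txt for this setup would simply list the pip-installed libraries, one per line (the actual file ships with the repo; the lines below are illustrative, not the author's pins):

```
requests
pandas
beautifulsoup4
splinter
```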

You can look at the code here on GitHub. The file has a good amount of comments.

Usage

The .py file scrapes the weather forecast from forecast.weather.gov, pulling the high and low temperatures, time period, short description, and long description for each period. It arranges all the data into a pandas table (pandas.DataFrame) and saves it to a separate INI file in the same folder, so when you come back later it can check whether anything has changed; any detected changes are printed.
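The scraping step can be sketched roughly like this. The CSS classes below (tombstone-container, period-name, short-desc, temp) match the markup forecast.weather.gov has used for its forecast cards, but treat them as assumptions; weatherscraper.py's real selectors and column names may differ:

```python
import pandas as pd
from bs4 import BeautifulSoup

def parse_forecast(html):
    # Pull one row per forecast card and collect them into a DataFrame,
    # mirroring the period / short description / temperature fields the
    # script scrapes.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.tombstone-container"):
        rows.append({
            "period": card.select_one("p.period-name").get_text(" "),
            "short_desc": card.select_one("p.short-desc").get_text(" "),
            "temp": card.select_one("p.temp").get_text(" "),
        })
    return pd.DataFrame(rows)
```

In the real script the HTML would come from the browser that splinter drives (or a requests.get call) rather than a string.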

To run the file without skipping any parameters, you can type this example into the console:
python weatherscraper.py 2 4 seconds "Portland"

It should look something like this:

Weather Scraping Example

The 2 specifies the number of seconds between each scrape. This can be any number.

The 4 seconds specifies how long the program runs. You can type any number, followed by seconds, minutes, or hours to specify the unit of time.

The "Portland" specifies the US city. Type in any US city, wrapped in " quotes.
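Under the hood, those two numbers presumably drive a loop along these lines (a sketch of the timing behavior, not the script's actual code):

```python
import time

def run_loop(interval, duration_seconds, scrape_once):
    # Call scrape_once every `interval` seconds until `duration_seconds`
    # have elapsed - mirroring the "2" and "4 seconds" arguments.
    deadline = time.time() + duration_seconds
    count = 0
    while time.time() < deadline:
        scrape_once()
        count += 1
        time.sleep(interval)
    return count
```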

To end the program, either wait for the time to run out or press Ctrl+C to force it to stop.
NOTE: If a window pops up saying "chromedriver.exe has stopped working" or similar, that's fine; just click OK.
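The INI-based change detection described earlier can be sketched as follows. This is shown with Python 3's configparser and an invented file/section layout; the actual script targets Python 2's ConfigParser, whose API differs slightly:

```python
import configparser

def detect_changes(ini_path, section, new_values):
    # Compare new_values against what was stored on the last run and
    # return the keys that changed, then write new_values back so the
    # next run compares against them.
    cp = configparser.ConfigParser()
    cp.read(ini_path)  # silently skips a missing file (first run)
    if cp.has_section(section):
        changed = [k for k, v in new_values.items()
                   if cp.get(section, k, fallback=None) != v]
    else:
        cp.add_section(section)
        changed = list(new_values)  # first run: everything counts as new
    for key, value in new_values.items():
        cp.set(section, key, value)
    with open(ini_path, "w") as f:
        cp.write(f)
    return changed
```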

Optional Parameters
To skip any of the 4 scraped fields, type one or more of these flags after weatherscraper.py and before the numbers:

--skipperiod (skip time period)
--skiptemp (skip temperatures)
--skipdesc (skip long description)
--skipshort_desc (skip short description)

e.g. python weatherscraper.py --skipdesc 2 4 seconds "Portland"
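The argument layout those examples imply can be sketched with argparse like this (the positional-argument names below are inferred from the README, not copied from weatherscraper.py; the flag names are the ones listed above):

```python
import argparse

def build_parser():
    # Four positional arguments plus four store_true skip flags,
    # matching the README's command-line examples.
    p = argparse.ArgumentParser(description="Scrape forecast.weather.gov")
    p.add_argument("interval", type=float, help="seconds between scrapes")
    p.add_argument("duration", type=float, help="how long the program runs")
    p.add_argument("unit", choices=["seconds", "minutes", "hours"],
                   help="unit of time for the duration")
    p.add_argument("city", help="US city to scrape the forecast for")
    for flag in ("--skipperiod", "--skiptemp",
                 "--skipdesc", "--skipshort_desc"):
        p.add_argument(flag, action="store_true", help="skip this field")
    return p
```

With this layout, parse_args(["--skipdesc", "2", "4", "seconds", "Portland"]) sets skipdesc to True and leaves the other three flags False.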

The example should look like this:

Weather Scraping Example - skipdesc

If you got everything to work, then yay! You're done!

Here's the flowchart of the core algorithm:

PaWPUP Flowchart

Special Thanks To

Master Python Web Scraping: Get Your Data Back
How to scrape websites with Python and BeautifulSoup
Python Web Scraping Tutorial using BeautifulSoup
Python Documentation - argparse
google.com
Brian

About

p8mawpup-nwager created by GitHub Classroom
