[chore] Web Scraper Fixes #103

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

jjstnlee
Contributor

@jjstnlee jjstnlee commented Dec 11, 2024

What's new in this PR

Description

Updated the ORES scraper to extract the following fields and push them to Supabase:

  • renewable energy technology (currently only solar and wind are detected)
  • project size
  • all counties and towns, stored as arrays
  • booleans for whether the project has energy storage or battery storage
  • storage size, if the project has energy or battery storage

Screenshots

How to review

  • in api/webscraper/database.py, change supabase_table to the Supabase table you want to push the data to
  • uncomment the function calls under "for testing" at the very bottom of the file
  • run python3 api/webscraper/database.py in your terminal, then check Supabase to make sure the data was pushed

Next steps

  • right now the scraper only checks for the solar and wind energy types; it needs to check for all potential energy types in ORES
  • modify the NYISO and NYSERDA scrapers to account for county and town now being arrays in Supabase
  • the description formatting varies a lot (some words appear or not depending on things like whether there are multiple towns or multiple counties), so I had to account for many different cases when parsing. I tried very hard to cover all of them, but the functions might not work on a format I forgot to account for or didn't see
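
For the NYISO and NYSERDA follow-up, one possible approach is a small helper that coerces whatever a scraper currently produces (a single string, a list, or nothing) into a list before it is pushed. The helper name to_name_list and the row keys below are hypothetical; the real scrapers may structure their rows differently.

```python
from typing import List, Union


def to_name_list(value: Union[str, List[str], None]) -> List[str]:
    """Return county/town values as a list of non-empty, stripped strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]
    return [v.strip() for v in value if v and v.strip()]


# Hypothetical row shape; "county" and "town" are assumed key names.
row = {"county": "Gamma", "town": None}
row["county"] = to_name_list(row["county"])
row["town"] = to_name_list(row["town"])
print(row)
```

Normalizing at the point of insertion would let the older scrapers keep their existing parsing while still matching the array columns.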

Relevant links

Online sources

Related PRs

CC: @itsliterallymonique

@jjstnlee jjstnlee linked an issue Dec 11, 2024 that may be closed by this pull request
Collaborator

@itsliterallymonique itsliterallymonique left a comment

Just to check, did you handle geocoding during your sprint? I think the geocoding handling might need to change now that the location is a list rather than a string.
But looking at your Supabase database, it looks fine, so it does not seem to be an issue!

MAKE SURE you set supabase_table to Projects_duplicate instead of your test table before you squash & merge!

@itsliterallymonique
Collaborator

SORRY, never mind about the geocoding handling!! Deena handled this in her sprint this week, which I just looked at.

@itsliterallymonique itsliterallymonique merged commit 0e7a3d5 into main Dec 11, 2024
2 checks passed
Development

Successfully merging this pull request may close these issues.

[chore] Web Scraper Fixes
2 participants