Add New York Energy use case to populate script #45

Merged: 19 commits, Aug 21, 2024


annehaley
Collaborator

@annehaley annehaley commented Jul 11, 2024

  • Add a new required argument use_case to the populate command. For the Boston data, run manage.py populate boston_floods. For the New York data, run manage.py populate new_york_energy. The optional arguments --include_large and --dataset_indexes work the same way as before.
  • Reorganize sample_data folder to group use case data together
  • Create custom convert_dataset functions for each use case
  • Implement a module nysdp.py to pull vector features from systemdataportal.nationalgrid.com
  • Interpret energy grid vector features as networks, export the results as GeoJSON files, and upload the zipped results to DKC so that others can download and import them
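
As a rough illustration of the command shape described in the first bullet, here is a minimal sketch using plain argparse (the same parser type that Django management commands receive in add_arguments). The build_parser name, the choices list, and the assumption that --dataset_indexes takes a list of integers are all hypothetical and are not taken from the PR's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the populate command's arguments."""
    parser = argparse.ArgumentParser(prog="manage.py populate")
    # Required positional argument naming the use case to populate.
    parser.add_argument("use_case", choices=["boston_floods", "new_york_energy"])
    # Optional flag: also ingest datasets marked as large.
    parser.add_argument("--include_large", action="store_true")
    # Optional list of dataset indexes; assumed to be integers here.
    parser.add_argument("--dataset_indexes", nargs="*", type=int, default=None)
    return parser


args = build_parser().parse_args(
    ["new_york_energy", "--dataset_indexes", "3", "--include_large"]
)
print(args.use_case, args.dataset_indexes, args.include_large)
```

With this layout, omitting use_case makes the parser exit with an error, which matches the "required argument" behavior the bullet describes.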

@annehaley annehaley changed the base branch from master to osmnx-roads July 11, 2024 17:44
Base automatically changed from osmnx-roads to master July 16, 2024 20:30
@annehaley
Collaborator Author

annehaley commented Jul 29, 2024

After making modifications to speed up the algorithm that interprets network topology from the NYSDP basic features, running it across the whole state takes approximately 72 hours. I exported the results to GeoJSON files (one per county) and uploaded the data to DKC. The zipped results are approximately 200 MB.

Next, I modified the populate script to download the results and save nodes and edges from that data. Running this script takes approximately 90 minutes. The output looks like this:

Output

root@ebabfb6d27e4:/opt/uvdat-server# python manage.py populate new_york_energy --dataset_indexes 3 --include_large
Populating server with sample data for use case new_york_energy...
Creating Dataset objects...
        -  National Grid County Networks
         Converting data for National Grid County Networks...
                Creating network for Albany.
                Created 29517 nodes and 34042 edges.
                Creating network for Allegany.
                Created 2602 nodes and 2879 edges.
                Creating network for Cattaraugus.
                Created 9464 nodes and 10474 edges.
                Creating network for Cayuga.
                Created 330 nodes and 340 edges.
                Creating network for Chautauqua.
                Created 10107 nodes and 11398 edges.
                Creating network for Chenango.
                Created 63 nodes and 68 edges.
                Creating network for Clinton.
                Created 281 nodes and 310 edges.
                Creating network for Columbia.
                Created 6688 nodes and 7607 edges.
                Creating network for Hamilton.
                Created 1919 nodes and 2042 edges.
                Creating network for Herkimer.
                Created 6739 nodes and 7448 edges.
                Creating network for Jefferson.
                Created 15001 nodes and 17266 edges.
                Creating network for Rensselaer.
                Created 15432 nodes and 17358 edges.
                Creating network for Schoharie.
                Created 4892 nodes and 5343 edges.
                Creating network for St Lawrence.
                Created 12094 nodes and 13324 edges.
                Creating network for Warren.
                Created 10983 nodes and 12251 edges.
                Creating network for Washington.
                Created 8142 nodes and 8852 edges.
                Creating network for Wyoming.
                Created 1395 nodes and 1610 edges.
                Creating network for Cortland.
                Created 4011 nodes and 4464 edges.
                Creating network for Dutchess.
                Created 3 nodes and 2 edges.
                Creating network for Erie.
                Created 36508 nodes and 42876 edges.
                Creating network for Essex.
                Created 3564 nodes and 3886 edges.
                Creating network for Franklin.
                Created 5379 nodes and 5991 edges.
                Creating network for Fulton.
                Created 6062 nodes and 6665 edges.
                Creating network for Genesee.
                Created 7155 nodes and 8171 edges.
                Creating network for Lewis.
                Created 5391 nodes and 6160 edges.
                Creating network for Livingston.
                Created 4716 nodes and 5234 edges.
                Creating network for Madison.
                Created 5995 nodes and 6856 edges.
                Creating network for Monroe.
                Created 5533 nodes and 6492 edges.
                Creating network for Montgomery.
                Created 5864 nodes and 6589 edges.
                Creating network for Niagara.
                Created 14404 nodes and 16667 edges.
                Creating network for Oneida.
                Created 18852 nodes and 22381 edges.
                Creating network for Onondaga.
                Created 35068 nodes and 40780 edges.
                Creating network for Ontario.
                Created 1535 nodes and 1648 edges.
                Creating network for Orleans.
                Created 4649 nodes and 5200 edges.
                Creating network for Oswego.
                Created 12962 nodes and 14478 edges.
                Creating network for Otsego.
                Created 1475 nodes and 1598 edges.
                Creating network for Saratoga.
                Created 26271 nodes and 31447 edges.
                Creating network for Schenectady.
                Created 11819 nodes and 13374 edges.
        Completed in 5808.703211 seconds.

In total, 352,865 nodes and 403,571 edges are created.
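
The totals can be checked against the log with a small tally script. This is only a sketch: the total_counts helper and the regular expression are assumptions based on the "Created N nodes and M edges." lines shown in the output, not part of the populate script.

```python
import re

# Matches lines such as "Created 29517 nodes and 34042 edges."
CREATED_RE = re.compile(r"Created (\d+) nodes and (\d+) edges\.")


def total_counts(log_text: str) -> tuple[int, int, int]:
    """Return (networks, nodes, edges) summed over every 'Created ...' line."""
    matches = CREATED_RE.findall(log_text)
    nodes = sum(int(n) for n, _ in matches)
    edges = sum(int(e) for _, e in matches)
    return len(matches), nodes, edges


sample = """
        Creating network for Cayuga.
        Created 330 nodes and 340 edges.
        Creating network for Chenango.
        Created 63 nodes and 68 edges.
"""
print(total_counts(sample))  # → (2, 393, 408)
```

Summing the per-county lines in the full output above the same way gives 38 networks with 352,865 nodes and 403,571 edges, consistent with the stated totals.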

There is still plenty of room for improvement in the network topology interpretation algorithm. Many nodes still act as connectors between only two edges, where the two lines should have been merged into one, and many edges run parallel to other edges (although this reflects the original features). The current algorithm relies on splitting the statewide network into separate county networks, so inter-county connections are ignored. Speed is another major area for improvement if this work is ever revisited.
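
To make the first limitation concrete, here is a minimal sketch of how nodes that only connect two edges could be contracted. It is not the algorithm used in this PR: it works on a bare list of (u, v) pairs, the merge_degree_two_nodes name is hypothetical, and it handles topology only (a real implementation would also need to join the line geometries and reconcile edge properties).

```python
from collections import defaultdict


def merge_degree_two_nodes(edges):
    """Contract pass-through nodes in an undirected graph given as (u, v) pairs.

    Returns the remaining edges as a list of (u, v) tuples.
    """
    edge_ends = dict(enumerate(edges))  # edge id -> (u, v)
    incident = defaultdict(set)         # node -> ids of edges touching it
    for edge_id, (u, v) in edge_ends.items():
        incident[u].add(edge_id)
        incident[v].add(edge_id)

    def other_end(edge_id, node):
        u, v = edge_ends[edge_id]
        return u if v == node else v

    next_id = len(edge_ends)
    for node in list(incident):
        if len(incident[node]) != 2:
            continue
        first, second = incident[node]
        a, b = other_end(first, node), other_end(second, node)
        # Skip self-loops and nodes whose two edges lead to the same neighbor,
        # since merging those would create a self-loop.
        if node in (a, b) or a == b:
            continue
        # Drop the two old edges and the pass-through node itself.
        for edge_id, neighbor in ((first, a), (second, b)):
            del edge_ends[edge_id]
            incident[neighbor].discard(edge_id)
        del incident[node]
        # Connect the two neighbors directly with a single new edge.
        edge_ends[next_id] = (a, b)
        incident[a].add(next_id)
        incident[b].add(next_id)
        next_id += 1

    return list(edge_ends.values())


chain = [("A", "n1"), ("n1", "n2"), ("n2", "B"), ("B", "C"), ("B", "D")]
merged = merge_degree_two_nodes(chain)
print(sorted(tuple(sorted(edge)) for edge in merged))
# → [('A', 'B'), ('B', 'C'), ('B', 'D')]
```

Because contracting a node leaves the degree of both neighbors unchanged, a single pass over the nodes is enough in this sketch. Parallel edges are deliberately kept, in line with the note that they reflect the original features.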

@annehaley annehaley marked this pull request as ready for review July 29, 2024 19:28
sample_data/ingest_sample_data.py (4 review threads, outdated and resolved)
@annehaley annehaley mentioned this pull request Aug 2, 2024
@annehaley annehaley changed the base branch from master to model-changes August 2, 2024 16:28
for feature_set in feature_sets:
    for set_id, feature_set in feature_set.items():
        for feature in feature_set:
            properties = feature['properties']

Just a small idea. While working on vector tile query performance, I noticed that the amount of data downloaded for the whole state network was quite high (~200 MB). Using ST_Simplify reduced that to about 100 MB, which still seemed a bit large.

Removing the properties from the vector tiles got it down to ~12.5 MB for all lines and points in the state. Since most of the properties on any particular node are empty, could we remove them? Many values come back as an empty string (""), and I want to make sure that doesn't mean something significantly different from the value being absent. I'm not sure whether the vector tile PBF format condenses the keys internally, but at least for UI purposes it would be nice if those empty keys didn't show up when they aren't in the data.

Feel free to ignore this comment. I may create a script that does this in EDNT, separate from the upload script. That way I can reduce or omit empty properties.

Base your merging on what Jacob says. I don't have the full context of how UVDAT is used, so I was only commenting on the process of loading the data into e-dnt.
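
The idea raised in the comments above, dropping properties that carry no value, can be sketched as follows. This is an illustration only, not the change that was committed: the drop_empty_properties helper and the example property names are hypothetical, and treating an empty string as equivalent to a missing value is exactly the assumption the comment asks to confirm.

```python
def drop_empty_properties(feature: dict) -> dict:
    """Return a copy of a GeoJSON feature without empty-string or null properties."""
    properties = feature.get("properties") or {}
    # Keep falsy but meaningful values such as 0 or False; drop only "" and None.
    kept = {key: value for key, value in properties.items() if value not in ("", None)}
    return {**feature, "properties": kept}


feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.75, 42.65]},
    # Hypothetical property names, for illustration only.
    "properties": {"name": "example", "voltage": "", "phase": None, "count": 0},
}
print(drop_empty_properties(feature)["properties"])
# → {'name': 'example', 'count': 0}
```

The input feature is left unmodified, so the original properties remain available if the empty values turn out to be meaningful.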

Collaborator Author

Good idea; I think this is worth including in the import script to help reduce the size of the vector features in uvdat, too. It's a quick fix, so I made the change here: e2c7549

@BryonLewis

I believe that the latest updates have broken the substation import (there is no call to the modules anymore). I don't know whether you were already aware of this and planning further modifications.

@annehaley
Collaborator Author

annehaley commented Aug 5, 2024

@BryonLewis Thanks for pointing this out; I hadn't tested the refactoring thoroughly. I made changes in ee55020 to call the create_vector_features function manually since we won't use dynamic string imports.

Base automatically changed from model-changes to master August 21, 2024 12:15
uvdat/core/tasks/networks.py (2 review threads, outdated and resolved)
@annehaley annehaley merged commit 221fdf2 into master Aug 21, 2024
4 checks passed
@annehaley annehaley deleted the use_cases branch August 21, 2024 18:23
3 participants