Add New York Energy use case to populate script #45
Conversation
After making modifications to speed up the algorithm that interprets network topology from the NYSDP basic features, running the algorithm across the whole state takes approximately 72 hours. I exported the results to GeoJSON files (one per county) and uploaded the data to DKC; the zipped results are approximately 200 MB. Next, I modified the populate script to download the results and save nodes and edges from that data. Running this script takes approximately 90 minutes.
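The node-and-edge extraction step could look roughly like this. This is a hypothetical sketch, not the actual populate script; the per-county file layout and the rule of treating LineString endpoints as nodes are assumptions:

```python
import json

def load_network(geojson_paths):
    """Collect unique endpoint nodes and edges from LineString features.

    Hypothetical sketch: assumes each file is a GeoJSON FeatureCollection
    of LineString features, as in the exported per-county results.
    """
    nodes = {}   # (x, y) coordinate -> node id
    edges = []   # (node_id_a, node_id_b) pairs

    def node_id(coord):
        # Reuse an existing node when two lines share an endpoint.
        key = tuple(coord)
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]

    for path in geojson_paths:
        with open(path) as f:
            collection = json.load(f)
        for feature in collection["features"]:
            if feature["geometry"]["type"] != "LineString":
                continue
            coords = feature["geometry"]["coordinates"]
            edges.append((node_id(coords[0]), node_id(coords[-1])))
    return nodes, edges
```

Because nodes are deduplicated by coordinate, the node count grows more slowly than the edge count, consistent with the totals reported below.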
In total, 352,865 nodes and 403,571 edges are created. There is still plenty of room for improvement in the network topology interpretation algorithm: many nodes act as connectors between only two edges where the lines should have been merged, and many edges run parallel to other edges (though this reflects the original features). The current algorithm splits the state-wide network into per-county networks, so inter-county connections are ignored. Speed would be another major area of improvement if this is ever revisited in future work.
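The "connector between only two edges" issue could be addressed with a post-processing pass that collapses degree-2 nodes and merges their two edges into one. A minimal sketch of that idea (not the PR's actual algorithm):

```python
from collections import defaultdict

def merge_degree_two_nodes(edges):
    """Collapse nodes that join exactly two edges, merging those edges.

    Hypothetical sketch: `edges` is a list of (node_a, node_b) pairs.
    Chain endpoints (degree != 2) are preserved; isolated cycles made
    entirely of degree-2 nodes are dropped by this simple version.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Nodes with exactly two distinct neighbors are pure connectors.
    connectors = {n for n, nbrs in adjacency.items() if len(nbrs) == 2}

    merged = []
    visited = set()  # directed half-edges already walked
    for start in adjacency:
        if start in connectors:
            continue
        for nxt in adjacency[start]:
            if (start, nxt) in visited:
                continue
            # Walk through connector nodes until a non-connector is reached.
            prev, cur = start, nxt
            visited.add((prev, cur))
            while cur in connectors:
                step = next(n for n in adjacency[cur] if n != prev)
                prev, cur = cur, step
                visited.add((prev, cur))
            visited.add((cur, prev))  # block re-walking from the far end
            merged.append((start, cur))
    return merged
```

For geometry, the merged edge would also need its LineStrings concatenated; this sketch only shows the topological simplification.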
Force-pushed from 07483b8 to 48fe642.
for feature_set in feature_sets:
    for set_id, feature_set in feature_set.items():
        for feature in feature_set:
            properties = feature['properties']
Just a small idea. While working on vector tile querying performance, I noticed that the amount of data downloaded for the whole state network was quite high (~200 MB). Using ST_Simplify reduced that to about 100 MB, which still seemed a bit large.
Removing the properties from the vector tiles got it down to ~12.5 MB for all lines and points in the state. Seeing as most of the data on any particular node is empty, could we remove it? I want to make sure that when a value is an empty string (""), that doesn't mean something significantly different from not having the value at all. I'm not sure whether the vector tile PBF format internally condenses the keys, but at least for UI purposes it would be nice for all those empty keys not to show up when they aren't in the data.
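The property-stripping suggested here might look like the following sketch, assuming an empty string carries no meaning distinct from an absent value (the caveat raised above):

```python
def strip_empty_properties(feature):
    """Return a copy of a GeoJSON feature without empty property values.

    Sketch of the size reduction discussed above: drops keys whose
    value is None or an empty string. Assumes "" never carries meaning
    beyond "value absent" in this data.
    """
    cleaned = dict(feature)
    cleaned["properties"] = {
        key: value
        for key, value in feature.get("properties", {}).items()
        if value not in (None, "")
    }
    return cleaned
```

Running this over each feature before encoding the tiles would keep meaningful values (including falsy ones like 0) while dropping the empty keys.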
Feel free to ignore this comment. I may create a script that does this in EDNT separately from the upload script; that way I can reduce or omit empty properties.
Base your merging on what Jacob says. I don't have the full context of the UVDAT uses, so I was only commenting on the process of loading the data into e-dnt.
Good idea; I think this is worth including in the import script to help reduce the size of the vector features in uvdat, too. It's a quick fix, so I made the change here: e2c7549
I believe that the latest updates have broken the substation import (there is no call to the modules anymore). I don't know whether you knew this and were planning further modifications.
@BryonLewis Thanks for pointing this out; I hadn't tested the refactoring thoroughly. I made changes in ee55020 to call the `…tures_from_network` function.
This PR:
- Adds a `use_case` argument to the populate command. For the Boston data, run `manage.py populate boston_floods`; for the New York data, run `manage.py populate new_york_energy`. The optional arguments `--include_large` and `--dataset_indexes` still work the same way.
- Restructures the `sample_data` folder to group use case data together.
- Adds `convert_dataset` functions for each use case.
- Adds `nysdp.py` to pull vector features from `systemdataportal.nationalgrid.com`.
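The per-use-case dispatch described above could be structured roughly like this. All names here are hypothetical illustrations, not the actual uvdat code:

```python
# Hypothetical converters; the real convert_dataset functions live in
# per-use-case modules in the actual codebase.
def convert_boston_floods(dataset):
    return f"boston_floods:{dataset}"

def convert_new_york_energy(dataset):
    return f"new_york_energy:{dataset}"

CONVERTERS = {
    "boston_floods": convert_boston_floods,
    "new_york_energy": convert_new_york_energy,
}

def populate(use_case, datasets, dataset_indexes=None):
    """Run the converter for one use case over selected datasets.

    Sketch of the dispatch only; --include_large handling, downloads,
    and database writes are omitted.
    """
    convert = CONVERTERS.get(use_case)
    if convert is None:
        raise ValueError(
            f"unknown use case {use_case!r}; choose from {sorted(CONVERTERS)}"
        )
    if dataset_indexes is not None:
        datasets = [datasets[i] for i in dataset_indexes]
    return [convert(d) for d in datasets]
```

A table of converter functions keeps the command itself generic, so adding another use case only means registering one more entry.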