Add New York Energy use case to populate script #45
Conversation
After making modifications to speed up the algorithm that interprets network topology from the NYSDP basic features, running the algorithm across the whole state takes approximately 72 hours. I exported the results to GeoJSON files (one per county) and uploaded the data to DKC; the zipped results are approximately 200 MB. Next, I modified the populate script to download the results and save nodes and edges from that data. Running this script takes approximately 90 minutes.
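The node-and-edge extraction step could look roughly like this. This is a hypothetical sketch, not the actual populate script; the per-county file layout and the rule of treating LineString endpoints as nodes are assumptions:

```python
import json

def load_network(geojson_paths):
    """Collect unique endpoint nodes and edges from LineString features.

    Hypothetical sketch: assumes each file is a GeoJSON FeatureCollection
    of LineString features, as in the exported per-county results.
    """
    nodes = {}   # (x, y) coordinate -> node id
    edges = []   # (node_id_a, node_id_b) pairs

    def node_id(coord):
        # Reuse an existing node when two lines share an endpoint.
        key = tuple(coord)
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]

    for path in geojson_paths:
        with open(path) as f:
            collection = json.load(f)
        for feature in collection["features"]:
            if feature["geometry"]["type"] != "LineString":
                continue
            coords = feature["geometry"]["coordinates"]
            edges.append((node_id(coords[0]), node_id(coords[-1])))
    return nodes, edges
```

Because nodes are deduplicated by coordinate, the node count grows more slowly than the edge count, consistent with the totals reported below.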
In total, 352,865 nodes and 403,571 edges are created. There is still plenty of room for improvement in the network topology interpretation algorithm: many nodes act as connectors between only two edges where the lines should have been merged, and many edges run parallel to other edges (though this reflects the original features). The current algorithm splits the state-wide network into per-county networks, so inter-county connections are ignored. Speed would be another major area of improvement if this is ever revisited in future work.
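The "connector between only two edges" issue could be addressed with a post-processing pass that collapses degree-2 nodes and merges their two edges into one. A minimal sketch of that idea (not the PR's actual algorithm):

```python
from collections import defaultdict

def merge_degree_two_nodes(edges):
    """Collapse nodes that join exactly two edges, merging those edges.

    Hypothetical sketch: `edges` is a list of (node_a, node_b) pairs.
    Chain endpoints (degree != 2) are preserved; isolated cycles made
    entirely of degree-2 nodes are dropped by this simple version.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Nodes with exactly two distinct neighbors are pure connectors.
    connectors = {n for n, nbrs in adjacency.items() if len(nbrs) == 2}

    merged = []
    visited = set()  # directed half-edges already walked
    for start in adjacency:
        if start in connectors:
            continue
        for nxt in adjacency[start]:
            if (start, nxt) in visited:
                continue
            # Walk through connector nodes until a non-connector is reached.
            prev, cur = start, nxt
            visited.add((prev, cur))
            while cur in connectors:
                step = next(n for n in adjacency[cur] if n != prev)
                prev, cur = cur, step
                visited.add((prev, cur))
            visited.add((cur, prev))  # block re-walking from the far end
            merged.append((start, cur))
    return merged
```

For geometry, the merged edge would also need its LineStrings concatenated; this sketch only shows the topological simplification.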
Force-pushed from 07483b8 to 48fe642.
for feature_set in feature_sets:
    for set_id, feature_set in feature_set.items():
        for feature in feature_set:
            properties = feature['properties']
Just a small idea. While working on vector tile querying performance, I noticed that the amount of data downloaded for the whole state network was quite high (~200 MB). Using ST_Simplify reduced that to about 100 MB, which still seemed a bit large.
Removing the properties from the vector tiles got it down to ~12.5 MB for all lines and points in the state. Seeing as most of the data on any particular node is empty, could we remove it? I want to make sure that when a value is an empty string (""), that doesn't mean something significantly different from not having the value at all. I'm not sure whether the vector tile PBF format internally condenses the keys, but at least for UI purposes it would be nice for all those empty keys not to show up when they aren't in the data.
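The property-stripping suggested here might look like the following sketch, assuming an empty string carries no meaning distinct from an absent value (the caveat raised above):

```python
def strip_empty_properties(feature):
    """Return a copy of a GeoJSON feature without empty property values.

    Sketch of the size reduction discussed above: drops keys whose
    value is None or an empty string. Assumes "" never carries meaning
    beyond "value absent" in this data.
    """
    cleaned = dict(feature)
    cleaned["properties"] = {
        key: value
        for key, value in feature.get("properties", {}).items()
        if value not in (None, "")
    }
    return cleaned
```

Running this over each feature before encoding the tiles would keep meaningful values (including falsy ones like 0) while dropping the empty keys.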
Feel free to ignore this comment. I may create a script that does this in EDNT separately from the upload script; that way I can reduce or omit empty properties.
Base your merging on what Jacob says. I don't have the full context of the UVDAT uses, so I was only commenting on the process of loading the data into e-dnt.
Good idea; I think this is worth including in the import script to help reduce the size of the vector features in uvdat, too. It's a quick fix, so I made the change here: e2c7549
I believe that the latest updates have broken the substation import (there is no call to the modules anymore). I don't know whether you knew this and were planning further modifications.
@BryonLewis Thanks for pointing this out; I hadn't tested the refactoring thoroughly. I made changes in ee55020 to call the `…tures_from_network` function.
This PR:
- Adds a `use_case` argument to the populate command. For the Boston data, run `manage.py populate boston_floods`; for the New York data, run `manage.py populate new_york_energy`. The optional arguments `--include_large` and `--dataset_indexes` still work the same way.
- Restructures the `sample_data` folder to group use case data together.
- Adds `convert_dataset` functions for each use case.
- Adds `nysdp.py` to pull vector features from `systemdataportal.nationalgrid.com`.
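The per-use-case dispatch described above could be structured roughly like this. All names here are hypothetical illustrations, not the actual uvdat code:

```python
# Hypothetical converters; the real convert_dataset functions live in
# per-use-case modules in the actual codebase.
def convert_boston_floods(dataset):
    return f"boston_floods:{dataset}"

def convert_new_york_energy(dataset):
    return f"new_york_energy:{dataset}"

CONVERTERS = {
    "boston_floods": convert_boston_floods,
    "new_york_energy": convert_new_york_energy,
}

def populate(use_case, datasets, dataset_indexes=None):
    """Run the converter for one use case over selected datasets.

    Sketch of the dispatch only; --include_large handling, downloads,
    and database writes are omitted.
    """
    convert = CONVERTERS.get(use_case)
    if convert is None:
        raise ValueError(
            f"unknown use case {use_case!r}; choose from {sorted(CONVERTERS)}"
        )
    if dataset_indexes is not None:
        datasets = [datasets[i] for i in dataset_indexes]
    return [convert(d) for d in datasets]
```

A table of converter functions keeps the command itself generic, so adding another use case only means registering one more entry.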