26 thp v2 arrow POC #30

Draft
wants to merge 4 commits into
base: 26-THP_v2
83 changes: 83 additions & 0 deletions cbc/arrow_performance.md
@@ -0,0 +1,83 @@
# Performance

## ARROW micro

```
chrisbc@tryharder-ubuntu:/GNSDATA/LIB/toshi-hazard-post$ poetry run python ./scripts/thp_v2.py aggregate demo/hazard_v2_micro.toml -M ARROW
warning openquake module dependency not available, maybe you want to install
with nzshm-model[openquake]
Toshi Hazard Post: hazard curve aggregation ARROW
=================================================
2024-04-17 21:06:04,776 - toshi_hazard_post.version2.aggregation_arrow - INFO - getting sites . . .
2024-04-17 21:06:04,777 - toshi_hazard_post.version2.aggregation_arrow - INFO - getting logic trees . . .
2024-04-17 21:06:04,779 - toshi_hazard_post.version2.aggregation_arrow - INFO - building hazard logic tree . . .
2024-04-17 21:06:04,779 - toshi_hazard_post.version2.aggregation_arrow - INFO - arrow method
2024-04-17 21:06:04,780 - toshi_hazard_post.version2.aggregation_arrow - INFO - time to build weight table 0.00 seconds
2024-04-17 21:06:04,780 - toshi_hazard_post.version2.aggregation_arrow - DEBUG - (96, 3)
2024-04-17 21:06:04,896 - toshi_hazard_post.version2.data_arrow - INFO - load ds: 0.000616, scanner:0.000213 duck_sql:0.0: to_arrow 0.109467
2024-04-17 21:06:04,898 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - time to load realizations 0.11 seconds
2024-04-17 21:06:05,315 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - time to convert_probs_to_rates() 0.42 seconds
2024-04-17 21:06:05,332 - toshi_hazard_post.version2.aggregation_calc_arrow - INFO - RSS: 0MB
2024-04-17 21:06:05,337 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - time to join_rates_weights() 0.02 seconds
2024-04-17 21:06:05,337 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - rates_weights (96, 4)
2024-04-17 21:06:05,337 - toshi_hazard_post.version2.aggregation_arrow - INFO - time to perform aggregation for one location-imt pair 0.55 seconds
2024-04-17 21:06:05,338 - toshi_hazard_post.version2.aggregation_arrow - INFO - total arrow time: 0.558
```
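
The `convert_probs_to_rates()` step timed above is not part of this diff. As a minimal sketch only, assuming it applies the standard Poisson relation between a probability of exceedance and an annual rate, the conversion could look like the following. The function names and the `inv_time` parameter (investigation time in years, defaulting to 1) are hypothetical and not taken from `toshi_hazard_post`.

```python
import math


def prob_to_rate(prob: float, inv_time: float = 1.0) -> float:
    """Hypothetical helper: P = 1 - exp(-rate * T)  =>  rate = -ln(1 - P) / T."""
    # log1p(-p) is more accurate than log(1 - p) for small probabilities.
    return -math.log1p(-prob) / inv_time


def rate_to_prob(rate: float, inv_time: float = 1.0) -> float:
    """Hypothetical inverse: probability of at least one exceedance within T."""
    # expm1 avoids cancellation when rate * T is small.
    return -math.expm1(-rate * inv_time)


# Toy probabilities (invented for illustration, not values from the logs).
probs = [0.5, 0.1, 0.001]
rates = [prob_to_rate(p) for p in probs]
round_trip = [rate_to_prob(r) for r in rates]
```

Working in rates rather than probabilities is what makes a weighted sum across logic tree branches meaningful, which may be why this step precedes the join with the weight table.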

## ARROW mini

```
chrisbc@tryharder-ubuntu:/GNSDATA/LIB/toshi-hazard-post$ poetry run python ./scripts/thp_v2.py aggregate demo/hazard_v2_mini.toml -M ARROW
warning openquake module dependency not available, maybe you want to install
with nzshm-model[openquake]
Toshi Hazard Post: hazard curve aggregation ARROW
=================================================
2024-04-17 21:08:27,321 - toshi_hazard_post.version2.aggregation_arrow - INFO - time to perform aggregation for one location-imt pair 0.50 seconds
2024-04-17 21:08:27,321 - toshi_hazard_post.version2.aggregation_arrow - INFO - total arrow time: 0.747
```

## ARROW NSHM

```
chrisbc@tryharder-ubuntu:/GNSDATA/LIB/toshi-hazard-post$ poetry run python ./scripts/thp_v2.py aggregate demo/hazard_v2.toml -M ARROW
warning openquake module dependency not available, maybe you want to install
with nzshm-model[openquake]
Toshi Hazard Post: hazard curve aggregation ARROW
=================================================
2024-04-17 21:09:39,864 - toshi_hazard_post.version2.aggregation_arrow - INFO - time to build weight table 18.28 seconds
2024-04-17 21:09:39,864 - toshi_hazard_post.version2.aggregation_arrow - DEBUG - (3919104, 3)
2024-04-17 21:09:40,010 - toshi_hazard_post.version2.aggregation_arrow - INFO - RSS: 149MB
2024-04-17 21:09:40,155 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - time to load realizations 0.14 seconds
2024-04-17 21:09:40,155 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - rlz_table (912, 3)
2024-04-17 21:09:40,547 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - time to convert_probs_to_rates() 0.39 seconds
2024-04-17 21:09:41,970 - toshi_hazard_post.version2.aggregation_calc_arrow - INFO - rates_weights_joined shape: (3919104, 4)
2024-04-17 21:09:43,855 - toshi_hazard_post.version2.aggregation_calc_arrow - INFO - RSS: 149MB
2024-04-17 21:09:43,866 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - time to join_rates_weights() 3.32 seconds
2024-04-17 21:09:43,866 - toshi_hazard_post.version2.aggregation_calc_arrow - DEBUG - rates_weights (3919104, 4)
2024-04-17 21:09:43,928 - toshi_hazard_post.version2.aggregation_arrow - INFO - time to perform aggregation for one location-imt pair 3.92 seconds
2024-04-17 21:09:43,928 - toshi_hazard_post.version2.aggregation_arrow - INFO - total arrow time: 22.343
```
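
The shapes logged above are consistent with a key-based join: 912 realization rows matched against 3,919,104 weight rows, producing one joined row per weight row with one extra column. The Arrow implementation of `join_rates_weights()` is not shown in this diff, so the following is a plain-Python hash join swapped in purely for illustration. The column names `sources_digest`, `gmms_digest`, `weight` and `values` appear in the session files in this PR; the function name and the toy digests and numbers are invented.

```python
def join_rates_weights_sketch(weight_rows, rlz_rows):
    """Hypothetical stand-in: attach each realization's values to every
    weight row that shares its (sources_digest, gmms_digest) key."""

    def key(row):
        return (row["sources_digest"], row["gmms_digest"])

    # Build the lookup from the small side (realizations), probe with the large side.
    values_by_key = {key(r): r["values"] for r in rlz_rows}
    return [
        {**w, "values": values_by_key[key(w)]}
        for w in weight_rows
        if key(w) in values_by_key
    ]


# Toy data: the same key may carry several weights.
weight_rows = [
    {"sources_digest": "src_a", "gmms_digest": "gmm_x", "weight": 0.3},
    {"sources_digest": "src_a", "gmms_digest": "gmm_x", "weight": 0.2},
    {"sources_digest": "src_b", "gmms_digest": "gmm_x", "weight": 0.5},
]
rlz_rows = [
    {"sources_digest": "src_a", "gmms_digest": "gmm_x", "values": [0.02, 0.01]},
    {"sources_digest": "src_b", "gmms_digest": "gmm_x", "values": [0.04, 0.03]},
]
joined = join_rates_weights_sketch(weight_rows, rlz_rows)
```

The joined result has as many rows as the weight side and one more column than it, mirroring the `(3919104, 3)` to `(3919104, 4)` change in the log.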

## Original NSHM

```
chrisbc@tryharder-ubuntu:/GNSDATA/LIB/toshi-hazard-post$ poetry run python ./scripts/thp_v2.py aggregate demo/hazard_v2.toml -M OG
warning openquake module dependency not available, maybe you want to install
with nzshm-model[openquake]
Toshi Hazard Post: hazard curve aggregation OG
==============================================
2024-04-17 21:10:52,479 - toshi_hazard_post.version2.aggregation - INFO - time to calculate weights 9.14 seconds
979776
2024-04-17 21:10:56,043 - toshi_hazard_post.version2.data - INFO - loaded 912 realizations and 912 entries
2024-04-17 21:10:56,043 - toshi_hazard_post.version2.aggregation_calc - DEBUG - time to load realizations 3.56 seconds
2024-04-17 21:11:21,688 - toshi_hazard_post.version2.aggregation_calc - DEBUG - time to build branch rates 25.65 seconds
2024-04-17 21:11:21,688 - toshi_hazard_post.version2.aggregation_calc - DEBUG - branch_rates with shape (979776, 44)
2024-04-17 21:11:21,688 - toshi_hazard_post.version2.aggregation_calc - DEBUG - weights with shape (979776,)
2024-04-17 21:11:26,251 - toshi_hazard_post.version2.aggregation_calc - DEBUG - agg with shape (44, 6)
2024-04-17 21:11:26,251 - toshi_hazard_post.version2.aggregation_calc - DEBUG - time to calculate aggs 4.56 seconds
2024-04-17 21:11:26,251 - toshi_hazard_post.version2.aggregation_calc - INFO - saving result . . .
2024-04-17 21:11:26,389 - toshi_hazard_post.version2.aggregation - INFO - time to perform aggregation for one location-imt pair 33.91 seconds
2024-04-17 21:11:26,390 - toshi_hazard_post.version2.aggregation - INFO - total OG time: 43.052995
chrisbc@tryharder-ubuntu:/GNSDATA/LIB/toshi-hazard-post$
```
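
## Comparison

For a quick side-by-side of the two NSHM runs, the snippet below recomputes ratios from the figures logged above. The timings are copied from the logs; the dictionary layout and variable names are only for illustration, and a single run per method is not a rigorous benchmark.

```python
# Seconds, copied from the "ARROW NSHM" and "Original NSHM" logs above.
arrow = {"weights": 18.28, "per_pair": 3.92, "total": 22.343}
og = {"weights": 9.14, "per_pair": 33.91, "total": 43.052995}

# Ratio > 1 means ARROW was faster for that stage; < 1 means it was slower.
speedup = {stage: og[stage] / arrow[stage] for stage in arrow}

# Share of the ARROW total spent building the weight table.
weight_share = arrow["weights"] / arrow["total"]

for stage, ratio in speedup.items():
    print(f"{stage:>8}: OG/ARROW = {ratio:.2f}x")
print(f"weight table share of ARROW total: {weight_share:.0%}")
```

On these numbers the per location-imt aggregation is roughly 8.7 times faster with ARROW and the total is roughly 1.9 times faster, while building the weight table takes about twice as long and accounts for most of the ARROW total.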

78 changes: 78 additions & 0 deletions cbc/inspection_0.ipy
@@ -0,0 +1,78 @@
# coding: utf-8
from toshi_hazard_post.version2.aggregation_config import AggregationConfig
config = AggregationConfig('demo/hazard_v2.toml')
config
help(config)
import pyarrow as pa
from toshi_hazard_post.version2.aggregation_config import AggregationConfig
from toshi_hazard_post.version2.aggregation_calc_arrow import calc_aggregation_arrow
from toshi_hazard_post.version2.logic_tree import HazardLogicTree
from toshi_hazard_post.version2.aggregation_setup import get_lts, get_sites
sites = get_sites(config.locations, config.vs30s)
sites
srm_lt, gmcm_lt = get_lts(config)
srm_lt
logic_tree = HazardLogicTree(srm_lt, gmcm_lt)
logic_tree
help(logic_tree)
help(logic_tree.component_branches)
len(logic_tree.component_branches)
len(list(logic_tree.component_branches))
list(logic_tree.component_branches)[0]
list(logic_tree.component_branches)[0].tectonic_region_type
list(logic_tree.component_branches)[0].source_branch.tectonic_region_types
list(logic_tree.component_branches)[1].source_branch.tectonic_region_types
list(logic_tree.component_branches)[2].source_branch.tectonic_region_types
list(logic_tree.component_branches)[-1].source_branch.tectonic_region_types
list(logic_tree.component_branches)[12].source_branch.tectonic_region_types
for cpb in logic_tree.component_branches:
    print(cpb.source_branch.tectonic_region_types, "|", len(cpb.gmcm_branches), cpb.gmcm_branches[0].tectonic_region_type)

help
help()
for cpb in logic_tree.component_branches:
    print(cpb.source_branch.tectonic_region_types, "|", len(cpb.gmcm_branches), cpb.gmcm_branches[0].tectonic_region_type)

len(logic_tree.composite_branches)
len(list(logic_tree.composite_branches))
list(logic_tree.composite_branches)[0]
len(list(logic_tree.composite_branches)[0].branches)
for cpb in list(logic_tree.composite_branches)[0].branches:
    print(cpb.source_branch.tectonic_region_types, "|", len(cpb.gmcm_branches), cpb.gmcm_branches[0].tectonic_region_type)

weight_table = logic_tree.weight_table()
wt = weight_table.to_pandas()
wt
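# Unterminated string literal after the first digest:
# wt[(wt.sources_digest == "af9ec2b004d7 & wt.gmms_digest == "e031e948959c")]
# `&` binds tighter than `==`, so each comparison needs its own parentheses:
# wt[(wt.sources_digest == "af9ec2b004d7" & wt.gmms_digest == "e031e948959c")]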
wt[(wt.sources_digest == "af9ec2b004d7") & (wt.gmms_digest == "e031e948959c")]
wt[(wt.sources_digest == "af9ec2b004d7")]
wt[(wt.gmms_digest == "e031e948959c")]
config = AggregationConfig('demo/hazard_v2.toml')
srm_lt, gmcm_lt = get_lts(config)
weight_table = logic_tree.weight_table()
wt = weight_table.to_pandas()
wt
wt = weight_table.to_pandas()
weight_table = logic_tree.weight_table()
wt = weight_table.to_pandas()
wt
logic_tree.weights
len(logic_tree.weights)
config = AggregationConfig('demo/hazard_v2.toml')
srm_lt, gmcm_lt = get_lts(config)
gmcm_lt
config = AggregationConfig('demo/hazard_v2.toml')
srm_lt, gmcm_lt = get_lts(config)
gmcm_lt
logic_tree = HazardLogicTree(srm_lt, gmcm_lt)
len(logic_tree.weights)
config = AggregationConfig('demo/hazard_v2.toml')
srm_lt, gmcm_lt = get_lts(config)
logic_tree = HazardLogicTree(srm_lt, gmcm_lt)
len(logic_tree.weights)
%save -r inspection_0 1-10000
40 changes: 40 additions & 0 deletions cbc/manip_0.ipy
@@ -0,0 +1,40 @@
# coding: utf-8
from toshi_hazard_post.version2.aggregation_config import AggregationConfig
config = AggregationConfig('demo/hazard_v2.toml')
import pyarrow as pa
from toshi_hazard_post.version2.aggregation_config import AggregationConfig
from toshi_hazard_post.version2.aggregation_calc_arrow import calc_aggregation_arrow
from toshi_hazard_post.version2.logic_tree import HazardLogicTree
from toshi_hazard_post.version2.aggregation_setup import get_lts, get_sites
config = AggregationConfig('demo/hazard_v2.toml')
srm_lt, gmcm_lt = get_lts(config)
logic_tree = HazardLogicTree(srm_lt, gmcm_lt)
sites = get_sites(config.locations, config.vs30s)
weight_table = logic_tree.weight_table()
rates_weights = calc_aggregation_arrow(
    site=sites[0], imt=config.imts[0], agg_types=config.agg_types, weights=weight_table,
    logic_tree=logic_tree, compatibility_key=config.compat_key, hazard_model_id=config.hazard_model_id
)
rates_weights
rwt = rates_weights
# The next three calls misspell the column name as 'sources_digist':
# rwt.group_by(['sources_digist', 'gmms_digest'])
# rwt.group_by(['sources_digist', 'gmms_digest']).aggregate(('weight', 'sum'))
# rwt.group_by(['sources_digist', 'gmms_digest']).aggregate([('weight', 'sum')])
rwt.group_by(['sources_digest', 'gmms_digest']).aggregate([('weight', 'sum')])
rwt.group_by(['sources_digest', 'gmms_digest']).aggregate([('weight', 'sum')]).shape
rwt.shape
rwt.group_by(['sources_digest', 'gmms_digest']).aggregate([('weight', 'sum'), ('values', 'sum')])
rwt.values
help(rwt)
rwt.schema
rwt.column(2)
help(rwt.column(2))
vnp = rwt.column(2).to_numpy()
vnp
vnp.shape
vnp[0].shape
import numpy as np
summed = np.sum(vnp, axis=0)
summed
summed.shape
11 changes: 8 additions & 3 deletions demo/hazard_v2.toml
@@ -6,18 +6,23 @@ hazard_model_id = "DEMO_MODEL"
model_version = "NSHM_v1.0.4"

# alternatively, specify a path to logic tree files
# srm_file = "demo/srm_logic_tree.json"
# srm_file = "demo/srm_logic_tree_no_slab.json"
# srm_file = "demo/srm_logic_tree_micro.json"
# gmcm_file = "demo/gmcm_logic_tree_medium.json"
# gmcm_file = "demo/gmcm_logic_tree_small.json"

[site]
vs30s = [275]
# locations = ["WLG", "SRWG214", "-41.000~174.700", "myfile.csv"]
locations = ["-34.500~173.000"]
# locations = ["NZ"]
# locations = ["MRO"]

# alternatively, specify a file with locations and vs30 values
# site_file = "sites.csv"

[calculation]
# imts = ["PGA", "SA(1.0)"]
imts = ["PGA"]
#imts = ["PGA", "SA(1.0)"]
#imts = ["PGA"]
imts = ["PGA", "SA(0.5)", "SA(1.0)", "SA(2.0)", "SA(5.0)"]
agg_types = ["mean", "cov", "std", "0.005", "0.01", "0.025"]
25 changes: 25 additions & 0 deletions demo/hazard_v2_micro.toml
@@ -0,0 +1,25 @@
[general]
compatibility_key = "A_A"
hazard_model_id = "DEMO_MODEL"

[logic_trees]
# model_version = "NSHM_v1.0.4"

# alternatively, specify a path to logic tree files
srm_file = "demo/srm_logic_tree_micro.json"
gmcm_file = "demo/gmcm_logic_tree_small.json"

[site]
vs30s = [275]
# locations = ["WLG", "SRWG214", "-41.000~174.700", "myfile.csv"]
locations = ["-34.500~173.000"]
# locations = ["NZ"]
# locations = ["MRO"]

# alternatively, specify a file with locations and vs30 values
# site_file = "sites.csv"

[calculation]
#imts = ["PGA", "SA(1.0)"]
imts = ["PGA"]
agg_types = ["mean", "cov", "std", "0.005", "0.01", "0.025"]
25 changes: 25 additions & 0 deletions demo/hazard_v2_mini.toml
@@ -0,0 +1,25 @@
[general]
compatibility_key = "A_A"
hazard_model_id = "DEMO_MODEL"

[logic_trees]
# model_version = "NSHM_v1.0.4"

# alternatively, specify a path to logic tree files
srm_file = "demo/srm_logic_tree_no_slab.json"
gmcm_file = "demo/gmcm_logic_tree_medium.json"

[site]
vs30s = [275]
# locations = ["WLG", "SRWG214", "-41.000~174.700", "myfile.csv"]
locations = ["-34.500~173.000"]
# locations = ["NZ"]
# locations = ["MRO"]

# alternatively, specify a file with locations and vs30 values
# site_file = "sites.csv"

[calculation]
imts = ["PGA", "SA(0.5)", "SA(1.0)", "SA(2.0)", "SA(5.0)"]
#imts = ["PGA"]
agg_types = ["mean", "cov", "std", "0.005", "0.01", "0.025"]