update docs; add model diagrams and outline of proposed changes.
chrisbc committed Jan 18, 2024
1 parent c307e51 commit d8bfb96
Showing 11 changed files with 357 additions and 25 deletions.
4 changes: 3 additions & 1 deletion docs/cli.md
@@ -10,4 +10,6 @@ This page provides documentation for our command line tools.
::: mkdocs-click
:module: scripts.ths_cache
:command: cli
:prog_name: ths_cache

This module may be deprecated.
62 changes: 62 additions & 0 deletions docs/domain_model/disaggregation_models.md
@@ -0,0 +1,62 @@
**Tables:**

- **DisaggAggregationExceedance** - Disaggregation curves of Probability of Exceedance
- **DisaggAggregationOccurence** - Disaggregation curves of Probability of Occurrence

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.

The base class **DisaggAggregationBase** defines attributes common to both types of disaggregation curve.

```mermaid
classDiagram
direction TB
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True) # For this we will use a downsampled location to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class DisaggAggregationBase{
... fields from LocationIndexedModel
hazard_model_id = UnicodeAttribute()
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
hazard_agg = EnumConstrainedUnicodeAttribute(AggregationEnum) # eg MEAN
disagg_agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
disaggs = CompressedPickleAttribute() # a very compressible numpy array,
bins = PickleAttribute() # a much smaller numpy array
shaking_level = FloatAttribute()
probability = EnumAttribute(ProbabilityEnum) # eg TEN_PCT_IN_50YRS
}
class DisaggAggregationExceedance{
... fields from DisaggAggregationBase
}
class DisaggAggregationOccurence{
... fields from DisaggAggregationBase
}
LocationIndexedModel <|-- DisaggAggregationBase
    DisaggAggregationBase <|-- DisaggAggregationExceedance
    DisaggAggregationBase <|-- DisaggAggregationOccurence
```
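The multi-resolution location attributes above (`nloc_001` through `nloc_0`) can be illustrated with a small sketch. This is a hypothetical encoding, assuming each code is simply the coordinate pair snapped to the grid resolution; the library's actual code format may differ:

```python
def snap(value: float, resolution: float) -> float:
    """Snap a coordinate to the nearest grid point at the given resolution."""
    return round(round(value / resolution) * resolution, 3)

def location_codes(lat: float, lon: float) -> dict:
    # One code per grid resolution used by LocationIndexedModel.
    grids = {"nloc_001": 0.001, "nloc_01": 0.01, "nloc_1": 0.1, "nloc_0": 1.0}
    return {
        name: f"{snap(lat, res):.3f}~{snap(lon, res):.3f}"
        for name, res in grids.items()
    }
```

Under this scheme `location_codes(-41.3, 174.78)["nloc_0"]` gives `"-41.000~175.000"`, showing how a 1.0 degree downsampled code can group nearby sites into a single partition.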
25 changes: 25 additions & 0 deletions docs/domain_model/gridded_hazard_models.md
@@ -0,0 +1,25 @@
**Tables:**

- **GriddedHazard** - Grid points defined in **location_grid_id** have values in **grid_poes**.
- **HazardAggregation** - stores aggregate hazard curves [see openquake_models for details](./openquake_models.md)

```mermaid
classDiagram
direction LR
class GriddedHazard{
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)
version = VersionAttribute()
created = TimestampAttribute(default=datetime_now)
hazard_model_id = UnicodeAttribute()
location_grid_id = UnicodeAttribute()
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
poe = FloatAttribute()
grid_poes = CompressedListAttribute()
}
GriddedHazard --> "1..*" HazardAggregation
```
95 changes: 95 additions & 0 deletions docs/domain_model/openquake_models.md
@@ -0,0 +1,95 @@
## CURRENT STATE

These table models are used to store data created by GEM's **openquake** PSHA engine. Data is extracted from the HDF5 files created by openquake and stored, with relevant metadata, in the following tables.

## Seismic Hazard Model diagram

**Tables:**

- **ToshiOpenquakeMeta** - stores metadata from the job configuration and the openquake results.

```mermaid
classDiagram
direction LR
class ToshiOpenquakeMeta {
partition_key = UnicodeAttribute(hash_key=True) # a static value as we actually don't want to partition our data
hazsol_vs30_rk = UnicodeAttribute(range_key=True)
created = TimestampAttribute(default=datetime_now)
hazard_solution_id = UnicodeAttribute()
general_task_id = UnicodeAttribute()
vs30 = NumberAttribute() # vs30 value
imts = UnicodeSetAttribute() # list of IMTs
locations_id = UnicodeAttribute() # Location codes identifier (ENUM?)
source_ids = UnicodeSetAttribute()
source_tags = UnicodeSetAttribute()
        inv_time = NumberAttribute()  # Investigation time in years
src_lt = JSONAttribute() # sources meta as DataFrame JSON
gsim_lt = JSONAttribute() # gmpe meta as DataFrame JSON
rlz_lt = JSONAttribute() # realization meta as DataFrame JSON
}
```

**Tables:**

- **OpenquakeRealization** - stores the individual hazard realisation curves.
- **HazardAggregation** - stores aggregate hazard curves from **OpenquakeRealization** curves.

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.


```mermaid
classDiagram
direction TB
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True) # For this we will use a downsampled location to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class OpenquakeRealization {
... fields from LocationIndexedModel
hazard_solution_id = UnicodeAttribute()
source_tags = UnicodeSetAttribute()
source_ids = UnicodeSetAttribute()
rlz = IntegerAttribute() # index of the openquake realization
values = ListAttribute(of=IMTValuesAttribute)
}
class HazardAggregation {
... fields from LocationIndexedModel
        hazard_model_id = UnicodeAttribute()  # e.g. NSHM_V1.0.4
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
values = ListAttribute(of=LevelValuePairAttribute)
}
ToshiOpenquakeMeta --> "0..*" OpenquakeRealization
HazardAggregation --> "1..*" OpenquakeRealization
LocationIndexedModel <|-- OpenquakeRealization
LocationIndexedModel <|-- HazardAggregation
```
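The relationship between **OpenquakeRealization** and **HazardAggregation** can be sketched as follows. This is an illustrative outline only, assuming each realization curve holds PoE values at a common set of shaking levels and that aggregation names follow **AggregationEnum** values such as `mean`; the real implementation lives elsewhere in the codebase:

```python
import statistics

def aggregate_curves(realization_curves, agg="mean"):
    # Each realization curve is a list of PoE values at a common set of
    # shaking levels; aggregate level-by-level across all realizations.
    per_level = zip(*realization_curves)
    if agg == "mean":
        return [statistics.mean(vals) for vals in per_level]
    if agg == "0.5":  # the median fractile
        return [statistics.median(vals) for vals in per_level]
    raise ValueError(f"unsupported aggregation: {agg}")
```

One aggregate curve per (model, imt, agg) combination is then stored as a **HazardAggregation** record, which is why the diagram shows a `1..*` relationship back to the realizations.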
115 changes: 115 additions & 0 deletions docs/domain_model/proposed_hazard_models.md
@@ -0,0 +1,115 @@
## FUTURE STATE

These table models are used to store data created by any suitable PSHA engine.

## Seismic Hazard Model diagram

Different hazard engines, versions and/or configurations may produce compatible calculation curves.

This model is similar to the current one, except that:

- the concept of compatible producer configs is supported
- **HazardRealizationCurve** records are identified solely by internal attributes & relationships. So **toshi_hazard_solution_id** is removed, but it can be recorded in **HazardRealizationMeta**.

**TODO:** formalise logic tree branch identification for both source and GMM logic trees so that these are:

- a) unique and unambiguous, and
- b) easily relatable to **nzshm_model** instances.
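One way to meet both requirements would be to hash a canonical serialisation of each branch definition. This is purely a sketch of the idea, not a settled design; the field names in the example branch are hypothetical:

```python
import hashlib
import json

def branch_digest(branch: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) makes the digest
    # unique and unambiguous for a given branch definition, and
    # reproducible from the equivalent nzshm_model instance.
    canonical = json.dumps(branch, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]
```

Because the serialisation is canonical, the same branch produces the same digest regardless of the key order in which it was built.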

**Tables:**

- **CompatibleHazardConfig (CHC)** - defines a logical identifier for compatible **HCPCs**. Model managers must ensure that compatibility holds true.
- **HazardCurveProducerConfig (HCPC)** - stores the unique attributes that define compatible hazard curve producers.
- **HazardRealizationMeta** - stores metadata common to a set of hazard realization curves.
- **HazardRealizationCurve** - stores the individual hazard realisation curves.
- **HazardAggregation** - stores the aggregated hazard curves [see openquake_models for details](./openquake_models.md)

```mermaid
classDiagram
direction TB
class CompatibleHazardConfig {
primary_key
}
class HazardCurveProducerConfig {
primary_key
fk_compatible_config
producer_software = UnicodeAttribute()
producer_version_id = UnicodeAttribute()
configuration_hash = UnicodeAttribute()
configuration_data = UnicodeAttribute()
}
class HazardRealizationMeta {
partition_key = UnicodeAttribute(hash_key=True) # a static value as we actually don't want to partition our data
sort_key = UnicodeAttribute(range_key=True)
fk_compatible_config
fk_producer_config
created = TimestampAttribute(default=datetime_now)
?hazard_solution_id = UnicodeAttribute()
?general_task_id = UnicodeAttribute()
vs30 = NumberAttribute() # vs30 value
src_lt = JSONAttribute() # sources meta as DataFrame JSON
gsim_lt = JSONAttribute() # gmpe meta as DataFrame JSON
rlz_lt = JSONAttribute() # realization meta as DataFrame JSON
}
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
etc...
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class HazardRealizationCurve {
... fields from LocationIndexedModel
fk_metadata
fk_compatible_config
?source_tags = UnicodeSetAttribute()
?source_ids = UnicodeSetAttribute()
rlz # TODO ID of the realization
values = ListAttribute(of=IMTValuesAttribute)
}
class HazardAggregation {
... fields from LocationIndexedModel
fk_compatible_config
        hazard_model_id = UnicodeAttribute()  # e.g. NSHM_V1.0.4
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
values = ListAttribute(of=LevelValuePairAttribute)
}
CompatibleHazardConfig --> "1..*" HazardCurveProducerConfig
HazardRealizationMeta --> "*..1" HazardCurveProducerConfig
HazardRealizationMeta --> "*..1" CompatibleHazardConfig
LocationIndexedModel <|-- HazardRealizationCurve
LocationIndexedModel <|-- HazardAggregation
HazardRealizationCurve --> "*..1" CompatibleHazardConfig
HazardRealizationCurve --> "*..1" HazardRealizationMeta
HazardAggregation --> "*..1" CompatibleHazardConfig
```
10 changes: 9 additions & 1 deletion docs/installation.md
@@ -5,11 +5,19 @@
To install toshi-hazard-store, run this command in your
terminal:

### using pip

``` console
$ pip install toshi-hazard-store
```

### using poetry

``` console
$ poetry add toshi-hazard-store
```

These are the preferred methods to install toshi-hazard-store, as they will always install the most recent stable release.

If you don't have [pip][] installed, this [Python installation guide][]
can guide you through the process.
11 changes: 6 additions & 5 deletions docs/sqlite_adapter_usage.md
@@ -1,5 +1,4 @@

Users may choose to store data locally instead of the default cloud AWS DynamoDB store. Caveats:

- The complete NSHM_v1.0.4 dataset will likely prove too large for this option.
- This is single-user only.
@@ -9,8 +8,10 @@ Users may choose to store data locally instead of the default AWS DynamoDB store
## Environment configuration

```
NZSHM22_HAZARD_STORE_STAGE={XXX} # e.g. LOCAL - can be used to differentiate local datasets
SQLITE_ADAPTER_FOLDER={YYY} # valid path to a local storage folder
USE_SQLITE_ADAPTER=TRUE
```
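Inside Python, these variables would be read along these lines. This is a sketch based on the variable names above; `boolean_env` here is a stand-in for the library's own helper:

```python
import os

def boolean_env(name: str, default: str = "FALSE") -> bool:
    # Treat common truthy spellings as True (stand-in helper; the
    # library's own boolean_env may behave differently).
    return os.getenv(name, default).upper() in ("1", "Y", "YES", "TRUE")

SQLITE_ADAPTER_FOLDER = os.getenv("THS_SQLITE_FOLDER", "./LOCALSTORAGE")
USE_SQLITE_ADAPTER = boolean_env("THS_USE_SQLITE_ADAPTER")
```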
## CLI for testing

@@ -53,7 +54,7 @@ sys 0m0.957s

**NB:** It is also possible to run a local instance of DynamoDB using docker, and it should work as above if the environment is configured correctly (TODO: write this up). This is not recommended except for testing.

### Hazard Solution metadata (Sqlite adapter)

using the locally populated datastore ....

15 changes: 11 additions & 4 deletions docs/usage.md
@@ -1,15 +1,22 @@
# Usage

The NZSHM toshi-hazard-store database is available for public, read-only access using AWS API credentials (contact via email: nshm@gns.cri.nz).

### Environment & Authorisation pre-requisites

``` console
NZSHM22_HAZARD_STORE_STAGE=XXXX (TEST or PROD)
NZSHM22_HAZARD_STORE_REGION=XXXXX (ap-southeast-2)
AWS_PROFILE=... (See AWS authentication below)

```

#### AWS Authentication

- AWS credentials are provided as so-called `short-term credentials`, in the form of an `aws_access_key_id` and an `aws_secret_access_key`.

- Typically these are configured in your local credentials file as described in [Authenticate with short-term credentials](https://docs.aws.amazon.com/cli/v1/userguide/cli-authentication-short-term.html).

- An `AWS_PROFILE` environment variable determines the credentials used at run-time by THS.
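A typical short-term credentials profile looks like the following (illustrative values only; the profile name is whatever you set in `AWS_PROFILE`, and a session token is usually included with short-term credentials):

```
[nshm-readonly]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
aws_session_token = ...
```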

## toshi-hazard-store (library)

To use toshi-hazard-store in a project:
