update docs; add model diagrams and outline of proposed changes.
chrisbc committed Jan 18, 2024
1 parent c307e51 commit d8bfb96
Showing 11 changed files with 357 additions and 25 deletions.
4 changes: 3 additions & 1 deletion docs/cli.md
@@ -10,4 +10,6 @@ This page provides documentation for our command line tools.
::: mkdocs-click
:module: scripts.ths_cache
:command: cli
:prog_name: ths_cache

This module may be deprecated.
62 changes: 62 additions & 0 deletions docs/domain_model/disaggregation_models.md
@@ -0,0 +1,62 @@
**Tables:**

- **DisaggAggregationExceedance** - Disaggregation curves of Probability of Exceedance
- **DisaggAggregationOccurence** - Disaggregation curves of Probability of Occurrence

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.

The base class **DisaggAggregationBase** defines attributes common to both types of disaggregation curve.

```mermaid
classDiagram
direction TB
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True) # For this we will use a downsampled location to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class DisaggAggregationBase{
... fields from LocationIndexedModel
hazard_model_id = UnicodeAttribute()
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
hazard_agg = EnumConstrainedUnicodeAttribute(AggregationEnum) # eg MEAN
disagg_agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
disaggs = CompressedPickleAttribute() # a very compressible numpy array,
bins = PickleAttribute() # a much smaller numpy array
shaking_level = FloatAttribute()
probability = EnumAttribute(ProbabilityEnum) # eg TEN_PCT_IN_50YRS
}
class DisaggAggregationExceedance{
... fields from DisaggAggregationBase
}
class DisaggAggregationOccurence{
... fields from DisaggAggregationBase
}
LocationIndexedModel <|-- DisaggAggregationBase
    DisaggAggregationBase <|-- DisaggAggregationExceedance
    DisaggAggregationBase <|-- DisaggAggregationOccurence
```
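The multi-resolution location attributes above (`nloc_001` through `nloc_0`) can be illustrated with a small sketch. This is a hypothetical encoding, assuming each code is simply the coordinate pair snapped to the grid resolution; the library's actual code format may differ:

```python
def snap(value: float, resolution: float) -> float:
    """Snap a coordinate to the nearest grid point at the given resolution."""
    return round(round(value / resolution) * resolution, 3)

def location_codes(lat: float, lon: float) -> dict:
    # One code per grid resolution used by LocationIndexedModel.
    grids = {"nloc_001": 0.001, "nloc_01": 0.01, "nloc_1": 0.1, "nloc_0": 1.0}
    return {
        name: f"{snap(lat, res):.3f}~{snap(lon, res):.3f}"
        for name, res in grids.items()
    }
```

Under this scheme `location_codes(-41.3, 174.78)["nloc_0"]` gives `"-41.000~175.000"`, showing how a 1.0 degree downsampled code can group nearby sites into a single partition.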
25 changes: 25 additions & 0 deletions docs/domain_model/gridded_hazard_models.md
@@ -0,0 +1,25 @@
**Tables:**

- **GriddedHazard** - Grid points defined in **location_grid_id** have values in **grid_poes**.
- **HazardAggregation** - stores aggregate hazard curves [see openquake_models for details](./openquake_models.md)

```mermaid
classDiagram
direction LR
class GriddedHazard{
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)
version = VersionAttribute()
created = TimestampAttribute(default=datetime_now)
hazard_model_id = UnicodeAttribute()
location_grid_id = UnicodeAttribute()
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
poe = FloatAttribute()
grid_poes = CompressedListAttribute()
}
GriddedHazard --> "1..*" HazardAggregation
```
95 changes: 95 additions & 0 deletions docs/domain_model/openquake_models.md
@@ -0,0 +1,95 @@
## CURRENT STATE

These table models are used to store data created by GEM's **openquake** PSHA engine. Data is extracted from the HDF5 files created by openquake and stored, with relevant metadata, in the following tables.

## Seismic Hazard Model diagram

**Tables:**

- **ToshiOpenquakeMeta** - stores metadata from the job configuration and the openquake results.

```mermaid
classDiagram
direction LR
class ToshiOpenquakeMeta {
partition_key = UnicodeAttribute(hash_key=True) # a static value as we actually don't want to partition our data
hazsol_vs30_rk = UnicodeAttribute(range_key=True)
created = TimestampAttribute(default=datetime_now)
hazard_solution_id = UnicodeAttribute()
general_task_id = UnicodeAttribute()
vs30 = NumberAttribute() # vs30 value
imts = UnicodeSetAttribute() # list of IMTs
locations_id = UnicodeAttribute() # Location codes identifier (ENUM?)
source_ids = UnicodeSetAttribute()
source_tags = UnicodeSetAttribute()
        inv_time = NumberAttribute()  # Investigation time in years
src_lt = JSONAttribute() # sources meta as DataFrame JSON
gsim_lt = JSONAttribute() # gmpe meta as DataFrame JSON
rlz_lt = JSONAttribute() # realization meta as DataFrame JSON
}
```

**Tables:**

- **OpenquakeRealization** - stores the individual hazard realisation curves.
- **HazardAggregation** - stores aggregate hazard curves from **OpenquakeRealization** curves.

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.


```mermaid
classDiagram
direction TB
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True) # For this we will use a downsampled location to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class OpenquakeRealization {
... fields from LocationIndexedModel
hazard_solution_id = UnicodeAttribute()
source_tags = UnicodeSetAttribute()
source_ids = UnicodeSetAttribute()
rlz = IntegerAttribute() # index of the openquake realization
values = ListAttribute(of=IMTValuesAttribute)
}
class HazardAggregation {
... fields from LocationIndexedModel
        hazard_model_id = UnicodeAttribute()  # e.g. NSHM_V1.0.4
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
values = ListAttribute(of=LevelValuePairAttribute)
}
ToshiOpenquakeMeta --> "0..*" OpenquakeRealization
HazardAggregation --> "1..*" OpenquakeRealization
LocationIndexedModel <|-- OpenquakeRealization
LocationIndexedModel <|-- HazardAggregation
```
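The relationship between **OpenquakeRealization** and **HazardAggregation** can be sketched as follows. This is an illustrative outline only, assuming each realization curve holds PoE values at a common set of shaking levels and that aggregation names follow **AggregationEnum** values such as `mean`; the real implementation lives elsewhere in the codebase:

```python
import statistics

def aggregate_curves(realization_curves, agg="mean"):
    # Each realization curve is a list of PoE values at a common set of
    # shaking levels; aggregate level-by-level across all realizations.
    per_level = zip(*realization_curves)
    if agg == "mean":
        return [statistics.mean(vals) for vals in per_level]
    if agg == "0.5":  # the median fractile
        return [statistics.median(vals) for vals in per_level]
    raise ValueError(f"unsupported aggregation: {agg}")
```

One aggregate curve per (model, imt, agg) combination is then stored as a **HazardAggregation** record, which is why the diagram shows a `1..*` relationship back to the realizations.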
115 changes: 115 additions & 0 deletions docs/domain_model/proposed_hazard_models.md
@@ -0,0 +1,115 @@
## FUTURE STATE

These table models are used to store data created by any suitable PSHA engine.

## Seismic Hazard Model diagram

Different hazard engines, versions and/or configurations may produce compatible calculation curves.

This model is similar to the current one, except that:

- the concept of compatible producer configs is supported
- **HazardRealizationCurve** records are identified solely by internal attributes & relationships. So **toshi_hazard_solution_id** is removed, but it can be recorded in **HazardRealizationMeta**.

**TODO:** formalise logic tree branch identification for both source and GMM logic trees so that these are:

- a) unique and unambiguous, and
- b) easily relatable to **nzshm_model** instances.
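One way to meet both requirements would be to hash a canonical serialisation of each branch definition. This is purely a sketch of the idea, not a settled design; the field names in the example branch are hypothetical:

```python
import hashlib
import json

def branch_digest(branch: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) makes the digest
    # unique and unambiguous for a given branch definition, and
    # reproducible from the equivalent nzshm_model instance.
    canonical = json.dumps(branch, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]
```

Because the serialisation is canonical, the same branch produces the same digest regardless of the key order in which it was built.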

**Tables:**

- **CompatibleHazardConfig (CHC)** - defines a logical identifier for compatible **HCPCs**. Model managers must ensure that compatibility holds true.
- **HazardCurveProducerConfig (HCPC)** - stores the unique attributes that define compatible hazard curve producers.
- **HazardRealizationMeta** - stores metadata common to a set of hazard realization curves.
- **HazardRealizationCurve** - stores the individual hazard realisation curves.
- **HazardAggregation** - stores the aggregated hazard curves [see openquake_models for details](./openquake_models.md)

```mermaid
classDiagram
direction TB
class CompatibleHazardConfig {
primary_key
}
class HazardCurveProducerConfig {
primary_key
fk_compatible_config
producer_software = UnicodeAttribute()
producer_version_id = UnicodeAttribute()
configuration_hash = UnicodeAttribute()
configuration_data = UnicodeAttribute()
}
class HazardRealizationMeta {
partition_key = UnicodeAttribute(hash_key=True) # a static value as we actually don't want to partition our data
sort_key = UnicodeAttribute(range_key=True)
fk_compatible_config
fk_producer_config
created = TimestampAttribute(default=datetime_now)
?hazard_solution_id = UnicodeAttribute()
?general_task_id = UnicodeAttribute()
vs30 = NumberAttribute() # vs30 value
src_lt = JSONAttribute() # sources meta as DataFrame JSON
gsim_lt = JSONAttribute() # gmpe meta as DataFrame JSON
rlz_lt = JSONAttribute() # realization meta as DataFrame JSON
}
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
etc...
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class HazardRealizationCurve {
... fields from LocationIndexedModel
fk_metadata
fk_compatible_config
?source_tags = UnicodeSetAttribute()
?source_ids = UnicodeSetAttribute()
rlz # TODO ID of the realization
values = ListAttribute(of=IMTValuesAttribute)
}
class HazardAggregation {
... fields from LocationIndexedModel
fk_compatible_config
        hazard_model_id = UnicodeAttribute()  # e.g. NSHM_V1.0.4
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
values = ListAttribute(of=LevelValuePairAttribute)
}
CompatibleHazardConfig --> "1..*" HazardCurveProducerConfig
HazardRealizationMeta --> "*..1" HazardCurveProducerConfig
HazardRealizationMeta --> "*..1" CompatibleHazardConfig
LocationIndexedModel <|-- HazardRealizationCurve
LocationIndexedModel <|-- HazardAggregation
HazardRealizationCurve --> "*..1" CompatibleHazardConfig
HazardRealizationCurve --> "*..1" HazardRealizationMeta
HazardAggregation --> "*..1" CompatibleHazardConfig
```
10 changes: 9 additions & 1 deletion docs/installation.md
@@ -5,11 +5,19 @@
To install toshi-hazard-store, run this command in your
terminal:

### using pip

``` console
$ pip install toshi-hazard-store
```

### using poetry

``` console
$ poetry add toshi-hazard-store
```

These are the preferred methods to install toshi-hazard-store, as they will always install the most recent stable release.

If you don't have [pip][] installed, this [Python installation guide][]
can guide you through the process.
11 changes: 6 additions & 5 deletions docs/sqlite_adapter_usage.md
@@ -1,5 +1,4 @@

Users may choose to store data locally instead of the default cloud AWS DynamoDB store. Caveats:

- The complete NSHM_v1.0.4 dataset will likely prove too large for this option.
- This is single-user only.
@@ -9,8 +8,10 @@ Users may choose to store data locally instead of the default AWS DynamoDB store
## Environment configuration

```
NZSHM22_HAZARD_STORE_STAGE={XXX} # e.g. LOCAL - can be used to differentiate local datasets
SQLITE_ADAPTER_FOLDER={YYY} # valid path to a local storage folder
USE_SQLITE_ADAPTER=TRUE
```
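Inside Python, these variables would be read along these lines. This is a sketch based on the variable names above; `boolean_env` here is a stand-in for the library's own helper:

```python
import os

def boolean_env(name: str, default: str = "FALSE") -> bool:
    # Treat common truthy spellings as True (stand-in helper; the
    # library's own boolean_env may behave differently).
    return os.getenv(name, default).upper() in ("1", "Y", "YES", "TRUE")

SQLITE_ADAPTER_FOLDER = os.getenv("THS_SQLITE_FOLDER", "./LOCALSTORAGE")
USE_SQLITE_ADAPTER = boolean_env("THS_USE_SQLITE_ADAPTER")
```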
## CLI for testing

@@ -53,7 +54,7 @@ sys 0m0.957s

**NB:** It is also possible to run a local instance of DynamoDB using docker, and it should work as above if the environment is configured correctly (TODO: write this up). This is not recommended except for testing.

### Hazard Solution metadata (Sqlite adapter)

using the locally populated datastore ....

15 changes: 11 additions & 4 deletions docs/usage.md
@@ -1,15 +1,22 @@
# Usage

The NZSHM toshi-hazard-store database is available for public, read-only access using AWS API credentials (contact via email: nshm@gns.cri.nz).

### Environment & Authorisation pre-requisites

``` console
NZSHM22_HAZARD_STORE_STAGE=XXXX (TEST or PROD)
NZSHM22_HAZARD_STORE_REGION=XXXXX (ap-southeast-2)
AWS_PROFILE=... (See AWS authentication below)

```

#### AWS Authentication

- AWS credentials are provided as so-called `short-term credentials`, in the form of an `aws_access_key_id` and an `aws_secret_access_key`.

- Typically these are configured in your local credentials file as described in [Authenticate with short-term credentials](https://docs.aws.amazon.com/cli/v1/userguide/cli-authentication-short-term.html).

- An `AWS_PROFILE` environment variable determines the credentials used at run-time by THS.
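A typical short-term credentials profile looks like the following (illustrative values only; the profile name is whatever you set in `AWS_PROFILE`, and a session token is usually included with short-term credentials):

```
[nshm-readonly]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
aws_session_token = ...
```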

## toshi-hazard-store (library)

To use toshi-hazard-store in a project:
