Skip to content

Standard Satellite v0

Tim Kirschke edited this page Oct 11, 2023 · 11 revisions

This node creates the standard satellite version 0. It contains the descriptive data, splitted by privacy, security and rate of change or the in your company defined rules.


User Configuration

In the "Config" section on the right hand-side, expand section "Data Vault" to modify the following configurations for the node:

Config Name Config Type Explanation
Hashkey Column columnDropdownSelector Select the parent hashkey of the satellite from the drop-down menu.
Hashdiff Column columnSelector Select the hashdiff of this satellite and bring it to the right side.
Disable High-Water-Mark? Boolean By default, a HWM is applied in the loading script. This reduces the selected source data to only rows, where the load date is newer than the maximum load date inside the existing Satellite. See below for more details!

System Columns

No Sys Columns are added inside a Satellite v0 Node.


Example 1

Description

In this example we got a standard satellite v0 with non privacy data. The source table is the 'STG_CUSTOMER_WIKI'.

Create the node:

Sat v0 Node

  • Select the 'STG_CUSTOMER_WIKI' and create the node 'SAT_V0_STG_CUSTOMER_WIKI' with the node type 'Datavault by Scalefree: Satellite v0'
  • Rename the satellite using your naming conventions. In this case rename it to 'CUSTOMER_WIKI_N0_S'

Sat v0 Columns

  • Open the node and delete all unnecessary columns, so that you only keep the Hashkey, Hashdiff, LoadDate Timestamp and Record Source aswell as the right descriptive data for that satellite

Sat v0 Column Selector

  • In the Config section on the right there is a tab called 'Data Vault'
  • Expand this and you get a column selector list 'Hashkey Column' and 'Hashdiff Column'
    • Select the hashkey of this satellite and bring it to the right side in the 'Hashkey Column' section
    • Select the hashdiff of this satellite and bring it to the right side in the 'Hashdiff Column' section

Sat v0 Graph

  • Create and run the node


Disabling High-Water Mark

The High-Water Mark can be disabled safely, but typically would decrease the performance again.

We recommend to try a bit what works best in your environment. You basically have three options:

  • Keep HWM activated, nothing needs to be done, the HWM is activated automatically.
  • Disable the HWM entirely: Set the toggle to true. This might reduce the performance. But maybe your stage layer already only includes the newest data. Then disabling might improve the performance.
  • Move the HWM to a previous layer: First, you apply the previous step to disable the HWM in the Satellite v0. Then you implement some kind of mechanism in previous Nodes to ensure that only records newer than what you already processed are available there. This could be especially effective when combining with different materializations of these previous layers.

Generated SQL Code of Example

      CREATE OR REPLACE TABLE
        "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_N0_S" (
          "HK_CUSTOMER_H" STRING,
          "HD_CUSTOMER_N_S" STRING,
          "LDTS" TIMESTAMP
          COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
          "RSRC" STRING
          COMMENT 'The Record Source (RSRC) describes the source of this data.',
          "C_ACCTBAL" NUMBER (12, 2) NOT NULL,
          "C_MKTSEGMENT" VARCHAR (10),
          "C_COMMENT" VARCHAR (117)
        )
      COMMENT = 'Customer data as defined by TPC-H'

      INSERT INTO
        "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_N0_S"
      WITH
        latest_entries_in_sat AS (
          /* get current rows from satellite */
          SELECT
            "HK_CUSTOMER_H",
            "HD_CUSTOMER_N_S"
          FROM
            "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_N0_S" QUALIFY ROW_NUMBER() OVER (
              PARTITION BY
                "HK_CUSTOMER_H"
              ORDER BY
                "LDTS" DESC
            ) = 1
        ),
        deduplicated_numbered_source AS (
          SELECT
            "STG_CUSTOMER_WIKI"."HK_CUSTOMER_H" AS HK_CUSTOMER_H,
            "STG_CUSTOMER_WIKI"."HD_CUSTOMER_N_S" AS HD_CUSTOMER_N_S,
            "STG_CUSTOMER_WIKI"."LDTS" AS LDTS,
            "STG_CUSTOMER_WIKI"."RSRC" AS RSRC,
            "STG_CUSTOMER_WIKI"."C_ACCTBAL" AS C_ACCTBAL,
            "STG_CUSTOMER_WIKI"."C_MKTSEGMENT" AS C_MKTSEGMENT,
            "STG_CUSTOMER_WIKI"."C_COMMENT" AS C_COMMENT,
            ROW_NUMBER() OVER (
              PARTITION BY
                "HK_CUSTOMER_H"
              ORDER BY
                "LDTS"
            ) as rn
          FROM
            "COALESCE_WORKSHOP"."CORE"."STG_CUSTOMER_WIKI" "STG_CUSTOMER_WIKI" QUALIFY CASE
              WHEN "HD_CUSTOMER_N_S" = LAG("HD_CUSTOMER_N_S") OVER (
                PARTITION BY
                  "HK_CUSTOMER_H"
                ORDER BY
                  "LDTS"
              ) THEN FALSE
              ELSE TRUE
            END
        )
      SELECT DISTINCT
        HK_CUSTOMER_H,
        HD_CUSTOMER_N_S,
        LDTS,
        RSRC,
        C_ACCTBAL,
        C_MKTSEGMENT,
        C_COMMENT
      FROM
        deduplicated_numbered_source
      WHERE
        NOT EXISTS (
          SELECT
            1
          FROM
            latest_entries_in_sat
          WHERE
            deduplicated_numbered_source.HK_CUSTOMER_H = latest_entries_in_sat."HK_CUSTOMER_H"
            AND deduplicated_numbered_source.HD_CUSTOMER_N_S = latest_entries_in_sat."HD_CUSTOMER_N_S"
            AND deduplicated_numbered_source.rn = 1
        )