-
Notifications
You must be signed in to change notification settings - Fork 3
Standard Satellite v0
Tim Kirschke edited this page Oct 11, 2023
·
11 revisions
This node creates the standard satellite version 0. It contains the descriptive data, splitted by privacy, security and rate of change or the in your company defined rules.
In the "Config" section on the right hand-side, expand section "Data Vault" to modify the following configurations for the node:
Config Name | Config Type | Explanation |
---|---|---|
Hashkey Column | columnDropdownSelector | Select the parent hashkey of the satellite from the drop-down menu. |
Hashdiff Column | columnSelector | Select the hashdiff of this satellite and bring it to the right side. |
Disable High-Water-Mark? | Boolean | By default, a HWM is applied in the loading script. This reduces the selected source data to only rows, where the load date is newer than the maximum load date inside the existing Satellite. See below for more details! |
No Sys Columns are added inside a Satellite v0 Node.
In this example we got a standard satellite v0 with non privacy data. The source table is the 'STG_CUSTOMER_WIKI'.
Create the node:
- Select the 'STG_CUSTOMER_WIKI' and create the node 'SAT_V0_STG_CUSTOMER_WIKI' with the node type 'Datavault by Scalefree: Satellite v0'
- Rename the satellite using your naming conventions. In this case rename it to 'CUSTOMER_WIKI_N0_S'
- Open the node and delete all unnecessary columns, so that you only keep the Hashkey, Hashdiff, LoadDate Timestamp and Record Source aswell as the right descriptive data for that satellite
- In the Config section on the right there is a tab called 'Data Vault'
- Expand this and you get a column selector list 'Hashkey Column' and 'Hashdiff Column'
- Select the hashkey of this satellite and bring it to the right side in the 'Hashkey Column' section
- Select the hashdiff of this satellite and bring it to the right side in the 'Hashdiff Column' section
- Create and run the node
The High-Water Mark can be disabled safely, but typically would decrease the performance again.
We recommend to try a bit what works best in your environment. You basically have three options:
- Keep HWM activated, nothing needs to be done, the HWM is activated automatically.
- Disable the HWM entirely: Set the toggle to true. This might reduce the performance. But maybe your stage layer already only includes the newest data. Then disabling might improve the performance.
- Move the HWM to a previous layer: First, you apply the previous step to disable the HWM in the Satellite v0. Then you implement some kind of mechanism in previous Nodes to ensure that only records newer than what you already processed are available there. This could be especially effective when combining with different materializations of these previous layers.
CREATE OR REPLACE TABLE
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_N0_S" (
"HK_CUSTOMER_H" STRING,
"HD_CUSTOMER_N_S" STRING,
"LDTS" TIMESTAMP
COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
"RSRC" STRING
COMMENT 'The Record Source (RSRC) describes the source of this data.',
"C_ACCTBAL" NUMBER (12, 2) NOT NULL,
"C_MKTSEGMENT" VARCHAR (10),
"C_COMMENT" VARCHAR (117)
)
COMMENT = 'Customer data as defined by TPC-H'
INSERT INTO
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_N0_S"
WITH
latest_entries_in_sat AS (
/* get current rows from satellite */
SELECT
"HK_CUSTOMER_H",
"HD_CUSTOMER_N_S"
FROM
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_N0_S" QUALIFY ROW_NUMBER() OVER (
PARTITION BY
"HK_CUSTOMER_H"
ORDER BY
"LDTS" DESC
) = 1
),
deduplicated_numbered_source AS (
SELECT
"STG_CUSTOMER_WIKI"."HK_CUSTOMER_H" AS HK_CUSTOMER_H,
"STG_CUSTOMER_WIKI"."HD_CUSTOMER_N_S" AS HD_CUSTOMER_N_S,
"STG_CUSTOMER_WIKI"."LDTS" AS LDTS,
"STG_CUSTOMER_WIKI"."RSRC" AS RSRC,
"STG_CUSTOMER_WIKI"."C_ACCTBAL" AS C_ACCTBAL,
"STG_CUSTOMER_WIKI"."C_MKTSEGMENT" AS C_MKTSEGMENT,
"STG_CUSTOMER_WIKI"."C_COMMENT" AS C_COMMENT,
ROW_NUMBER() OVER (
PARTITION BY
"HK_CUSTOMER_H"
ORDER BY
"LDTS"
) as rn
FROM
"COALESCE_WORKSHOP"."CORE"."STG_CUSTOMER_WIKI" "STG_CUSTOMER_WIKI" QUALIFY CASE
WHEN "HD_CUSTOMER_N_S" = LAG("HD_CUSTOMER_N_S") OVER (
PARTITION BY
"HK_CUSTOMER_H"
ORDER BY
"LDTS"
) THEN FALSE
ELSE TRUE
END
)
SELECT DISTINCT
HK_CUSTOMER_H,
HD_CUSTOMER_N_S,
LDTS,
RSRC,
C_ACCTBAL,
C_MKTSEGMENT,
C_COMMENT
FROM
deduplicated_numbered_source
WHERE
NOT EXISTS (
SELECT
1
FROM
latest_entries_in_sat
WHERE
deduplicated_numbered_source.HK_CUSTOMER_H = latest_entries_in_sat."HK_CUSTOMER_H"
AND deduplicated_numbered_source.HD_CUSTOMER_N_S = latest_entries_in_sat."HD_CUSTOMER_N_S"
AND deduplicated_numbered_source.rn = 1
)
Table of Content
- Staging
- DV-Entities
- Hubs
- Links
- Satellites
- Standard Satellite
- Multi-Active Satellite
- Non-Historized Satellite
- Record-Tracking Satellite
- Business Vault
- PIT
- Snapshot Control
- Global Variables