Skip to content

Scalefree's dbt package for a Data Vault 2.0. Modified to work with Oracle and ETL tools such as Oracle Data Integrator

License

Notifications You must be signed in to change notification settings

universe-designer/datavault4dbt

 
 

Repository files navigation


What is special about this Fork?

This fork was created for an Oracle specific environment, where Turbovault4dbt and Datavault4dbt were used as code generators for tables and views. Load operations were carried out in a third party ETL tool, where simple 1:1 mappings between Views and tables were created. Please see this blog post for details. Besides the Oracle compatibility it offers advanced features for Oracle such as

  • Link Effectivity Sattellites
  • Load Views for 1:1-Mappings in ETL-Tools such as Oracle Data Integrator
  • Read Views for Version 1 Satellites
  • Primary Keys
  • Foreign Keys
  • Table Comments
  • Column Comments
  • Index Tablespace Modification
  • Partitioning
  • Helpful SQL generators to create, select, drop, delete, grant models

This fork can be used together with this turbovault4dbt fork. If you use Windows, use dv-starter to install and preconfigure Turbovault4dbt and Datavault4dbt easily.

Included Macros

  • Staging Area (For Hashing, prejoins and ghost records)
  • Hubs, Links & Satellites (allowing multiple deltas)
  • Non-Historized Links and Satellites
  • Multi-Active Satellites
  • Virtualized End-Dating (in Satellites)
  • Reference Hubs, - Satellites, and - Tables
  • PIT Tables
    • Hook for Cleaning up PITs
  • Snapshot Control

Features

With datavault4dbt you will get a lot of awesome features, including:

  • A Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt
  • Ready for both Persistent Staging Areas and Transient Staging Areas, due to the allowance of multiple deltas in all macros, without loosing any intermediate changes- Enforcing standards in naming conventions by implementing global variables for technical columns
  • A fully auditable solution for a Data Warehouse
  • Creating a centralized, snapshot-based Business interface by using a centralized snapshot table supporting logarithmic logic
  • A modern insert-only approach that avoids updating data
  • Optimizing incremental loads by implementing a high-water-mark that also works for entities that are loaded from multiple sources
  • A straight-forward, standardized approach to conduct agile datawarehouse development cycles

Requirements

To use the macros efficiently, there are a few prerequisites you need to provide:

  • Flat & Wide source data, available within your target database
  • Load Date column that represents the arriving time in the source data storage
  • Record Source column that gives information about where the source data is coming from (e.g. the file location inside a Data Lake)

Resources:

Supported platforms:

Currently supported platforms are:

  • Google Bigquery
  • Exasol
  • Snowflake
  • PostgreSQL
  • Amazon Redshift
  • Oracle with advanced features
    • Link Effectivity Sattellites
    • Load Views for 1:1-Mappings in ETL-Tools such as Oracle Data Integrator
    • Read Views for Version 1 Satellites
    • Primary Keys
    • Foreign Keys
    • Table Comments
    • Column Comments
    • Index Tablespace Modification
    • Partitioning
    • Helpful SQL generators to create, select, drop, delete, grant models

We are working continuously at high pressure to adapt the package for large variety of different platforms. In the future, the package will hopefully be available for SQL Server and many more.


Installation instructions

  1. Include this package in your packages.yml -- check here for installation instructions.
  2. Run dbt deps

For further information on how to install packages in dbt, please visit the following link: https://docs.getdbt.com/docs/building-a-dbt-project/package-management

Global variables

datavault4dbt is highly customizable by using many global variables. Since they are applied on multiple levels, a high rate of standardization across your data vault 2.0 solution is guaranteed. The default values of those variables are set inside the packages dbt_project.yml and should be copied to your own dbt_project.yml. For an explanation of all global variables see the wiki.


Usage

The datavault4dbt package provides macros for Staging and Creation of all DataVault-Entities you need, to build your own DataVault2.0 solution. The usage of the macros is well-explained in the documentation: https://github.com/ScalefreeCOM/datavault4dbt/wiki


Contributing

View our contribution guidelines


License

Apache 2.0

Contact

For questions, feedback, etc. reach out to us via [email protected]!

About

Scalefree's dbt package for a Data Vault 2.0. Modified to work with Oracle and ETL tools such as Oracle Data Integrator

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • PLSQL 100.0%