Commit 3559fa9
Changes to support the Oracle DB

Add full support for the Oracle DB

sfc-gh-dflippo authored Jun 26, 2022
2 parents 96c7068 + 001048d

Showing 13 changed files with 216 additions and 36 deletions.

1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
 target/
 dbt_packages/
 logs/
+.DS_Store
12 changes: 6 additions & 6 deletions README.md
@@ -1,16 +1,16 @@
# dbt Constraints Package

-This package generates database constraints based on the tests in a dbt project. It is currently compatible with Snowflake and PostgreSQL only.
+This package generates database constraints based on the tests in a dbt project. It is currently compatible with Snowflake, PostgreSQL, and Oracle only.

## Why data engineers should add referential integrity constraints

The primary reason to add constraints to your database tables is that many tools including [DBeaver](https://dbeaver.io) and [Oracle SQL Developer Data Modeler](https://community.snowflake.com/s/article/How-To-Customizing-Oracle-SQL-Developer-Data-Modeler-SDDM-to-Support-Snowflake-Variant) can correctly reverse-engineer data model diagrams if there are primary keys, unique keys, and foreign keys on tables. Most BI tools will also add joins automatically between tables when you import tables that have foreign keys. This can both save time and avoid mistakes.

In addition, although Snowflake doesn't enforce most constraints, the [query optimizer can consider primary key, unique key, and foreign key constraints](https://docs.snowflake.com/en/sql-reference/constraints-properties.html?#extended-constraint-properties) during query rewrite if the constraint is set to RELY. Since dbt can test that the data in the table complies with the constraints, this package creates constraints on Snowflake with the RELY property to improve query performance.
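For illustration, the effect corresponds roughly to DDL like the sketch below; the table and column names are hypothetical, not from this repository:

```sql
-- Hypothetical names (dim_customer, customer_key). Snowflake does not
-- enforce the primary key, but RELY tells the optimizer it may trust
-- the constraint during query rewrite, e.g. for join elimination.
ALTER TABLE dim_customer
  ADD CONSTRAINT dim_customer_pk PRIMARY KEY (customer_key) RELY;
```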

-Many other databases including PostgreSQL, SQL Server, Oracle, MySQL, and DB2 can use referential integrity constraints to perform "[join elimination](https://blog.jooq.org/join-elimination-an-essential-optimiser-feature-for-advanced-sql-usage/)" to remove tables from an execution plan. This commonly occurs when you query a subset of columns from a view and some of the tables in the view are unnecessary. Even on databases that do not support join elimination, some [BI and visualization tools will also rewrite their queries](https://docs.snowflake.com/en/user-guide/table-considerations.html#referential-integrity-constraints) based on constraint information, producing the same effect.
+Many other databases including PostgreSQL, Oracle, SQL Server, MySQL, and DB2 can use referential integrity constraints to perform "[join elimination](https://blog.jooq.org/join-elimination-an-essential-optimiser-feature-for-advanced-sql-usage/)" to remove tables from an execution plan. This commonly occurs when you query a subset of columns from a view and some of the tables in the view are unnecessary. Even on databases that do not support join elimination, some [BI and visualization tools will also rewrite their queries](https://docs.snowflake.com/en/user-guide/table-considerations.html#referential-integrity-constraints) based on constraint information, producing the same effect.
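As a sketch of what join elimination means in practice (hypothetical table, view, and column names, not from this repository):

```sql
-- fact_order.customer_key is a NOT NULL foreign key referencing
-- dim_customer(customer_key), dim_customer's primary key.
CREATE VIEW order_details AS
SELECT f.order_key, f.amount, c.customer_name
FROM fact_order f
JOIN dim_customer c ON f.customer_key = c.customer_key;

-- Only fact columns are selected, and the foreign key guarantees the
-- join neither drops nor duplicates rows, so the optimizer can remove
-- dim_customer from the plan entirely.
SELECT order_key, amount FROM order_details;
```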

-Finally, although most columnar databases including Snowflake do not use or need indexes, most row-oriented databases including PostgreSQL require indexes on their primary key columns in order to perform efficient joins between tables. Typically a primary key or unique key constraint is enforced on such databases using such indexes. Having dbt create the unique indexes automatically can slightly reduce the degree of performance tuning necessary for row-oriented databases. Row-oriented databases frequently also need indexes on foreign key columns but [that is something best added manually](https://docs.getdbt.com/reference/resource-configs/postgres-configs#indexes).
+Finally, although most columnar databases including Snowflake do not use or need indexes, most row-oriented databases including PostgreSQL and Oracle require indexes on their primary key columns in order to perform efficient joins between tables. Typically a primary key or unique key constraint is enforced on such databases using such indexes. Having dbt create the unique indexes automatically can slightly reduce the degree of performance tuning necessary for row-oriented databases. Row-oriented databases frequently also need indexes on foreign key columns but [that is something best added manually](https://docs.getdbt.com/reference/resource-configs/postgres-configs#indexes).
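On PostgreSQL, for example, the primary key constraint is what creates the unique index the planner uses for joins; a minimal sketch with hypothetical names:

```sql
-- Adding the primary key implicitly creates a unique index on the column.
ALTER TABLE dim_customer
  ADD CONSTRAINT dim_customer_pk PRIMARY KEY (customer_key);

-- Foreign key columns get no automatic index on PostgreSQL; per the
-- linked dbt docs, such indexes are best added manually, e.g.:
CREATE INDEX fact_order_customer_key_idx ON fact_order (customer_key);
```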

## Please note

@@ -41,7 +41,7 @@ vars:
```yml
packages:
- package: Snowflake-Labs/dbt_constraints
version: [">=0.3.0", "<0.4.0"]
version: [">=0.4.0", "<0.5.0"]
# <see https://github.com/Snowflake-Labs/dbt_constraints/releases/latest> for the latest version tag.
# You can also pull the latest changes from Github with the following:
# - git: "https://github.com/Snowflake-Labs/dbt_constraints.git"
@@ -99,7 +99,7 @@ packages:

* The package's macros depend on the results and graph object schemas of dbt >=1.0.0

-* The package currently only includes macros for creating constraints in Snowflake and PostgreSQL. To add support for other databases, it is necessary to implement the following seven macros with the appropriate DDL & SQL for your database. Pull requests to contribute support for other databases are welcome. See the snowflake__create_constraints.sql and postgres__create_constraints.sql files as examples.
+* The package currently only includes macros for creating constraints in Snowflake, PostgreSQL, and Oracle. To add support for other databases, it is necessary to implement the following seven macros with the appropriate DDL & SQL for your database. Pull requests to contribute support for other databases are welcome. See the <ADAPTER_NAME>__create_constraints.sql and postgres__create_constraints.sql files as examples.

```
<ADAPTER_NAME>__create_primary_key(table_model, column_names, verify_permissions, quote_columns=false)
```

@@ -117,7 +117,7 @@ Generally, if you don't meet a requirement, tests are still executed but the constraints are skipped.
- All models involved in a constraint must be materialized as table, incremental, or snapshot.
-- If source constraints are enabled, the source must be a table. You must also have the `OWNERSHIP` table privilege to add a constraint. For foreign keys you also need the `REFERENCES` privilege on the parent table with the primary or unique key.
+- If source constraints are enabled, the source must be a table. You must also have the `OWNERSHIP` table privilege to add a constraint. For foreign keys you also need the `REFERENCES` privilege on the parent table with the primary or unique key. The package will identify when you lack these privileges on Snowflake and PostgreSQL. Oracle does not provide an easy way to look up your effective privileges so it has an exception handler and will display Oracle's error messages.
- All columns on constraints must be individual column names, not expressions. You can reference columns on a model that come from an expression.
2 changes: 1 addition & 1 deletion docs/catalog.json
@@ -1 +1 @@
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json", "dbt_version": "1.0.4", "generated_at": "2022-04-22T21:39:49.991421Z", "invocation_id": "ce25464d-8bb3-48c4-aecd-e2013012be59", "env": {}}, "nodes": {}, "sources": {}, "errors": null}
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json", "dbt_version": "1.1.1", "generated_at": "2022-06-26T22:17:03.623067Z", "invocation_id": "d2d25f99-3104-4ad9-a377-e32acbd67692", "env": {}}, "nodes": {}, "sources": {}, "errors": null}
20 changes: 10 additions & 10 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/run_results.json
@@ -1 +1 @@
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v4.json", "dbt_version": "1.0.4", "generated_at": "2022-04-22T21:39:49.983008Z", "invocation_id": "ce25464d-8bb3-48c4-aecd-e2013012be59", "env": {}}, "results": [{"status": "success", "timing": [{"name": "compile", "started_at": "2022-04-22T21:39:49.936114Z", "completed_at": "2022-04-22T21:39:49.980945Z"}, {"name": "execute", "started_at": "2022-04-22T21:39:49.981331Z", "completed_at": "2022-04-22T21:39:49.981355Z"}], "thread_id": "Thread-1", "execution_time": 0.04632973670959473, "adapter_response": {}, "message": null, "failures": null, "unique_id": "operation.dbt_constraints.dbt_constraints-on-run-end-0"}], "elapsed_time": 0.1201019287109375, "args": {"write_json": true, "use_colors": true, "printer_width": 80, "version_check": true, "partial_parse": true, "static_parser": true, "profiles_dir": "/Users/dflippo/.dbt", "send_anonymous_usage_stats": true, "event_buffer_size": 100000, "compile": true, "which": "generate", "rpc_method": "docs.generate", "indirect_selection": "eager"}}
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v4.json", "dbt_version": "1.1.1", "generated_at": "2022-06-26T22:17:03.612866Z", "invocation_id": "d2d25f99-3104-4ad9-a377-e32acbd67692", "env": {}}, "results": [{"status": "success", "timing": [{"name": "compile", "started_at": "2022-06-26T22:17:03.562665Z", "completed_at": "2022-06-26T22:17:03.610497Z"}, {"name": "execute", "started_at": "2022-06-26T22:17:03.610905Z", "completed_at": "2022-06-26T22:17:03.610929Z"}], "thread_id": "Thread-1", "execution_time": 0.04917192459106445, "adapter_response": {}, "message": null, "failures": null, "unique_id": "operation.dbt_constraints.dbt_constraints-on-run-end-0"}], "elapsed_time": 0.05549478530883789, "args": {"write_json": true, "use_colors": true, "printer_width": 80, "version_check": true, "partial_parse": true, "static_parser": true, "profiles_dir": "/Users/dflippo/.dbt", "send_anonymous_usage_stats": true, "event_buffer_size": 100000, "quiet": false, "no_print": false, "compile": true, "which": "generate", "rpc_method": "docs.generate", "indirect_selection": "eager"}}
7 changes: 0 additions & 7 deletions integration_tests/data/seeds.yml

This file was deleted.

3 changes: 3 additions & 0 deletions integration_tests/dbt_project.yml
@@ -45,3 +45,6 @@ vars:

models:
+materialized: table

seeds:
+quote_columns: false
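
A hedged reading of why these two settings matter for Oracle (the commit doesn't say explicitly): constraints can only be attached to tables, and unquoted seed columns let Oracle fold identifiers to upper case so models can reference them case-insensitively. The resulting configuration, sketched:

```yml
models:
  +materialized: table    # constraints need tables, not views

seeds:
  +quote_columns: false   # unquoted columns avoid case-sensitivity issues on Oracle
```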
4 changes: 2 additions & 2 deletions integration_tests/models/fact_order_line.sql
@@ -11,8 +11,8 @@

SELECT
lineitem.*,
-TO_CHAR(o_orderdate, 'YYYYMMDD')::INTEGER AS o_orderdate_key,
-coalesce(l_orderkey::varchar, '') || '~' || coalesce(l_linenumber::varchar, '') AS integration_id
+cast(TO_CHAR(o_orderdate, 'YYYYMMDD') AS INTEGER) AS o_orderdate_key,
+coalesce(cast(l_orderkey as varchar(100)), '') || '~' || coalesce(cast(l_linenumber as varchar(100)), '') AS integration_id
FROM {{ source('tpc_h', 'lineitem') }} lineitem
JOIN {{ source('tpc_h', 'orders') }} orders ON l_orderkey = o_orderkey
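
A note on the recurring pattern in this commit: Oracle supports neither the PostgreSQL/Snowflake `::` cast shorthand nor a `varchar` without a declared length, so the models switch to the ANSI form, which all three supported databases accept. A minimal sketch using the project's `lineitem` column:

```sql
-- Before (fails on Oracle):  l_orderkey::varchar
-- After (portable ANSI form; Oracle requires an explicit VARCHAR length):
SELECT cast(l_orderkey AS varchar(100)) AS l_orderkey_str
FROM lineitem;
```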

@@ -4,5 +4,5 @@

SELECT
lineitem.*,
-coalesce(l_orderkey::varchar, '') || '~' || coalesce(l_linenumber::varchar, '') AS integration_id
+coalesce(cast(l_orderkey as varchar(100)), '') || '~' || coalesce(cast(l_linenumber as varchar(100)), '') AS integration_id
FROM {{ source('tpc_h', 'lineitem') }} lineitem
8 changes: 4 additions & 4 deletions integration_tests/models/sources.yml
@@ -39,7 +39,7 @@ sources:
- "ps_suppkey"
# How to validate a compound primary key natively
- unique:
column_name: "coalesce(ps_partkey::varchar, '') || '~' || coalesce(ps_suppkey::varchar, '')"
column_name: "coalesce(cast(ps_partkey as varchar(100)), '') || '~' || coalesce(cast(ps_suppkey as varchar(100)), '')"

- name: "supplier"
columns:
@@ -100,10 +100,10 @@ sources:
- l_linenumber
# How to validate a compound primary key natively
- unique:
column_name: "coalesce(l_orderkey::varchar, '') || '~' || coalesce(l_linenumber::varchar, '')"
column_name: "coalesce(cast(l_orderkey as varchar(100)), '') || '~' || coalesce(cast(l_linenumber as varchar(100)), '')"

# How to validate a compound foreign key
- relationships:
column_name: "coalesce(l_partkey::varchar, '') || '~' || coalesce(l_suppkey::varchar, '')"
column_name: "coalesce(cast(l_partkey as varchar(100)), '') || '~' || coalesce(cast(l_suppkey as varchar(100)), '')"
to: source('tpc_h', 'partsupp')
field: "coalesce(ps_partkey::varchar, '') || '~' || coalesce(ps_suppkey::varchar, '')"
field: "coalesce(cast(ps_partkey as varchar(100)), '') || '~' || coalesce(cast(ps_suppkey as varchar(100)), '')"
6 changes: 3 additions & 3 deletions macros/create_constraints.sql
@@ -210,7 +210,7 @@
) }}
{%- endif -%}

-{%- set table_relation = adapter.get_relation(
+{%- set table_relation = api.Relation.create(
database=table_models[0].database,
schema=table_models[0].schema,
identifier=table_models[0].alias ) -%}
@@ -250,12 +250,12 @@

{%- if fk_model and pk_model -%}

-{%- set fk_table_relation = adapter.get_relation(
+{%- set fk_table_relation = api.Relation.create(
database=fk_model.database,
schema=fk_model.schema,
identifier=fk_model.alias) -%}

-{%- set pk_table_relation = adapter.get_relation(
+{%- set pk_table_relation = api.Relation.create(
database=pk_model.database,
schema=pk_model.schema,
identifier=pk_model.alias) -%}
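
A hedged note on why these hunks swap `adapter.get_relation` for `api.Relation.create` (the commit message doesn't explain it): `adapter.get_relation` performs a metadata lookup against the warehouse and returns `none` when the lookup misses, for instance when identifier case folding differs, as on Oracle, while `api.Relation.create` builds the Relation object directly from the node's own database/schema/alias with no round trip. A minimal sketch in dbt's Jinja, where `node` stands in for any graph node such as `table_models[0]`:

```
{#- Lookup-based: can return none if the queried/cached metadata doesn't
    match, e.g. due to Oracle's upper-case identifier folding. -#}
{%- set rel = adapter.get_relation(
      database=node.database, schema=node.schema, identifier=node.alias) -%}

{#- Construction-based: always yields a usable Relation object built
    purely from the node's metadata. -#}
{%- set rel = api.Relation.create(
      database=node.database, schema=node.schema, identifier=node.alias) -%}
```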