Skip to content

Object management automation tool for Snowflake

License

Notifications You must be signed in to change notification settings

entera-ai/SnowDDL

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SnowDDL

See Customizations section for descriptions of changes, enhancements, etc.

Overview

PyPI Getting Started Pytest

SnowDDL is a declarative-style tool for object management automation in Snowflake.

It is not intended to replace other tools entirely, but to provide an alternative approach focused on practical data engineering challenges.

You may find SnowDDL useful if:

  • complexity of object schema grows exponentially, and it becomes hard to manage;
  • your organization maintains multiple Snowflake accounts (dev, stage, prod);
  • your organization has multiple developers sharing the same Snowflake account and suffering from conflicts;
  • it is necessary to generate some part of configuration dynamically using Python;

Main features

  1. SnowDDL is "stateless".
  2. SnowDDL can revert any changes.
  3. SnowDDL supports ALTER COLUMN.
  4. SnowDDL provides built-in "Role hierarchy" model.
  5. SnowDDL re-creates invalid views automatically.
  6. SnowDDL simplifies code review.
  7. SnowDDL supports creation of isolated "environments" for individual developers and CI/CD scripts.
  8. SnowDDL strikes a good balance between dependency management overhead and parallelism.
  9. SnowDDL configuration can be generated dynamically in Python code.
  10. SnowDDL can manage packages for Java and Python UDF scripts natively.

Quick links

Introduction videos

Customizations

Optional Schema Role Generation

PR: #1

By default, SnowDDL autogenerates 3 schema roles for every schema defined in the config:

  • <db>__<schema>__OWNER__S_ROLE
  • <db>__<schema>__READ__S_ROLE
  • <db>__<schema>__WRITE__S_ROLE

In some cases, all of these roles may not always be needed or desired. Schema roles can be generated selectively, or skipped entirely, using the schema_roles attribute in schema config files or SchemaBlueprints

Examples:

schema_roles:
  - owner
  - read
  # don't generate schema write role
schema_roles: False  # don't generate any schema roles

Optional Schema Role Grant Filtering

PR: #4

Autogenerated schema roles are granted various pre-defined low level privileges by default. For example, schema OWNER roles are granted CREATE TABLE on SCHEMA, OWNERSHIP on FUTURE TABLES, etc. These grants are generated on the fly by the SchemaRoleResolver. See https://github.com/littleK0i/SnowDDL/blob/b0e28e0450758690d096e6eec35fbaa8a71399f1/snowddl/resolver/schema_role.py#L12 for source code for default grants

The full set of default grants are often not needed or desired for a given schema role. Therefore, users may want to limit these grants to a more narrow subset. The benefits of this are cleaner and less noisy snowddl plan outputs, and better adherance to principle of least privilege. This can be accomplished with further configuration of the schema_roles attributes of schema configs / blueprints. This feature can also be used to exclude generation of some schema roles entirely, similarly to the enhancement described above.

For example, suppose we want to generate only owner schema role with grants CREATE TABLE and OWNERSHIP ON FUTURE TABLES, and read schema role with grants SELECT ON FUTURE TABLES:

schema_roles:
  owner:
    create:
      - TABLE
  read:
    privileges:
      TABLES:
        - SELECT

schema_roles Schema Config Inheritance From DB Config

PR: #5

schema_roles config for individual schemas can be set at the database level, and inherited for all schemas contained in that db. This is a useful way to keep configuration DRY, and avoid reconfiguring this attribute for many schemas. Be mindful of applying too many unecessary grants to schemas using this option. Similar results can also be achieved using programmatic Python config, potentially with more granular control over special case schemas.

Schema Role Grant Inheritance in DB Clones

PR: #2

Cloning operations in Snowflake typically don't work well for objects managed by IaC tools like SnowDDL, because when grants get copied from source to target, it creates resource drift between the IaC config and the Snowflake state

This can be handled via the copy_schema_role_grants_to_db_clones attribute on database configs / blueprints. This attribute takes a list of of database clone names. The SchemaRoleResolver will then "duplex" all schema role grants to these db namespaces.

For example, suppose DB1__SCHEMA1__OWNER__S_ROLE gets USAGE ON DATABASE DB1, USAGE ON SCHEMA DB1.SCHEMA1, and CREATE TABLE ON SCHEMA DB1.SCHEMA1. Suppose the db config for DB1 looks like the following:

copy_schema_role_grants_to_db_clones:
  - DB1_CLONE

All grants that DB1__SCHEMA1__OWNER__S_ROLE has on DB1 will get duplexed to DB1_CLONE:

USAGE ON DATABASE DB1_CLONE
USAGE ON SCHEMA DB1_CLONE.SCHEMA1
CREATE TABLE ON SCHEMA DB1_CLONE.SCHEMA1

Now when clone operations are executed in Snowflake and schema level grants get copied to clone, no resource drift will occur on schema level grants (note: resource drift does still occur on db level grants; see linked PR for more info / recommendations). Similarly, when swap operations are executed, the source and target will already share the same grants, so no resource drift will occur

NOTE: this feature is intended to be used in conjunction with Optional Schema Role Generation for the db clones. That is, db clone schemas should be configured with schema_roles: False, so no fresh schema roles get generated on the cloned schemas. These roles are not necessary because grants are already inherited from source db schema roles

Local Development

  1. Clone repo
git clone [email protected]:entera-ai/SnowDDL.git
  1. Create & Activate Virtual Env
python -m venv snowddl-local-venv
source snowddl-local-venv/bin/activate
  1. Install Editable Project from Local Clone Directory
pip install --editable /path/to/cloned/fork/repo/directory

Any changes made to the source code will automatically reflect in snowddl invocations invoked from activated virtual env, because we used an --editable install. This is the best way to iteratively develop and troubleshoot changes to the source code.

Mini-roadmap

  • placeholders in YAML configs (done)
  • documentation for dynamic config generation in Python ("advanced mode") (done)
  • video tutorials (done, but more tutorials are coming in future)
  • full test coverage for all object types and transformations

Issues? Questions? Feedback?

Please use GitHub "Issues" to report bugs and technical problems.

Please use GitHub "Discussions" to ask questions and provide feedback.

Created by

Vitaly Markov, 2024

Enjoy!

About

Object management automation tool for Snowflake

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.8%
  • Other 0.2%