This is a Fork of https://github.com/littleK0i/SnowDDL
See Customizations
section for descriptions of changes, enhancements, etc.
SnowDDL is a declarative-style tool for object management automation in Snowflake.
It is not intended to replace other tools entirely, but to provide an alternative approach focused on practical data engineering challenges.
You may find SnowDDL useful if:
- complexity of object schema grows exponentially, and it becomes hard to manage;
- your organization maintains multiple Snowflake accounts (dev, stage, prod);
- your organization has multiple developers sharing the same Snowflake account and suffering from conflicts;
- it is necessary to generate some part of configuration dynamically using Python;
- SnowDDL is "stateless".
- SnowDDL can revert any changes.
- SnowDDL supports ALTER COLUMN.
- SnowDDL provides built-in "Role hierarchy" model.
- SnowDDL re-creates invalid views automatically.
- SnowDDL simplifies code review.
- SnowDDL supports creation of isolated "environments" for individual developers and CI/CD scripts.
- SnowDDL strikes a good balance between dependency management overhead and parallelism.
- SnowDDL configuration can be generated dynamically in Python code.
- SnowDDL can manage packages for Java and Python UDF scripts natively.
PR: #1
By default, SnowDDL autogenerates 3 schema roles for every schema defined in the config:
<db>__<schema>__OWNER__S_ROLE
<db>__<schema>__READ__S_ROLE
<db>__<schema>__WRITE__S_ROLE
In some cases, all of these roles may not always be needed or desired. Schema roles can be generated
selectively, or skipped entirely, using the schema_roles
attribute in schema config files or SchemaBlueprint
s
Examples:
schema_roles:
- owner
- read
# don't generate schema write role
schema_roles: False # don't generate any schema roles
PR: #4
Autogenerated schema roles are granted various pre-defined low level privileges by default. For example,
schema OWNER
roles are granted CREATE TABLE on SCHEMA
, OWNERSHIP on FUTURE TABLES
, etc. These grants
are generated on the fly by the SchemaRoleResolver
. See
https://github.com/littleK0i/SnowDDL/blob/b0e28e0450758690d096e6eec35fbaa8a71399f1/snowddl/resolver/schema_role.py#L12
for source code for default grants
The full set of default grants are often not needed or desired for a given schema role. Therefore, users
may want to limit these grants to a more narrow subset. The benefits of this are cleaner and less noisy
snowddl plan
outputs, and better adherance to principle of least privilege. This can be accomplished with
further configuration of the schema_roles
attributes of schema configs / blueprints. This feature can also
be used to exclude generation of some schema roles entirely, similarly to the enhancement described above.
For example, suppose we want to generate only owner
schema role with grants CREATE TABLE
and OWNERSHIP ON FUTURE TABLES
,
and read
schema role with grants SELECT ON FUTURE TABLES
:
schema_roles:
owner:
create:
- TABLE
read:
privileges:
TABLES:
- SELECT
PR: #5
schema_roles
config for individual schemas can be set at the database level, and inherited for all
schemas contained in that db. This is a useful way to keep configuration DRY, and avoid reconfiguring
this attribute for many schemas. Be mindful of applying too many unecessary grants to schemas using this
option. Similar results can also be achieved using programmatic Python config, potentially with more
granular control over special case schemas.
PR: #2
Cloning operations in Snowflake typically don't work well for objects managed by IaC tools like SnowDDL, because when grants get copied from source to target, it creates resource drift between the IaC config and the Snowflake state
This can be handled via the copy_schema_role_grants_to_db_clones
attribute on database configs / blueprints.
This attribute takes a list of of database clone names. The SchemaRoleResolver
will then "duplex" all schema
role grants to these db namespaces.
For example, suppose DB1__SCHEMA1__OWNER__S_ROLE
gets USAGE ON DATABASE DB1
, USAGE ON SCHEMA DB1.SCHEMA1
,
and CREATE TABLE ON SCHEMA DB1.SCHEMA1
. Suppose the db config for DB1
looks like the following:
copy_schema_role_grants_to_db_clones:
- DB1_CLONE
All grants that DB1__SCHEMA1__OWNER__S_ROLE
has on DB1
will get duplexed to DB1_CLONE
:
USAGE ON DATABASE DB1_CLONE
USAGE ON SCHEMA DB1_CLONE.SCHEMA1
CREATE TABLE ON SCHEMA DB1_CLONE.SCHEMA1
Now when clone
operations are executed in Snowflake and schema level grants get copied to clone, no
resource drift will occur on schema level grants (note: resource drift does still occur on db level
grants; see linked PR for more info / recommendations). Similarly, when swap
operations are executed,
the source and target will already share the same grants, so no resource drift will occur
NOTE: this feature is intended to be used in conjunction with Optional Schema Role Generation
for the
db clones. That is, db clone schemas should be configured with schema_roles: False
, so no fresh schema
roles get generated on the cloned schemas. These roles are not necessary because grants are already inherited
from source db schema roles
- Clone repo
git clone [email protected]:entera-ai/SnowDDL.git
- Create & Activate Virtual Env
python -m venv snowddl-local-venv
source snowddl-local-venv/bin/activate
- Install Editable Project from Local Clone Directory
pip install --editable /path/to/cloned/fork/repo/directory
Any changes made to the source code will automatically reflect in snowddl
invocations invoked from
activated virtual env, because we used an --editable
install. This is the best way to iteratively
develop and troubleshoot changes to the source code.
placeholders in YAML configs(done)documentation for dynamic config generation in Python ("advanced mode")(done)video tutorials(done, but more tutorials are coming in future)- full test coverage for all object types and transformations
Please use GitHub "Issues" to report bugs and technical problems.
Please use GitHub "Discussions" to ask questions and provide feedback.
Vitaly Markov, 2024
Enjoy!