- n/a
- n/a
- n/a
- n/a
- Infinite recursion loop when `name_prefix` is used on some resources.
- Parsing of SQL files from YAML files
- Support for Terraform `moved` option (resource rename)
- Model variables not carried over to all environments
- Reduced import time
- Optimized tests to load spark only when required
- Optimized performance during stack export in IaC backend native format
- `MemoryDataSource` support for reading dict or list of data [#337]
- Support for `!use` tag in YAML files to directly inject the content of another file (illustrative sketch below)
- Support for `!update` tag in YAML files to use another file to update the content of a dictionary
- Support for `!extend` tag in YAML files to use another file to extend (append) more items to a list
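The tags above pull other YAML files in at load time. As a rough illustration of the general mechanism only (this is not laktory's implementation), the sketch below wires an include-style `!use` tag with PyYAML; the loader class and file names are hypothetical.

```python
# Illustrative only: an include-style YAML tag built with PyYAML.
# This is NOT laktory's implementation; it only sketches the general
# mechanism behind tags such as `!use`. File names are hypothetical.
import pathlib

import yaml


class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that remembers the directory of the file being parsed."""

    def __init__(self, stream):
        self._root = pathlib.Path(getattr(stream, "name", ".")).parent
        super().__init__(stream)


def _use(loader: IncludeLoader, node: yaml.Node):
    # Resolve the referenced file relative to the calling file and return
    # its parsed content in place of the tag.
    path = loader._root / loader.construct_scalar(node)
    with open(path) as fp:
        return yaml.load(fp, IncludeLoader)


IncludeLoader.add_constructor("!use", _use)

# Usage sketch: if `pipeline.yaml` contains `nodes: !use nodes.yaml`,
# loading it injects the parsed content of `nodes.yaml` under `nodes`.
with open("pipeline.yaml") as fp:  # hypothetical file
    config = yaml.load(fp, IncludeLoader)
```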
- Laktory variables to support python expressions [#334]
- Laktory variables to support complex types such as dictionaries and lists
- All Laktory model fields to allow `str` type for receiving a variable or expression
- Reference to an external YAML file path can use variable injection [#335]
- Removed support for ${include.*} variables in YAML files.
- Reference to external YAML files is now relative to the calling file instead of being relative to the stack entry point.
- Support for Python 3.12
- Support for Python 3.13
- Contribution guidelines
- Support for external PRs
- `uv` as the recommended package manager
- Formatting and linting with ruff (instead of black)
- Added ruff formatting and linting as a pre-commit hook
- Added pytest fixtures to run tests only when required environment variables are available
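A minimal sketch of the kind of fixture this refers to, written with plain pytest rather than laktory's actual test suite; the environment variable names are assumptions.

```python
# Illustrative sketch (not laktory's actual fixtures): skip tests that need
# credentials when the corresponding environment variables are not set.
# The variable names used here are hypothetical.
import os

import pytest


@pytest.fixture
def databricks_env() -> dict:
    """Provide Databricks connection settings or skip the test."""
    required = ["DATABRICKS_HOST", "DATABRICKS_TOKEN"]  # hypothetical names
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        pytest.skip(f"missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}


def test_workspace_connection(databricks_env):
    # The fixture guarantees the variables are present here.
    assert databricks_env["DATABRICKS_HOST"]
```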
- Injection of variables into pipeline requirements
- Serverless pipeline job raises a validation error
- Support for view creation from pipeline node
- `--version` and `version` CLI commands
- Support for `for_each_task` in Databricks job resource
- Support for external libraries in pipelines
- Variables supports referencing environment variables and other variables
- Replaced `dataframe_backend` propagation with dynamic parent lookup
- Introduced `PipelineChild` internal class to manage child/parent relationships
- Added laktory package as a task cluster dependency when Databricks Job is used as a pipeline orchestrator
- Renamed `dataframe_type` to `dataframe_backend`
- Renamed pipeline orchestrator from `"DLT"` to `"DATABRICKS_DLT"`
- Renamed pipeline databricks job and dlt resource names. May cause a re-deployment.
- Moved pipeline `notebook_path` under `databricks_job` attribute.
- CDC Merge when records flagged for delete don't exist in target
- `is_enabled` option to resources for disabling specific resources for specific environments or configurations.
- `name_prefix` and `name_suffix` options for DLT pipeline
- Support for "AVRO", "ORC", "TEXT" and "XML" formats for file data source with spark dataframe backend.
- `inject_vars_into_dump` method for `BaseModel` class to inject variables into a dictionary
- `MLflowExperiment` Databricks resource
- `MLflowModel` Databricks resource
- `MLflowWebhook` Databricks resource
- `Alert` Databricks resource
- `Query` Databricks resource
- `name_prefix` and `name_suffix` options for `Alert`, `Dashboard` and `Query` resources.
- Removed dependency on `pytz`
- Refactored `BaseModel` `inject_vars` method to inject variables directly into the model, instead of into a dump.
- Deprecated `SQLQuery` Databricks resource
- Databricks job name prefix and suffix attributes
- Propagation of stack variables to all resources
- Removed unsafe characters from pipeline default root
- Support for setting Laktory Databricks Workspace root from the stack file
- Support for Databricks Job Queuing [#307]
- Injection of variables into pipeline names
- Given priority to stack variables over environment variables
- Automatic assignment of pipeline name to Databricks Job name when selected as orchestrator
- `workflows` quickstart pipeline notebook to support custom laktory root
- Renamed `PipelineNode` attribute `primary_key` to `primary_keys` to support multiple keys
- DataSink merge for out-of-order records with streaming DataFrame
- DataSinks `merge` write mode for Change Data Capture, supporting type 1 and type 2 SCD
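For context, the sketch below shows what a type 1 merge amounts to, written directly against the Delta Lake Python API rather than through laktory's sink configuration; the table name, source path and column names are hypothetical.

```python
# Conceptual illustration of a type 1 SCD merge using the Delta Lake API.
# This is not laktory's sink implementation or configuration; the table
# name, path and columns are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forName(spark, "dev.silver.customers")  # hypothetical table
updates = spark.read.format("delta").load("/mnt/landing/customers_cdc")  # hypothetical path

(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s._is_deleted = true")     # drop records flagged for delete
    .whenMatchedUpdateAll(condition="s._is_deleted = false")
    .whenNotMatchedInsertAll(condition="s._is_deleted = false")
    .execute()
)
```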
- Source format in stock prices quickstart pipeline
- Removed warning due to usage of `FileDataSource` private attribute `schema`
- Support for JSONL and NDJSON formats in `FileDataSource`
- Missing stream query termination in `TableDataSink` model
- Raise Exception when resource names are not unique [#294]
- Logs to include timestamp
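A small illustration of the change using the standard logging module (not laktory's actual logger configuration):

```python
# Illustrative sketch: add a timestamp to every log record with the
# standard logging module (not laktory's actual logger configuration).
import logging

logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)
logging.getLogger("laktory").info("pipeline node executed")
```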
- `FileDataSource` to support spark `read_options`
- `FileDataSource` to support `schema` specification for weakly-typed formats
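The value of an explicit schema for weakly-typed formats such as CSV or JSON can be illustrated with plain PySpark; this is not the `FileDataSource` API, and the columns and path are hypothetical.

```python
# Conceptual illustration: without an explicit schema, Spark must infer
# column types for weakly-typed formats. Plain PySpark, not laktory's
# FileDataSource model; the path and columns are hypothetical.
from pyspark.sql import SparkSession
import pyspark.sql.types as T

spark = SparkSession.builder.getOrCreate()

schema = T.StructType(
    [
        T.StructField("symbol", T.StringType()),
        T.StructField("open", T.DoubleType()),
        T.StructField("close", T.DoubleType()),
    ]
)

df = (
    spark.read.format("csv")
    .options(header=True)
    .schema(schema)
    .load("/Volumes/dev/landing/stock_prices/")  # hypothetical location
)
```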
- Support for `ClusterPolicy` Databricks resource
- Support for `Repo` Databricks resource
- ReadMe file code
- DataFrameColumnExpression model
- Data Quality Expectations
- Data Quality Checks
- Support for multiple sinks per pipeline node
- Support for quarantine sink
- Root path for laktory, pipelines and pipeline nodes
- Stream writer for `FileDataSink`
- Support for `null` value in `JobTaskSQLTask` queries
- Singularized attribute names in `JobEmailNotifications` for Terraform [#276]
- Added missing `source` attribute in `JobTaskSqlTaskFile` [#275]
- `Job` to automatically alphabetically sort `tasks` [#286]
- `Job` now supports `description` [#277]
- `JobTaskNotebookTask` now allows `warehouse_id` for compute [#265]
- `JobTaskSQLTask` updated to support `null` for queries [#274]
- Renamed `sql_expr` to `expr` to enable both SQL and DataFrame expressions with auto-detection
- Updated DLT Expectation action "ALLOW" to "WARN"
- Prefixed `dlt_` to `warning_expectations` properties in pipeline nodes
- Refactored default paths for `WorkspaceFile` and `DBFSFile` models for improved target location control [#263]
- Refactored Polars reader to read as LazyFrame
- Renamed `PipelineNode` attribute `sink` to `sinks`
- Workflows quickstart to include debug script
- Workflows quickstart to better align DLT and job pipeline.
- Grants resources to Stack
- `no_wait` option for Cluster resources
- Polars quickstart
- Missing dependencies when deploying grants and data access with Metastore
- Added SQL expression to logs when processing Polars DataFrame
- Renamed workspace provider to grants provider in Metastore resource
- Support for multi-segment (semicolon-separated) SQL statements
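Conceptually, supporting semicolon-separated statements means splitting the script and running each statement on its own, since `spark.sql` accepts one statement per call. A naive sketch follows (not laktory's parser, which may handle more edge cases such as semicolons inside string literals).

```python
# Conceptual sketch: execute a multi-statement (semicolon-separated) SQL
# script one statement at a time. Plain PySpark, not laktory's parser.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sql_script = """
CREATE TABLE IF NOT EXISTS dev.sandbox.demo (id INT, name STRING);
INSERT INTO dev.sandbox.demo VALUES (1, 'laktory');
"""

for statement in sql_script.split(";"):
    statement = statement.strip()
    if statement:
        spark.sql(statement)
```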
- Support for Databricks Lakeview Dashboard resource
- CLI `destroy` command
- `unity-catalog`, `workspace` and `workflows` template choices for CLI `quickstart`
- Better feedback when terraform is not installed
- Added SQL query to pipeline node transformer logs
- Removed `backend` and `organization` arguments for CLI
- Combined CLI `pulumi-options` and `terraform-options` into `options`
- `VectorSearchIndex` Databricks resource
- `VectorSearchEndpoint` Databricks resource
- `purge` method for data sink
- `full_refresh` option for pipeline and pipeline node
- Checkpoint location of `TableDataSink`
- mergeSchema and overwriteSchema default options in DataSink writers
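These are standard Delta Lake writer options; the sketch below shows their plain PySpark equivalent rather than laktory's DataSink models, and the table name is hypothetical.

```python
# Conceptual illustration of the Delta writer options referenced above,
# using plain PySpark rather than laktory's DataSink models.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "AAPL")], ["id", "symbol"])

(
    df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")           # allow new columns on append
    .saveAsTable("dev.silver.stock_prices")  # hypothetical table
)
```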
- Support for models yaml merge
- MwsNccBinding databricks resource
- MwsNetworkConnectivityConfig databricks resource
- Support for Databricks Table `storage_credential_name` and `storage_location` properties
- Support for `BINARYFILE` (PDF) format in `FileDataSource` with Spark
- DLT Debug mode when source is streaming and node is not
- DataFrame type propagation when models are used as inputs to other models
- Terraform auto-approve when other options are used
- `show_version_info()` method to display correct package versions
- Support for referencing nodes in SQL queries
- Support for looking up existing resources
- Support for terraform alias providers
- `laktory` namespace to spark.sql.connect
- Parametrized SQL expressions used in the context of DLT
- Support for Polars 1.0
- Support for parametrized queries when DLT module is loaded
- Issue with getting environment stack on null properties
- `with_column` transformer node method to allow for `None` type
- WorkspaceFile attribute to Pipeline class to customize access controls
- Spark dependencies
- Fixed encoding when reading from yaml files
- Changed pipeline JSON file permission from `account users` to `users`
- Smart join to support coalesce of columns outside of the join
- Dataframe type propagation through all pipeline children
- Reading pipeline node data source in isolation mode
- Creation of the same column multiple times in a transformer node
- Accessing custom DataFrame functions in custom namespace in SparkChainNode execution
- Support for Polars with FileDataSource
- Support for Polars with FileDataSink
- Support for PolarsChain transformer
- Polars DataFrame extension
- Polars Expressions extension
- Refactored column creation inside a transformer node
- Moved laktory Spark dataframe custom functions under a laktory namespace.
- Support for SQL expression in SparkChain node
- Limit option to Data Sources
- Sample option to Data Sources
- Display method for Spark DataFrames
- Stack model environments to support overwrite of individual list element
- Pipeline Node data source read with DLT in debug mode
- Install instructions
- Re-organized optional dependencies
- Remove support for Pulumi python
- Updated ReadMe
- Stack Validator unit test
- `Pipeline` model, the new central component for building ETL pipelines
- `PipelineNode` model, the `Pipeline` sub-component defining each dataframe in a pipeline
- `FileDataSink` and `TableDataSink` sink models
- `PipelineNodeDataSource` and `MemoryDataSource` source models
- Future support for Polars and other types of dataframes
- Enabled CDC support for both `FileDataSource` and `TableDataSource`
- Merged `DataEventHeader` into `DataEvent` model
- Renamed `EventDataSource` model to `FileDataSource`
- Renamed `name` attribute to `table_name` in `TableDataSource` model
- Removed `SparkChain` support in DataSources
- Renamed `Pipeline` model to `DLTPipeline` model
- Cloud resources moved under `models.resources.{provider}.{resource_class}` to avoid collisions with future classes.
- Removed `TableBuilder`. `PipelineNode` should be used instead
- Support for spark chain for a data source
- Support for broadcasting in a data source
- YAML model dump for all base models
- Function selection for pyspark connect dataframes
- `SparkChain`, a high-level class allowing to declare and execute spark operations on a dataframe
- `SparkColumnNode`, the node of a `SparkChain` that builds a new column
- `SparkTableNode`, the node of a `SparkChain` that returns a new dataframe
- Moved `filter`, `selects` and `watermarks` properties to `models.BaseDataSource` so that they can be used for all source types
- `models.BaseDataSource` `renames` attribute for renaming columns of the source table
- `models.BaseDataSource` `drops` attribute for dropping columns of the source table
- spark DataFrame `watermark` returns the watermark column and threshold if any
- spark DataFrame `smart_join` joins, cleans duplicated columns and supports watermarking
- spark DataFrame `groupby_and_agg` groups by and aggregates in a single function
- spark DataFrame `window_filter` takes the first n rows over a window
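As an illustration of the last helper above, here is the equivalent "first n rows over a window" pattern in plain PySpark; the column names are made up and laktory's extension signature may differ.

```python
# Conceptual equivalent of a "window filter" (first n rows over a window),
# written with plain PySpark rather than laktory's DataFrame extension.
import pyspark.sql.functions as F
from pyspark.sql import SparkSession, Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("AAPL", "2024-01-01", 185.0), ("AAPL", "2024-01-02", 186.0), ("MSFT", "2024-01-01", 370.0)],
    ["symbol", "date", "close"],
)

# Keep the most recent row per symbol (n=1).
w = Window.partitionBy("symbol").orderBy(F.col("date").desc())
latest = (
    df.withColumn("_row", F.row_number().over(w))
    .filter(F.col("_row") <= 1)
    .drop("_row")
)
```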
- n/a
- Refactored table builder to use SparkChain instead of direct definitions of joins, unions, columns building, etc.
- CLI `run` command to execute remote jobs and pipelines and monitor errors until completion
- `Dispatcher` class to manage and run remote jobs
- `JobRunner` class to run remote jobs
- `PipelineRunner` class to run remote pipelines
- datetime utilities
- Environment variables `DATABRICKS_SDK_UPSTREAM` and `DATABRICKS_SDK_UPSTREAM_VERSION` to track laktory metrics as a Databricks partner
- Permissions resource dependencies on `DbfsFile` and `WorkspaceFile`.
- Support for table unions in table builder
- New column property `raise_missing_arg_exception` to allow for some spark function inputs to be missing
- `add`, `sub`, `mul` and `div` spark functions
- Renamed `power` spark function to `scaled_power` to prevent conflict with native spark function
- `quickstart` CLI command to initialize a sample Laktory stack.
- Databricks DBFS file model
- `show_version_info()` method for bug reporting
- Git issues templates
- Website branding
- Support for DLT views
- Support for providing table builder `drop_duplicates` with a list of columns to consider for the drop duplicates.
- Propagation of stack variables to resources
- Support for custom join sql expression in `TableJoin` model
- Support for explicit path in `TableDataSource` model
- `Metastore`, `MetastoreAssignment`, `MetastoreDataAccess`, `MwsPermissionAssignment` and `ExternalLocation` models
- `workspace_permission_assignments` field to `Group` model
- `StackValidator` class for testing deployment in both Pulumi and Terraform
- Pipeline model supports null catalog (hive metastore)
- Event Data Source supports custom event root path
- Event Data Source supports custom schema location path
- Refactored `core_resources` property to automatically propagate provider and dependencies to sub-resources.
- Refactored default resource name to remove illegal characters, resolve variables and remove resource tags.
- General support for Terraform IaC backend
- AWS Provider
- Azure Provider
- Azure Pulumi (Native) Provider
- `BaseModel` `inject_vars` method regular expressions support
- Replaced CLI argument `--stack` with `--org` and `--dev` for a more consistent experience between pulumi and terraform backends
- Automatic creation of resources output variables that can be used in configuration files
- Custom model serialization allowing conversion of keys to camel case
- Laktory CLI
- Stack model to define and deploy a complete collection of resources from yaml files only and manage environments
- Support for cross-references in yaml files. A yaml configuration file can include another.
- `BaseResource` and `PulumiResource` models with all methods required to deploy through pulumi
- `Grants` model
- `GroupMember` model
- `Permissions` model
- `ServicePrincipalRole` model
- `UserRole` model
- `resources` object to a `BaseResource` instance to define and deploy all the associated resources
- `events_root` field of `DataEventHeader` and `DataEvent` models is now a property for the default value to dynamically account for settings
- `inject_vars` method to support multiple targets (`pulumi_yaml`, `pulumi_py`, etc.)
- Modified `groups` field for `Users` and `ServicePrincipal` models to accept group id instead of group name
- Modified `resource_key` for `WorkspaceFile`, `Notebook` and `Directory`
- Removal of Laktory Resources Component (will trigger replacement of all resources unless aliases are used)
- Removal of resources engines classes
- Renamed `permissions` field to `access_controls` in multiple models to be consistent with Databricks API
- Renamed `vars` object to `variables`
- Resources deployment methods `deploy()` and `deploy_with_pulumi()` renamed to `to_pulumi()`
- Forced newline character to eliminate discrepancies between Windows and Linux environments when writing pipeline files.
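A minimal illustration of the fix in plain Python (the file name and content are hypothetical): passing an explicit `newline` to `open` keeps the written bytes identical across operating systems.

```python
# Illustrative sketch (hypothetical file and content): force "\n" line
# endings so the generated file is byte-identical on Windows and Linux.
content = "first line\nsecond line\n"
with open("pipeline_config.json", "w", newline="\n") as fp:
    fp.write(content)
```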
- `job.continuous.pause_status` to allow for arbitrary string (allow variable)
- `job.email_notifications.pause_status` to allow for arbitrary string (allow variable)
- `job.task_condition.pause_status` to allow for arbitrary string (allow variable)
- `warehouse.channel_name` to allow for arbitrary string (allow variable)
- `warehouse.spot_instance_policy` to allow for arbitrary string (allow variable)
- `warehouse.warehouse_type` to allow for arbitrary string (allow variable)
- Support for DLT tables expectations
- GitHub releases
- Units conversion spark function
- compare spark function
- API Reference
- doc tests
- SparkFuncArgs model to allow constant value
- Renamed Producer model to DataProducer
- Renamed Resource model to BaseResource
- Renamed user and permissions resources
- Renamed group and permissions resources
- Renamed pipeline resources
- Null values in joining columns with outer join
- Variable injection to support Pulumi Output as part of a string
- Column builder requires all inputs available to build a column
- Databricks directory model
- SQL Query model
- Table resource
- Git tag for each release
- Automatic version bump after each release
- Automatic documentation publishing after each release
- Renamed `table.builder.zone` to `table.builder.layer` to be consistent with industry standards.
- `header` option when reading CSV event data source
- `read_options` option when reading event data source
- `aggregation` feature for table builder
- `window_filter` feature for table builder
- Gold zone columns in table builder
- `drop_columns` option in table builder
- `template` property to table builder, allowing to select a template, independent of the zone.
- Renamed `models.sql.column.Column.to_column` to `models.sql.column.Column.is_column` to clarify that the provided value is a column name.
- Support for externally managed users and groups
- Data Event model to support timestamp from string
- Option to exclude timestamp from data event filepath
- Selects, filter, watermark options for TableDataSource
- Support for joins in Table model
- Silver Star builder
- Refactored Table to move all building configuration into a `TableBuilder` model
- UDFs property to pipeline model
- Refactored InitScript model into the more general WorkspaceFile
- Support for CDC table data source
- Support for SCD table
- Automatic catalog and schema assignment to source table from table and pipeline
- Pyspark imports when pyspark not installed
- Column `spark_func_args` parsing
- Column `spark_func_kwargs` parsing
- Support for `_any` column type
- `schema_flat` and `has_column` DataFrame extensions for spark connect
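For context, here is a conceptual sketch of flattening a nested Spark schema into dotted column names, similar in spirit to a `schema_flat` helper; it uses plain PySpark types and is not laktory's implementation.

```python
# Conceptual sketch: flatten a nested Spark schema into dotted column names
# (similar in spirit to a `schema_flat` helper; not laktory's implementation).
import pyspark.sql.types as T


def schema_flat(schema: T.StructType, prefix: str = "") -> list:
    names = []
    for field in schema.fields:
        name = f"{prefix}{field.name}"
        names.append(name)
        if isinstance(field.dataType, T.StructType):
            names += schema_flat(field.dataType, prefix=f"{name}.")
    return names


schema = T.StructType(
    [
        T.StructField("symbol", T.StringType()),
        T.StructField("quote", T.StructType([T.StructField("open", T.DoubleType())])),
    ]
)
assert schema_flat(schema) == ["symbol", "quote", "quote.open"]
```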
- Table Data Source to support tables external to the DLT pipeline
- Pyspark imports when pyspark not installed
- Deployment of pipeline configuration file
- Spark optional dependencies
- Spark unit tests
- Functions library
- Support for custom functions in silver processing
- Changed API for table columns definition
- Removed databricks-sdk as dependency
- `df_has_column` handling of arrays when column name contains a digit
- `df_has_column` support for column names with backticks (`)
- Model pulumi dump method
- Support for variables in yaml files
- Deprecated metadata SQL methods
- Bronze template notebook to leverage configuration file
- Silver template notebook to leverage configuration file
- compute.Cluster model
- compute.InitScript model
- compute.Job model
- compute.Notebook model
- compute.Pipeline model
- compute.Warehouse model
- secrets.Secret model
- secrets.SecretScope model
- Pipeline configuration file management
- Refactored landing mount to landing root and changed default configuration to volumes
- User, service principal and users group models
- Grants models
- Pulumi resources engine for user, group, catalog, schema, volume and associated grants
- Renamed database objects to schema to be aligned with Databricks recommendations
- Processing method for silver tables
- Silver DLT pipeline template notebook
- `df_schema_flat` function
- `_any` as a supported type for Spark functions
- Excluded fields for `DataEvent` `model_dump`
- `df_hascolumn` function to support Spark 3.5
- Table Metadata included derived properties for the columns
- Refactored dlt module to support DBR < 13 and clusters in shared access mode
- model_validate_yaml() for Pipeline model
- table data insert when no data is available
- Removed spark from required dependencies
- Data Event cloud storage writers (Azure, AWS and Databricks mount)
- Data Event class
- Data Source classes
- Pipeline class
- Support for BRONZE transformations
- Initial pypi release