[dagster-databricks]: Add support for serverless jobs #27558

SoerenStahlmann · 2025-02-04T19:14:41Z

What's the use case?

Running jobs on serverless compute greatly reduces cluster spin-up time, so jobs finish faster. The current implementation of PipesDatabricksClient does not support running jobs on Databricks serverless compute.

Ideas of implementation

A serverless job defines neither an existing_cluster_id nor a new_cluster key when the SubmitTask is created. If both fields are empty, the WorkspaceClient.jobs.submit method starts a serverless run. Because serverless runs do not support adding libraries, the dagster-pipes dependency can only be added via the environment key. The databricks-sdk version currently in use does not support this property, so it would need to be updated to the latest version.
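To make the shape of such a run concrete, here is a minimal sketch of the request body as a plain dict. It assumes the Jobs API runs/submit JSON layout (a `tasks` list plus a job-level `environments` list referenced by `environment_key`). The function name `build_serverless_submit_payload` is hypothetical, and the `"client": "1"` environment version is an assumed value. With an updated databricks-sdk, the equivalent task and environment objects would be passed to `jobs.submit` instead of a dict.

```python
import json
from typing import Any, Dict, List, Optional


def build_serverless_submit_payload(
    run_name: str,
    task_key: str,
    python_file: str,
    dependencies: List[str],
    parameters: Optional[List[str]] = None,
    environment_key: str = "default",
) -> Dict[str, Any]:
    """Build a hypothetical runs/submit request body for a serverless run.

    The task sets neither existing_cluster_id nor new_cluster, so the run is
    serverless. Dependencies go into a job-level environment instead of
    `libraries`, which serverless runs don't accept.
    """
    task: Dict[str, Any] = {
        "task_key": task_key,
        "spark_python_task": {
            "python_file": python_file,
            "parameters": list(parameters or []),
        },
        # Link the task to the environment declared below.
        "environment_key": environment_key,
    }
    return {
        "run_name": run_name,
        "tasks": [task],
        "environments": [
            {
                "environment_key": environment_key,
                # "client": "1" is an assumed environment version value.
                "spec": {"client": "1", "dependencies": list(dependencies)},
            }
        ],
    }


if __name__ == "__main__":
    payload = build_serverless_submit_payload(
        run_name="pipes-serverless-demo",
        task_key="main",
        python_file="/Workspace/Users/someone/script.py",  # hypothetical path
        dependencies=["dagster-pipes"],
    )
    print(json.dumps(payload, indent=2))
```

Keeping the dependency list on the environment rather than the task is what lets dagster-pipes be installed on the remote side without the `libraries` field.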

When enriching the submit task, the method currently expects either existing_cluster_id or new_cluster to be defined. This needs to be changed so that the else branch covers the serverless case.
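The proposed branching can be sketched on plain dicts as follows. This is not the current dagster-databricks code: `enrich_submit_task`, `_append_script_params`, and the `--key value` parameter encoding are hypothetical. The sketch also assumes that the Pipes bootstrap params reach the remote process as script parameters whenever there is no new cluster spec to carry Spark env vars.

```python
from typing import Any, Dict, Mapping


def _append_script_params(task: Dict[str, Any], pipes_params: Mapping[str, str]) -> None:
    # Hypothetical encoding: pass each bootstrap param as "--key value".
    spark_task = task.setdefault("spark_python_task", {})
    params = spark_task.setdefault("parameters", [])
    for key, value in pipes_params.items():
        params.extend([f"--{key}", value])


def enrich_submit_task(task: Dict[str, Any], pipes_params: Mapping[str, str]) -> str:
    """Inject Pipes bootstrap params into a submit task dict (hypothetical).

    Returns the name of the compute branch that was taken.
    """
    if "existing_cluster_id" in task:
        # Assumed handling: an existing cluster's env vars can't be set per run.
        _append_script_params(task, pipes_params)
        return "existing_cluster"
    if "new_cluster" in task:
        # A fresh job cluster can receive the params as Spark env vars.
        env = task["new_cluster"].setdefault("spark_env_vars", {})
        env.update(pipes_params)
        return "new_cluster"
    # Neither key is set: Databricks starts a serverless run. There is no
    # cluster spec to carry env vars, so fall back to script parameters.
    _append_script_params(task, pipes_params)
    return "serverless"
```

The key change is that the final branch no longer raises an error for the "no cluster key" case and instead treats it as serverless.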

Because serverless compute runs in Shared access mode, where reading from DBFS is generally problematic, we need a VolumeContextInjector / Reader / MessageWriter / ... that uses Unity Catalog volumes instead of DBFS for communication between the client and the compute.
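The file layout such volume-based components might share can be sketched independently of Dagster's actual Pipes interfaces. In this sketch, `VolumePipesChannel`, `FileStore`, the `dagster-pipes-<uuid>` directory naming, and the `context.json` / `messages.jsonl` file names are all hypothetical. In real use, something like the Databricks Files API would sit behind `FileStore` on the orchestrator side. On the compute side, the remote process would read and write the same `/Volumes/...` paths.

```python
import json
import uuid
from typing import Any, Dict, List, Protocol


class FileStore(Protocol):
    """Minimal file interface (hypothetical); a volume-backed client would implement it."""

    def write_text(self, path: str, text: str) -> None: ...
    def read_text(self, path: str) -> str: ...
    def exists(self, path: str) -> bool: ...


class InMemoryStore:
    """Test double standing in for a Unity Catalog volume."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def write_text(self, path: str, text: str) -> None:
        self.files[path] = text

    def read_text(self, path: str) -> str:
        return self.files[path]

    def exists(self, path: str) -> bool:
        return path in self.files


class VolumePipesChannel:
    """Hypothetical volume-backed replacement for the DBFS context/message files."""

    def __init__(self, store: FileStore, volume_root: str) -> None:
        self.store = store
        # Unique per-run directory so concurrent runs don't collide.
        self.run_dir = f"{volume_root.rstrip('/')}/dagster-pipes-{uuid.uuid4().hex}"
        self.context_path = f"{self.run_dir}/context.json"
        self.messages_path = f"{self.run_dir}/messages.jsonl"

    def inject_context(self, context: Dict[str, Any]) -> Dict[str, str]:
        # Orchestrator side: write the context and return the params the
        # remote process needs to locate the context and message files.
        self.store.write_text(self.context_path, json.dumps(context))
        return {"context_path": self.context_path, "messages_path": self.messages_path}

    def read_messages(self) -> List[Dict[str, Any]]:
        # Orchestrator side: parse newline-delimited JSON written remotely.
        if not self.store.exists(self.messages_path):
            return []
        text = self.store.read_text(self.messages_path)
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Putting every run under its own directory keeps cleanup to a single delete and avoids clashes when several serverless runs share one volume.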

Additional information

I'd be glad to raise a PR for this issue. If you can give me some guidance on the implementation, I'm happy to contribute.

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.


jayctran · 2025-03-01T14:56:35Z

I may be wrong, and I can't remember where I read it, but regarding the databricks-sdk version, I think they were looking to align it with the version that dbt-databricks pins: databricks/dbt-databricks#738

Edit: found the comment

dagster/python_modules/libraries/dagster-databricks/setup.py, line 38 at b60e419:

"databricks-sdk<=0.17.0", # dbt-databricks is pinned to this version

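Whether a given pin can express a serverless task comes down to whether the installed SDK's SubmitTask has an environment_key field. The sketch below checks that at runtime, assuming that the SDK's service types are dataclasses in `databricks.sdk.service.jobs`. The helper name `submit_task_supports_environment_key` is hypothetical and not part of dagster-databricks.

```python
import dataclasses
import importlib


def submit_task_supports_environment_key(
    module_name: str = "databricks.sdk.service.jobs",
) -> bool:
    """Hypothetical check: True if SubmitTask in `module_name` has an
    `environment_key` field. Returns False if the module isn't installed."""
    try:
        jobs = importlib.import_module(module_name)
    except ImportError:
        return False
    submit_task = getattr(jobs, "SubmitTask", None)
    # Assumes SDK service types are dataclasses; anything else counts as unsupported.
    if submit_task is None or not dataclasses.is_dataclass(submit_task):
        return False
    return any(f.name == "environment_key" for f in dataclasses.fields(submit_task))


if __name__ == "__main__":
    print(submit_task_supports_environment_key())
```

A check like this only reports what the installed version offers. The pin itself still has to be settled in setup.py.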
SoerenStahlmann added the type: feature-request label Feb 4, 2025
