[dagster-databricks]: Add support for serverless jobs #27558

Open
SoerenStahlmann opened this issue Feb 4, 2025 · 1 comment

Comments

@SoerenStahlmann

SoerenStahlmann commented Feb 4, 2025

What's the use case?

Running jobs on serverless compute greatly reduces spin-up time and therefore speeds up execution. The current implementation of PipesDatabricksClient does not support running jobs on Databricks serverless compute.

Ideas of implementation

A serverless job has neither an existing_cluster_id nor a new_cluster key defined when the SubmitTask is created. If both fields are empty, the WorkspaceClient.jobs.submit method starts a serverless run. Because serverless runs do not support adding libraries, the dagster-pipes dependency can only be added through the environment key. The databricks-sdk version currently used does not support this property, so it would need to be updated to the latest version.
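A minimal sketch of what such a serverless submission could look like, written as a raw Jobs API `runs/submit` request body so it runs without the SDK. Assumptions: the helper name `serverless_submit_payload`, the workspace path, and the environment key `dagster_pipes_env` are hypothetical, and the `environments` / `spec.client` layout reflects the Jobs API at the time of writing and should be checked against the pinned databricks-sdk version (on the SDK side, the task half corresponds to `SubmitTask.environment_key`).

```python
import json


# Hypothetical helper: builds a Jobs API "runs/submit" body for a serverless run.
# Field names follow the public Jobs API; verify them for the SDK/API version in use.
def serverless_submit_payload(run_name, task_key, python_file, dependencies,
                              environment_key="dagster_pipes_env"):
    task = {
        "task_key": task_key,
        "spark_python_task": {"python_file": python_file},
        # No existing_cluster_id / new_cluster: Databricks runs the task on serverless.
        # No "libraries" either: serverless tasks get dependencies via an environment.
        "environment_key": environment_key,
    }
    environment = {
        "environment_key": environment_key,
        # "client" was the environment version field at the time of writing (assumption:
        # newer API versions may name it differently).
        "spec": {"client": "1", "dependencies": list(dependencies)},
    }
    return {"run_name": run_name, "tasks": [task], "environments": [environment]}


payload = serverless_submit_payload(
    run_name="dagster-pipes-serverless",
    task_key="main",
    python_file="/Workspace/Users/someone@example.com/script.py",  # hypothetical path
    dependencies=["dagster-pipes"],
)
print(json.dumps(payload, indent=2))
```

The task and the environment are linked only by the shared `environment_key`, which is why the dependency list lives outside the task itself.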

When enriching the submit task, the method currently expects either existing_cluster_id or new_cluster to be defined. This needs to change so that the else branch covers the serverless case.
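The proposed change to the enrichment step could be sketched as follows, assuming tasks are plain dicts shaped like the `SubmitTask` JSON and that classic clusters keep receiving dagster-pipes as a PyPI task library. `enrich_task` and its parameters are hypothetical and do not come from dagster-databricks.

```python
# Hypothetical sketch of the proposed branching; not dagster-databricks code.
def enrich_task(task, pipes_requirement="dagster-pipes",
                environment_key="dagster_pipes_env"):
    """Return (enriched_task, environments) for a runs/submit request."""
    task = dict(task)  # avoid mutating the caller's dict
    if task.get("existing_cluster_id") or task.get("new_cluster"):
        # Classic compute: install dagster-pipes as a task library.
        libraries = list(task.get("libraries", []))
        libraries.append({"pypi": {"package": pipes_requirement}})
        task["libraries"] = libraries
        return task, []
    # Else branch: neither cluster field is set, so this becomes a serverless run.
    # Libraries are not supported there, so attach dagster-pipes via an environment.
    task["environment_key"] = environment_key
    environments = [{
        "environment_key": environment_key,
        "spec": {"client": "1", "dependencies": [pipes_requirement]},
    }]
    return task, environments


classic_task, classic_envs = enrich_task(
    {"task_key": "main", "existing_cluster_id": "1234-567890-abcde123"}  # hypothetical id
)
serverless_task, serverless_envs = enrich_task({"task_key": "main"})
```

Returning the environments separately keeps the task-level change small, since the matching entry has to be passed at the request level rather than inside the task.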

Since serverless compute is based on shared access mode, we need a VolumeContextInjector / Reader / MessageWriter / ... that uses Unity Catalog Volumes instead of DBFS for communication between the client and the compute, because reading from DBFS is generally problematic in shared access mode.
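A rough sketch of the file-based channels a Volume-backed injector/reader pair might use: the context is written as a JSON file, the remote process appends messages as JSON lines, and the orchestrator polls for complete new lines from a byte offset. All function names here are hypothetical. In dagster they would sit behind `PipesContextInjector` / `PipesMessageReader` on the orchestration side and `PipesContextLoader` / `PipesMessageWriter` in the remote process, and the orchestrator would probably reach the Volume through the Databricks Files API rather than a local path. A temporary directory stands in for a `/Volumes/<catalog>/<schema>/<volume>` path so the sketch runs locally.

```python
import json
import os
import tempfile


def inject_context(base_dir, context):
    """Orchestrator side: write the context as JSON and return params for the remote side."""
    path = os.path.join(base_dir, "context.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(context, f)
    return {"path": path}


def load_context(params):
    """Remote side: read the context back from the path it was given."""
    with open(params["path"], encoding="utf-8") as f:
        return json.load(f)


def write_message(messages_path, message):
    """Remote side: append one message as a single JSON line."""
    with open(messages_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")


def read_new_messages(messages_path, offset):
    """Orchestrator side: return messages written since `offset` and the new offset."""
    if not os.path.exists(messages_path):
        return [], offset
    with open(messages_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume newline-terminated lines; a trailing partial line waits for the next poll.
    complete, sep, _ = data.rpartition(b"\n")
    if not sep:
        return [], offset
    messages = [json.loads(line) for line in complete.split(b"\n") if line.strip()]
    return messages, offset + len(complete) + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as volume_dir:  # stand-in for a Volume path
        params = inject_context(volume_dir, {"run_id": "example"})
        print(load_context(params))
        messages_path = os.path.join(volume_dir, "messages.jsonl")
        write_message(messages_path, {"method": "example", "params": {"n": 1}})
        print(read_new_messages(messages_path, 0))
```

Tracking a byte offset and consuming only complete lines means a message that is still being written is simply picked up on the next poll.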

Additional information

I'm happy to raise a PR for this issue. With some guidance on the implementation, I'd be glad to contribute.

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@jayctran

jayctran commented Mar 1, 2025

I may be wrong, and I can't remember where I read it, but regarding the databricks-sdk version, I think they were looking to align it with the SDK version that dbt-databricks pins: databricks/dbt-databricks#738

Edit: found the comment

"databricks-sdk<=0.17.0", # dbt-databricks is pinned to this version
