I may be wrong, and I can't remember where I read it, but regarding the databricks-sdk version, I think they were looking to align it with the version that dbt-databricks pins: databricks/dbt-databricks#738
What's the use case?
Starting serverless jobs greatly reduces the spin-up time for jobs and leads to faster execution. The current implementation of PipesDatabricksClient does not support the execution of jobs on Databricks serverless clusters.

Ideas of implementation
A serverless job has neither an existing_cluster_id nor a new_cluster key defined during SubmitTask creation. If both of these fields are empty, the WorkspaceClient.jobs.submit method initiates a serverless run. Since serverless runs do not support adding libraries, the dagster-pipes dependency can only be added via the environment key. The currently pinned databricks-sdk version does not support this property and would need to be updated to the latest version.

During enrichment of the submit task, the method currently expects either existing_cluster_id or new_cluster to be defined. This needs to be changed so that the else case covers the serverless implementation.

Since serverless compute is based on Shared access mode, we also need a VolumeContextInjector / Reader / MessageWriter / ... instead of using DBFS for communication between the client and the compute, as reading from DBFS is generally problematic on Shared access mode.
Additional information
I'd be glad to raise a PR for this issue if you can give me some guidance on the implementation.
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.