databricks - error creating elementary.dbt_columns/models/sources tables #1635

Open
jakub-auger opened this issue Jul 15, 2024 · 5 comments
Labels
Bug Something isn't working Triage 👀

jakub-auger commented Jul 15, 2024

Describe the bug
Errors are thrown when creating the dbt_columns, dbt_models and dbt_sources tables during the first dbt run after elementary is added to the dbt project.

03:53:00 Completed with 3 errors and 0 warnings:
03:53:00
03:53:00 Runtime Error in model dbt_columns (models\edr\dbt_artifacts\dbt_columns.sql)
03:53:00 [RequestId=4c2efc34-3ea5-4d1b-9afa-155f5ecae9be ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_columns' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_columns.
03:53:00
03:53:00 Runtime Error in model dbt_models (models\edr\dbt_artifacts\dbt_models.sql)
03:53:00 [RequestId=33c14b44-302b-48c9-a765-da35ae379a12 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_models' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_models.
03:53:00
03:53:00 Runtime Error in model dbt_sources (models\edr\dbt_artifacts\dbt_sources.sql)
03:53:00 [RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.
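
To make the LOCATION_OVERLAP message concrete, here is a minimal sketch of the kind of check that could produce it. It assumes a simple path-prefix comparison; the actual Unity Catalog logic is not shown anywhere in this thread, and `paths_overlap`, `conflicting_tables` and the `<account>` placeholder are hypothetical names used only for illustration.

```python
from typing import Dict, List


def normalize(path: str) -> str:
    """Strip trailing slashes so 'a/b/' and 'a/b' compare equal."""
    return path.rstrip("/")


def paths_overlap(a: str, b: str) -> bool:
    """True if the paths are equal or one is a parent directory of the other."""
    a, b = normalize(a), normalize(b)
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def conflicting_tables(new_location: str, registered: Dict[str, str]) -> List[str]:
    """Names of already-registered tables whose location overlaps new_location."""
    return [name for name, loc in registered.items()
            if paths_overlap(new_location, loc)]


# Placeholder storage account; the real one is redacted in the issue.
ROOT = "abfss://datalakehouse@<account>.dfs.core.windows.net/elementary"
registered = {"datalakehouse.elementary.dbt_columns": f"{ROOT}/dbt_columns"}

# A second table pointed at the same path conflicts with the registered one.
print(conflicting_tables(f"{ROOT}/dbt_columns", registered))
# A sibling path does not.
print(conflicting_tables(f"{ROOT}/dbt_models", registered))
```

Under this assumption, any new table whose path equals an existing table's path is rejected, whatever the new table is called.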

To Reproduce
Steps to reproduce the behavior:

  1. dbt run --select elementary

Expected behavior
no errors

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • Elementary CLI (edr) version: not installed
  • Elementary dbt package version (from packages.yml):
    • package: elementary-data/elementary
      version: 0.15.2
  • dbt version:
    Core:
    • installed: 1.5.11
    • latest: 1.8.3 - Update available!

    Your version of dbt-core is out of date!
    You can find instructions for upgrading here:
    https://docs.getdbt.com/docs/installation

    Plugins:
    • databricks: 1.5.7 - Update available!
    • spark: 1.5.3 - Update available!
  • Data warehouse: Azure Databricks
  • Infrastructure details: Azure

Additional context
This is a clean install
I'm using external tables

I tried updating dbt-core and dbt-databricks, but got the same error:

(dbt-dev) C:\git\aic_datalakehouse>dbt -v
using legacy validation callback
Core:

  • installed: 1.8.3
  • latest: 1.8.3 - Up to date!

Plugins:

  • databricks: 1.8.3 - Up to date!
  • spark: 1.8.0 - Up to date!
@jakub-auger jakub-auger added Bug Something isn't working Triage 👀 labels Jul 15, 2024
@NoyaArie
Contributor

It looks like it failed to create these tables.

[RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.

From what I see, this is a Databricks error regarding privileges: https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html

@jakub-auger
Author

jakub-auger commented Jul 19, 2024

@NoyaArie thanks for looking into it

That's strange, as I am using a single account/token (mine, admin) to run it. None of the other models in the project have problems.

In fact, those 3 tables (along with the rest, 24 all up?) are created.

The error refers to dbt/elementary trying to create tables that overlap the same physical location in my blob/data lake storage.

OK, found it in the logs.

It looks like elementary is trying to create the temp/staging table in the same external location as the final table, which triggers the error. Allowing it would cause the final table to be overwritten with the staging data.

Are there any known workarounds?

`13:38:50.641894 [debug] [Thread-2 (]: On model.elementary.dbt_columns: /* {"app": "dbt", "dbt_version": "1.8.3", "dbt_databricks_version": "1.8.3", "databricks_sql_connector_version": "3.1.2", "profile_name": "aic_datalakehouse", "target_name": "prod", "node_id": "model.elementary.dbt_columns"} */

    create or replace table `datalakehouse`.`elementary`.`dbt_columns__tmp_20240719040850594589`
    using delta
    location 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_columns'
    as
    SELECT
        *
    FROM `datalakehouse`.`elementary`.`dbt_columns`
    WHERE 1 = 0

`
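
The shape of the logged statement can be reproduced with a short sketch. It assumes the external path is built as the location root plus the model's own identifier, which matches the paths in the log but is not verified against the dbt-databricks source. `external_location` and `empty_copy_sql` are hypothetical helpers, and `<account>` stands in for the redacted storage account.

```python
from typing import Optional


def external_location(location_root: str, identifier: str) -> str:
    """Assumed rule: the path is location_root plus the model's own identifier."""
    return f"{location_root.rstrip('/')}/{identifier}"


def empty_copy_sql(schema: str, identifier: str, tmp_suffix: str,
                   location: Optional[str]) -> str:
    """Build a CREATE OR REPLACE statement shaped like the one in the log."""
    tmp_name = f"{identifier}__tmp_{tmp_suffix}"
    lines = [f"create or replace table {schema}.`{tmp_name}`", "using delta"]
    if location is not None:  # None models a managed table: no LOCATION clause
        lines.append(f"location '{location}'")
    lines.append(f"as select * from {schema}.`{identifier}` where 1 = 0")
    return "\n".join(lines)


ROOT = "abfss://datalakehouse@<account>.dfs.core.windows.net/elementary"
SCHEMA = "`datalakehouse`.`elementary`"
final_path = external_location(ROOT, "dbt_columns")

# External case: the temp table has a new name but reuses the final table's path.
external_sql = empty_copy_sql(SCHEMA, "dbt_columns", "20240719040850594589", final_path)
# Managed case: no explicit path, so nothing for the catalog to flag as overlapping.
managed_sql = empty_copy_sql(SCHEMA, "dbt_columns", "20240719040850594589", None)

print(external_sql)
print(managed_sql)
```

In the external case only the relation name changes between the final and the temp table; the path does not, which is the overlap the error reports.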

@ofek1weiss
Contributor

Hey @jakub-auger, sorry for the late response... 🫤
Were you able to resolve the issue? I think it might be related to dbt-databricks itself, and updating it may help.

@ofek1weiss ofek1weiss self-assigned this Sep 26, 2024
@jakub-auger
Author

Hi @ofek1weiss
No, I haven't included elementary in my project since then.

I don't see what the fix within dbt-databricks would be. It's working as designed: stopping someone from trying to save different tables in the same data location. I'd be concerned if it let it happen!

Can you explain how the temp tables are created and what their purpose is?
I use externally managed tables in Databricks.

A 'simple' way to fix the above issue is to modify the location to include the temp table name, BUT Databricks doesn't delete the raw data when an external table is dropped, so I'd be left with a plethora of ./__tmp_2345i9304959 tables in my data lake.
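
The leftover-data concern can be illustrated with a toy model. This is only a sketch: `ToyExternalCatalog` is a hypothetical stand-in that keeps catalog entries and storage paths separately, and the numeric temp suffixes are invented (the real ones are timestamps). It relies on one behavior, stated in the comment above: dropping an external table removes the catalog entry but leaves the data in place.

```python
from typing import Dict, Set


class ToyExternalCatalog:
    """Toy model: the catalog tracks names, the storage set tracks data paths."""

    def __init__(self) -> None:
        self.tables: Dict[str, str] = {}
        self.storage: Set[str] = set()

    def create_external(self, name: str, location: str) -> None:
        self.tables[name] = location
        self.storage.add(location)  # data files are written under this path

    def drop(self, name: str) -> None:
        # Dropping an external table removes only the catalog entry.
        del self.tables[name]


catalog = ToyExternalCatalog()
root = "abfss://datalakehouse@<account>.dfs.core.windows.net/elementary"
catalog.create_external("dbt_columns", f"{root}/dbt_columns")

# Three runs, each giving its temp table a dedicated path and then dropping it.
for run in range(3):
    tmp = f"dbt_columns__tmp_{run}"  # hypothetical suffix
    catalog.create_external(tmp, f"{root}/{tmp}")
    catalog.drop(tmp)

# Paths that still hold data but no longer belong to any table.
orphans = sorted(catalog.storage - set(catalog.tables.values()))
print(orphans)
```

Each run leaves one more orphaned path behind, so per-temp-table locations would need a separate cleanup step.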

Is elementary not compatible with being set up as externally managed tables in Databricks?

@jakub-auger
Author

jakub-auger commented Sep 30, 2024

@ofek1weiss
Update: it did not work with the latest version of dbt.

It did work once I switched elementary to use managed tables. I recommend adding that somewhere in the docs.
