When dropping a non-Iceberg table, the directory associated with the table is not deleted #11820

Open
1 of 3 tasks
lordk911 opened this issue Dec 19, 2024 · 4 comments
Labels
bug Something isn't working

Comments

lordk911 commented Dec 19, 2024

Apache Iceberg version

1.6.1

Query engine

Spark

Please describe the bug 🐞

Spark 3.4.4 with the following configuration:

spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type hive

When I drop an ORC table, the directory associated with the table is not deleted from HDFS; only the metadata is removed from HMS.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@lordk911 lordk911 added the bug Something isn't working label Dec 19, 2024
@lordk911 lordk911 reopened this Dec 22, 2024
@lordk911
Author

Is this a Bug, or is it by design?

@MonkeyCanCode

@lordk911 that is expected. Since 0.14, DROP TABLE only removes the metadata references and not the actual data files. To remove the data files as well, you need to add PURGE at the end of the statement. This is documented in https://iceberg.apache.org/docs/latest/spark-ddl/#drop-table.
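
As a minimal sketch of the two statement forms described above, the Java helper below builds the SQL text with and without the PURGE keyword. The class name DropStatements and the table name db.sample are hypothetical and are not part of Iceberg or Spark; only the DROP TABLE ... PURGE syntax comes from the linked documentation. In a real job the resulting string would be handed to the Spark session for execution.

```java
/** Hypothetical helper (not an Iceberg or Spark API) that builds DROP TABLE statements. */
final class DropStatements {
    private DropStatements() {}

    /**
     * Returns "DROP TABLE <table>" and appends " PURGE" only when purge is true.
     * Without PURGE, an Iceberg table's data files are expected to stay in place.
     */
    static String dropTable(String table, boolean purge) {
        if (table == null || table.trim().isEmpty()) {
            throw new IllegalArgumentException("table name must not be empty");
        }
        String sql = "DROP TABLE " + table.trim();
        return purge ? sql + " PURGE" : sql;
    }

    public static void main(String[] args) {
        // db.sample is an assumed table name used only for illustration.
        System.out.println(dropTable("db.sample", false));
        System.out.println(dropTable("db.sample", true));
    }
}
```

The boolean parameter mirrors the choice the user makes in SQL: leaving PURGE off is the default, so removing data files has to be requested explicitly.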

You can find the same from the code as well:

Entry point for drop table in spark: https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java#L284
Then it goes to public boolean dropTable(Identifier ident), which reads as follows:

  @Override
  public boolean dropTable(Identifier ident) {
    return dropTableWithoutPurging(ident);
  }

The body of dropTableWithoutPurging is shown below. It also shows that, by default, DROP TABLE xxx in Spark does not remove data files, only the metadata references:

  private boolean dropTableWithoutPurging(Identifier ident) {
    if (isPathIdentifier(ident)) {
      return tables.dropTable(((PathIdentifier) ident).location(), false /* don't purge data */);
    } else {
      return icebergCatalog.dropTable(buildIdentifier(ident), false /* don't purge data */);
    }
  }
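
To make the effect of that boolean flag concrete, here is a toy in-memory model. It is only a sketch under stated assumptions: ToyCatalog and all of its methods are hypothetical, it does not use or reproduce Iceberg's catalog implementation, and it models a table's storage as a single location string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not Iceberg code) of a catalog with a purge flag. */
final class ToyCatalog {
    // Table name -> storage location; stands in for the metastore entry.
    private final Map<String, String> tableLocations = new HashMap<>();
    // Locations that currently hold data; stands in for the file system.
    private final Set<String> storage = new HashSet<>();

    void createTable(String name, String location) {
        tableLocations.put(name, location);
        storage.add(location);
    }

    /** Always removes the catalog entry; removes the data only when purge is true. */
    boolean dropTable(String name, boolean purge) {
        String location = tableLocations.remove(name);
        if (location == null) {
            return false; // nothing to drop
        }
        if (purge) {
            storage.remove(location);
        }
        return true;
    }

    boolean isRegistered(String name) {
        return tableLocations.containsKey(name);
    }

    boolean locationExists(String location) {
        return storage.contains(location);
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.createTable("db.t1", "/warehouse/db/t1"); // assumed names
        catalog.dropTable("db.t1", false /* don't purge data */);
        System.out.println("registered: " + catalog.isRegistered("db.t1"));
        System.out.println("data present: " + catalog.locationExists("/warehouse/db/t1"));
    }
}
```

In this model, dropping with purge set to false leaves the location in place while the catalog entry disappears, which matches the symptom described in the report.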

@lordk911
Author

lordk911 commented Jan 3, 2025

Drop table only remove metadata refs and not the actual data files since 0.14. For actual data file removal, you will need to add purge at the end. This is documented in https://iceberg.apache.org/docs/latest/spark-ddl/#drop-table.

Thank you for your reply. I know about this change, but in practice, with Spark 3.3.3 + Iceberg 1.3.1, the PURGE keyword is not required when dropping a non-Iceberg table through SparkSessionCatalog.
If this is by design, that is acceptable to me.

@MonkeyCanCode

MonkeyCanCode commented Jan 3, 2025

I haven't tried this on an older version of the Iceberg runtime. I am currently using Spark 3.5.x + Iceberg runtime 1.6.x, and there DROP TABLE without PURGE does not clean up data files. Based on the docs, I would think that is by design. I wonder whether some other bug is causing the data files to be dropped in Spark 3.3.3 + Iceberg 1.3.1.
