Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hint about expected extension in error message in register_csv,… #14168

Conversation

cj-zhukov
Copy link
Contributor

@cj-zhukov cj-zhukov commented Jan 17, 2025

Add a hint about expected extension in error message in register_parquet, register_json, register_avro (#14144)

Which issue does this PR close?

Closes #14144.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

… register_parquet, register_json, register_avro (apache#14144)
@github-actions github-actions bot added the core Core DataFusion crate label Jan 17, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cj-zhukov -- this looks great and I think a much better experience than returning an empty dataframe

The only thing I think this PR is missing is some tests to ensure we don't accidentally break the feature in the future

I made a proposal here if that is helpful to you

extension: impl AsRef<str>,
) -> Result<()> {
let table_paths = table_paths.to_urls()?;
if table_paths.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what if the folder contains mixed parquet files and some .txt will that error trigger? That would be not very expected behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this good point! I checked this. If folder contains mixed format files, that will not error trigger!
I agree that if it failed, it would not be expected behavior. This code helps to prevent this. I found it in _read_type (https://docs.rs/datafusion/latest/src/datafusion/execution/context/mod.rs.html#1261)

  if !file_path.ends_with(extension) && !path.is_collection() {
      return exec_err!(
          "File path '{file_path}' does not match the expected extension '{extension}'"
      );
  }

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

I pushed a commit to this branch to fix the cargo fmt 🤦

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cj-zhukov and @comphead

I think this one is good to go now

…in-register-csv-parquet-json' of github.com:cj-zhukov/datafusion into cj-zhukov/Add-a-hint-about-expected-extension-in-error-in-register-csv-parquet-json
@cj-zhukov
Copy link
Contributor Author

@alamb Andrew, my apologies, I haven't noticed your commit for fixing cargo fmt

…about-expected-extension-in-error-in-register-csv-parquet-json
@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

No worries! That was my bad for making a PR to your branch that I forgot to run cargm fmt. I also merged up again to main to resolve a conflict in this one

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

We are very close to getting this merged. I expect it iwll be ready to go once CI completes again

@alamb alamb merged commit e9a77e0 into apache:main Jan 18, 2025
25 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

Woohoo! Thanks again @cj-zhukov and @comphead

@cj-zhukov cj-zhukov deleted the cj-zhukov/Add-a-hint-about-expected-extension-in-error-in-register-csv-parquet-json branch January 19, 2025 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a hint about expected extension in error message in register_csv, register_parquet, register_json
3 participants