-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a hint about expected extension in error message in register_csv,… #14168
Add a hint about expected extension in error message in register_csv,… #14168
Conversation
… register_parquet, register_json, register_avro (apache#14144)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @cj-zhukov -- this looks great and I think a much better experience than returning an empty dataframe
The only thing I think this PR is missing is some tests to ensure we don't accidentally break the feature in the future
I made a proposal here if that is helpful to you
extension: impl AsRef<str>, | ||
) -> Result<()> { | ||
let table_paths = table_paths.to_urls()?; | ||
if table_paths.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what if the folder contains mixed parquet
files and some .txt
will that error trigger? That would be not very expected behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this good point! I checked this. If folder contains mixed format files, that will not error trigger!
I agree that if it failed, it would not be expected behavior. This code helps to prevent this. I found it in _read_type (https://docs.rs/datafusion/latest/src/datafusion/execution/context/mod.rs.html#1261)
if !file_path.ends_with(extension) && !path.is_collection() {
return exec_err!(
"File path '{file_path}' does not match the expected extension '{extension}'"
);
}
Add tests for error
I pushed a commit to this branch to fix the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @cj-zhukov and @comphead
I think this one is good to go now
…in-register-csv-parquet-json' of github.com:cj-zhukov/datafusion into cj-zhukov/Add-a-hint-about-expected-extension-in-error-in-register-csv-parquet-json
@alamb Andrew, my apologies, I haven't noticed your commit for fixing cargo fmt |
…about-expected-extension-in-error-in-register-csv-parquet-json
No worries! That was my bad for making a PR to your branch that I forgot to run cargm fmt. I also merged up again to main to resolve a conflict in this one |
We are very close to getting this merged. I expect it iwll be ready to go once CI completes again |
Woohoo! Thanks again @cj-zhukov and @comphead |
Add a hint about expected extension in error message in register_parquet, register_json, register_avro (#14144)
Which issue does this PR close?
Closes #14144.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?