-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: upgrade to datafusion 43 #2886
Conversation
We need Datafusion to upgrade as well before we can do this. I have some patches to make the latest datafusion work but we're blocked on them to adopt the newest arrow and kernel |
9d686df
to
d4bfacd
Compare
Hi, datafusion has been released. Let's rock! |
Thank you @ion-elgreco -- let me know if there is anything I can to do help with this PR (e.g. I could work on updating the code to use non deprecated APIs in the core for parquet statistics...) |
8db59c2
to
b325e27
Compare
@alamb there seems to be a regression in DF, see https://github.com/delta-io/delta-rs/actions/runs/10959345721/job/30431504526?pr=2886#step:7:948. A literal with a list is creating a dtype list with inner type using the name "item", we however have our data internally read according to the parquet spec, so that when we write the parquet files are meeting the field naming of the spec. I had a quick glance at the Changelog but can't really pinpoint where this change may have occurred, do you have any ideas?
|
It sounds somewhat similar to a discussion happening here about nullability (but I think field name is similar in terms of how it is treated): apache/datafusion#11989 (comment) Update; filed apache/datafusion#12560 to track the regression more |
f824d7a
to
a1a33ce
Compare
@alamb thanks for following up! I feel a little sheepish 🐑 insofar that my internal CI showed these errors weeks ago in my integration testing between datafusion I've subscribed to the issue you created and can help testing if necessary! Depending on my availability this weekend I can try fixing it in datafusion too 🤔 |
Hi there - I believe a fix was implemented in datafusion and this PR is not blocked any longer. |
It's not yet released though |
I can confirm that the latest datafusion main passes tests in CI 🥳 |
4f30e16
to
bac3306
Compare
I filed apache/datafusion#12813 to consider releasing a patch version of DataFusion |
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
We plan to make another DataFusion patch release to unblock this upgrade: |
6721672
to
08a96ed
Compare
Today the make_array function from Datafusion uses "item" as the list element's field name. With recent changes in delta-kernel-rs we have switched to calling it "element" which is more conventional related to how Apache Parquet handles things This change introduces a test which helps isolate the behavior seen in Python tests within the core crate for easier regression testing Signed-off-by: R. Tyler Croy <[email protected]>
08a96ed
to
b144cf1
Compare
👋 hey there -- we are starting to get ready for the DataFusion 44 release and are trying to make sure it is less disruptive to delta-rs than this past release @rtyler Would it be ok if we started working on the upgrade now before the DataFusion release to test and find any potential breakages before? Maybe we can make a WIP PR to this repo with a crates.io pin (similar to what i am doing in DataFusion with arrow-rs): If this sounds ok, I can start working on that PR |
hey @alamb - that would not only be OK, but a great thing to do :). Happy to support if needed along the way. |
Cool -- thanks -- I'll give it a try hopefully in the next day or two. I am sure I will need help with the python stuff |
I haven't made any actual progress on this, but I have created a ticket to track the work: |
@alamb I had not seen this issue last week, sorry! I maintain a branch for customers with datafusion main on it, I will cherry-pick and queue up the delta-rs modifications I have shortly |
No worries at all -- thank you 🙏 I was actually about to start work on trying to upgrade DataFuson in delta.rs this morning (I have a few other things to do first) but if you have already done the work that is amazing. Thank you!!!! |
😍 |
Description
Bump kernel