Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Datafusion 43 #905

Merged
merged 19 commits into from
Nov 10, 2024
Merged

Conversation

Michael-J-Ward
Copy link
Contributor

@Michael-J-Ward Michael-J-Ward commented Oct 10, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Python Changes:

  • Removed PyLimit methods for skip and fetch because upstream Limit now returns expressions instead of constants
  • Crossjoin expression was removed upstream

TODO

  • SmoothTwoColumn test (the evaluate_all python method is only receiving the a column).

@Michael-J-Ward
Copy link
Contributor Author

For context on the SmoothTwoColumn test.

On this branch, the values argument passed into SmoothTwoColumn.evaluate_all

> /home/mike/workspace/datafusion-python/dev/python/tests/test_udwf.py(143)evaluate_all()
-> results = []
(Pdb) values
[<pyarrow.lib.Int64Array object at 0x7fff8ec772e0>
[
  0,
  1,
  2,
  3,
  4,
  5,
  6
]]

On main, it contains both a and b columns:

> /home/mike/workspace/datafusion-python/main/python/tests/test_udwf.py(143)evaluate_all()
-> values_a = values[0]
(Pdb) values
[<pyarrow.lib.Int64Array object at 0x7fffe8dcb700>
[
  0,
  1,
  2,
  3,
  4,
  5,
  6
], <pyarrow.lib.Int64Array object at 0x7fffe8dcbd60>
[
  7,
  4,
  3,
  8,
  9,
  1,
  6
]]
(Pdb) c

@timsaucer
Copy link
Contributor

Do you need help troubleshooting the test?

@Michael-J-Ward
Copy link
Contributor Author

@timsaucer - yes, if you wouldn't mind.

Here's where I'm at.

MulticolumnWindowUDF::partition_evaluator gets called with the correct arguments

RUST: evaluating partition_evaluator_args: PartitionEvaluatorArgs { input_exprs: [Column { name: "a", index: 0 }, Column { name: "b", index: 1 }], input_types: [Int64, Int64], is_reversed: false, ignore_nulls: false }

But then RustPartitionEvaluator::evaluate_all only receives the one array

RUST: evaluating evaluate_all with 1 value arrays: [PrimitiveArray<Int64>
[
  0,
  1,
  2,
  3,
  4,
  5,
  6,
]]

My next step would be to dig deeper into the upstream machinery, so I'd appreciate if you could take a quick sanity check before that.

@timsaucer
Copy link
Contributor

Ok, I think this is ready to merge. @Michael-J-Ward do you want to look over the changes I added in to account for the new string views?

@timsaucer timsaucer marked this pull request as ready for review November 10, 2024 12:09
@Michael-J-Ward
Copy link
Contributor Author

@timsaucer - thanks for finishing this up!

I think it's good to go. Especially since this 1st upgrade PR is primarily for the dependencies.

@Michael-J-Ward Michael-J-Ward changed the title WIP: Upgrade to Datafusion 43 Upgrade to Datafusion 43 Nov 10, 2024
@Michael-J-Ward Michael-J-Ward merged commit 3c66201 into apache:main Nov 10, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants