Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for StringView, merge changes from string-view development branch #11402

Merged
merged 9 commits into from
Jul 16, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 10, 2024

Which issue does this PR close?

Part of #10918

Rationale for this change

While we were initially developing StringView in DataFusion the required features in parquet/arrow were not yet available and so we made a feature branch pinned to the version of arrow required: https://github.com/apache/datafusion/commits/string-view/

The required functionality for StringView has been releaed in arrow/parquet 51.1.0 and DataFusion now uses that version as of #11302

Thus there is no reason for a feature branch so let's bring the changes to main

What changes are included in this PR?

Merge in all the changes from @Weijun-H @PsiACE and @XiangpengHao on the string-view branch. Specifically:

Are these changes tested?

Yes, there are unit tests

Are there any user-facing changes?

Not yet, but there will be

alamb and others added 6 commits June 18, 2024 07:19
* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo
…10985)

* feat: Implement equality = and inequality <> support for StringView

* chore: Add tests for the StringView

* chore

* chore: Update tests for NULL

* fix: Used build_array_string!

* chore: Update string_coercion function to handle Utf8View type in binary.rs

* chore: add tests

* chore: ci
* Add more StringView comparison test coverage

* add reference

* Add another test showing casting on columns works correctly
…11004)

* feat: Implement equality = and inequality <> support for BinaryView

Signed-off-by: Chojan Shang <[email protected]>

* chore: make fmt happy

Signed-off-by: Chojan Shang <[email protected]>

---------

Signed-off-by: Chojan Shang <[email protected]>
…BinaryView (#11034)

* implement large binary

* add tests for large string

* better comments for string coercion
* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Improve type coercion logic in TypeCoercionRewriter

* chore

* chore: Update test

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs
@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jul 10, 2024
@alamb alamb changed the title String view Initial support for StringView, merge changes from string-view development branch Jul 10, 2024
@alamb alamb marked this pull request as ready for review July 10, 2024 21:26
Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb
Copy link
Contributor Author

alamb commented Jul 16, 2024

@thinkharderdev or @avantgardnerio -- would either of you have a few moments to review this PR? We want to merge the work we have so far for StringView into main (that depends on arrow 51.1.0) and then move on to the additional work that will require 52.2 (e.g. #11421)

Copy link
Contributor

@thinkharderdev thinkharderdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@alamb
Copy link
Contributor Author

alamb commented Jul 16, 2024

Thank you @thinkharderdev

cc @XiangpengHao I will now make a string-view2 branch for your next sequence of PRs

@alamb alamb merged commit ccb4baf into main Jul 16, 2024
48 checks passed
@alamb alamb deleted the string-view branch July 16, 2024 19:52
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 17, 2024
…velopment branch (apache#11402)

* Update `string-view` branch to arrow-rs main (apache#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* feat: Implement equality = and inequality <> support for StringView (apache#10985)

* feat: Implement equality = and inequality <> support for StringView

* chore: Add tests for the StringView

* chore

* chore: Update tests for NULL

* fix: Used build_array_string!

* chore: Update string_coercion function to handle Utf8View type in binary.rs

* chore: add tests

* chore: ci

* Add more StringView comparison test coverage (apache#10997)

* Add more StringView comparison test coverage

* add reference

* Add another test showing casting on columns works correctly

* feat: Implement equality = and inequality <> support for BinaryView (apache#11004)

* feat: Implement equality = and inequality <> support for BinaryView

Signed-off-by: Chojan Shang <[email protected]>

* chore: make fmt happy

Signed-off-by: Chojan Shang <[email protected]>

---------

Signed-off-by: Chojan Shang <[email protected]>

* Implement support for LargeString and LargeBinary for StringView and BinaryView (apache#11034)

* implement large binary

* add tests for large string

* better comments for string coercion

* Improve filter predicates with `Utf8View` literals (apache#11043)

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Improve type coercion logic in TypeCoercionRewriter

* chore

* chore: Update test

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs

* Remove arrow-patch

---------

Signed-off-by: Chojan Shang <[email protected]>
Co-authored-by: Alex Huang <[email protected]>
Co-authored-by: Chojan Shang <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
…velopment branch (apache#11402)

* Update `string-view` branch to arrow-rs main (apache#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* feat: Implement equality = and inequality <> support for StringView (apache#10985)

* feat: Implement equality = and inequality <> support for StringView

* chore: Add tests for the StringView

* chore

* chore: Update tests for NULL

* fix: Used build_array_string!

* chore: Update string_coercion function to handle Utf8View type in binary.rs

* chore: add tests

* chore: ci

* Add more StringView comparison test coverage (apache#10997)

* Add more StringView comparison test coverage

* add reference

* Add another test showing casting on columns works correctly

* feat: Implement equality = and inequality <> support for BinaryView (apache#11004)

* feat: Implement equality = and inequality <> support for BinaryView

Signed-off-by: Chojan Shang <[email protected]>

* chore: make fmt happy

Signed-off-by: Chojan Shang <[email protected]>

---------

Signed-off-by: Chojan Shang <[email protected]>

* Implement support for LargeString and LargeBinary for StringView and BinaryView (apache#11034)

* implement large binary

* add tests for large string

* better comments for string coercion

* Improve filter predicates with `Utf8View` literals (apache#11043)

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Improve type coercion logic in TypeCoercionRewriter

* chore

* chore: Update test

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs

* Remove arrow-patch

---------

Signed-off-by: Chojan Shang <[email protected]>
Co-authored-by: Alex Huang <[email protected]>
Co-authored-by: Chojan Shang <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
wiedld pushed a commit to influxdata/arrow-datafusion that referenced this pull request Jul 31, 2024
…velopment branch (apache#11402)

* Update `string-view` branch to arrow-rs main (apache#10966)

* Pin to arrow main

* Fix clippy with latest arrow

* Uncomment test that needs new arrow-rs to work

* Update datafusion-cli Cargo.lock

* Update Cargo.lock

* tapelo

* feat: Implement equality = and inequality <> support for StringView (apache#10985)

* feat: Implement equality = and inequality <> support for StringView

* chore: Add tests for the StringView

* chore

* chore: Update tests for NULL

* fix: Used build_array_string!

* chore: Update string_coercion function to handle Utf8View type in binary.rs

* chore: add tests

* chore: ci

* Add more StringView comparison test coverage (apache#10997)

* Add more StringView comparison test coverage

* add reference

* Add another test showing casting on columns works correctly

* feat: Implement equality = and inequality <> support for BinaryView (apache#11004)

* feat: Implement equality = and inequality <> support for BinaryView

Signed-off-by: Chojan Shang <[email protected]>

* chore: make fmt happy

Signed-off-by: Chojan Shang <[email protected]>

---------

Signed-off-by: Chojan Shang <[email protected]>

* Implement support for LargeString and LargeBinary for StringView and BinaryView (apache#11034)

* implement large binary

* add tests for large string

* better comments for string coercion

* Improve filter predicates with `Utf8View` literals (apache#11043)

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Improve type coercion logic in TypeCoercionRewriter

* chore

* chore: Update test

* refactor: Improve type coercion logic in TypeCoercionRewriter

* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs

* Remove arrow-patch

---------

Signed-off-by: Chojan Shang <[email protected]>
Co-authored-by: Alex Huang <[email protected]>
Co-authored-by: Chojan Shang <[email protected]>
Co-authored-by: Xiangpeng Hao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants