Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for Utf8View and BinaryView types #10925

Merged
merged 5 commits into from
Jun 17, 2024

Conversation

XiangpengHao
Copy link
Contributor

Which issue does this PR close?

Closes #10920 .

But we need to wait apache/arrow-rs#5894 to merge and propagate to DataFusion, otherwise we get error from arrow.

Different to the original issue, I think we mean Utf8View rather than StringView for the DataType name (consistent with those in Arrow DataTypes)

Rationale for this change

Now you can cast utf8 to view types

     Running `target/debug/datafusion-cli -c 'select arrow_typeof(arrow_cast('\''foo'\'', '\''Utf8View'\''));'`
DataFusion CLI v39.0.0
+--------------------------------------------------------+
| arrow_typeof(arrow_cast(Utf8("foo"),Utf8("Utf8View"))) |
+--------------------------------------------------------+
| Utf8View                                               |
+--------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.003 seconds.
 Running `target/debug/datafusion-cli -c 'select arrow_cast('\''foo'\'', '\''Utf8View'\'');'`
DataFusion CLI v39.0.0
+------------------------------------------+
| arrow_cast(Utf8("foo"),Utf8("Utf8View")) |
+------------------------------------------+
| foo                                      |
+------------------------------------------+
1 row(s) fetched. 
Elapsed 0.003 seconds.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql SQL Planner label Jun 15, 2024
@alamb alamb changed the title Add initial support for view types Add initial support for Utf8View and BinaryView types Jun 15, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XiangpengHao -- this looks great ❤️

Can you please add a test (which will error) to https://github.com/apache/datafusion/blob/cb9068ce56a6870b84c4878db0255048e7df9e97/datafusion/sqllogictest/test_files/arrow_typeof.slt so there is some test coverage

But we need to wait apache/arrow-rs#5894 to merge and propagate to DataFusion, otherwise we get error from arrow.

This is an excellent point in general - that if DataFusion needs additional changes in arrow we will either

  1. Have to wait
  2. Put stubs in DataFusion that have the appropriate functionaltiy until they are available in arrow

For example, in this case, we could implement a cast function in DataFusion that handles stringview casting otherwise falls back to the arrow kernel

This is probably something we'll have to sort out 🤔

@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

After some more thought about this, I think the best approach will be to create a "feature" branch into which we will merge the StringView features until a new version of arrow-rs is released.

In order to avoid that branch diverging too much (and thus being hard to maintain) I think we should get this PR patched up and ready to merge (as it can go in without any more arrow-rs support and the risk of conflicts is high)

Then I'll make a branch in the DataFusion repo to hold all the StringView related code

@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

I have this PR checked out locally so I'll make some small changes to get the tests passing

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 17, 2024
@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

I filed #10961 to track making a feature branch

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is a good first step -- thank you @XiangpengHao

@alamb alamb merged commit f373a86 into apache:main Jun 17, 2024
23 checks passed
@XiangpengHao XiangpengHao deleted the view-types branch June 17, 2024 19:33
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* add view types

* Add slt tests

* comment out failing test

* update vendored code

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement arrow_cast support for StringView and BinaryView
2 participants