Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement native support StringView for CONTAINS function #12168

Merged
merged 8 commits into from
Sep 10, 2024

Conversation

tlm365
Copy link
Contributor

@tlm365 tlm365 commented Aug 26, 2024

Which issue does this PR close?

Closes #11838.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) functions labels Aug 26, 2024
@tlm365 tlm365 marked this pull request as draft August 26, 2024 10:13
@tlm365 tlm365 force-pushed the contains-string-view branch from 8508473 to 88aab46 Compare August 26, 2024 15:08
@tlm365 tlm365 force-pushed the contains-string-view branch from a71d222 to 45dd141 Compare August 26, 2024 17:47
@tlm365 tlm365 marked this pull request as ready for review August 26, 2024 18:32
@Omega359
Copy link
Contributor

Omega359 commented Sep 2, 2024

I am wondering since the contains udf relies on regex (which isn't documented at all btw) whether it should be a wrapper for regexp_like and moved to the regex module? The order of the arguments is reversed and there is no flags for contains but functionally I think the two are equivalent.

@tlm365
Copy link
Contributor Author

tlm365 commented Sep 4, 2024

I am wondering since the contains udf relies on regex (which isn't documented at all btw) whether it should be a wrapper for regexp_like and moved to the regex module? The order of the arguments is reversed and there is no flags for contains but functionally I think the two are equivalent.

yeah, I also mentioned the similar of contains and regex_like that you mentioned above in #12180. But somehow I chose to implement it in string module. Let me try to implement it in regex module as your point - I also think it makes more sense.

@tlm365 tlm365 marked this pull request as draft September 6, 2024 02:39
@tlm365 tlm365 force-pushed the contains-string-view branch 4 times, most recently from ce80864 to 22f383b Compare September 6, 2024 09:59
@tlm365 tlm365 marked this pull request as ready for review September 6, 2024 10:42
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tlm365 -- sorry for the delay in reviewing. Overall this PR looks very cleanly implemented and is easy to read 👌

I had a few questions, but it looks pretty close to me

Thanks again

/// See the documentation [here](https://docs.rs/regex/1.5.4/regex/#grouping-and-flags)
/// for more information.
///
/// It is inspired / copied from `regexp_is_match_utf8` [arrow-rs].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we need to haev our own copy here? Is it because the arrow kernel doesn't support StringView yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because the arrow kernel doesn't support StringView yet?

Yes, you're right. I intend to update this one in upstream [arrow-rs]. After that, we can eliminate it from this location.

@@ -52,9 +52,9 @@ encoding_expressions = ["base64", "hex"]
# enable math functions
math_expressions = []
# enable regular expressions
regex_expressions = ["regex"]
regex_expressions = ["regex", "string_expressions"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need for this change (which seems to basically make regex_expressions and string_expressions the same feature flag)

I see you added datafusion/functions/src/regex/regexp_common.rs, but that seems only to be used by string functions. Perhaps it would be beter placed somewhere in functions/src/ 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb thanks for reviewing 🙏

Perhaps it would be beter placed somewhere in functions/src/

oh, I agree, will update it soon

@tlm365 tlm365 force-pushed the contains-string-view branch 2 times, most recently from f1bff73 to 6267b8d Compare September 7, 2024 00:41
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tlm365 -- looks great to me

@alamb alamb merged commit c71a9d7 into apache:main Sep 10, 2024
27 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 10, 2024

Thanks again @tlm365

@tlm365 tlm365 deleted the contains-string-view branch November 10, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update CONTAINS scalar function to support Utf8View
3 participants