Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support regex delimiter and limit argument for Spark split function #10248
Support regex delimiter and limit argument for Spark split function #10248
Changes from 43 commits
4e0f895
eada804
a9ec3de
eddfb43
f7f0967
0348181
47542dd
553187b
e69ea54
f5c603a
54a1bce
dd3d805
cfc700b
c42ec96
ea214f9
27fcfab
d13e03e
9e2dbe9
99c8e87
5da18db
64c2282
6363c73
ce89291
edb1e70
1a8e0b4
9eb75d3
4aff4aa
9aff221
3845cce
2151c17
526a2d6
d7aa8b7
9622d96
948e01f
355200f
ded360a
fdc8c87
dbb9f8a
0ad32d0
5197087
0db194e
a2fc545
26e060f
e43c558
7682d2d
4ad8f15
2a399d7
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why do we need this comment? Isn't it true for all functions that they are expected to match Spark 3.4+ behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in case people see different behavior happen for spark version below 3.4 as 3.2/3.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other functions? Do they implement Spark 3.2/3.3 semantics? CC: @rui-mo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaoyangxiaozhu @rui-mo gentle ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova For the split function, the behavior for empty delimiter was changed since Spark 3.4 with commit apache/spark@247306c.
When implementing functions in Velox, if there is semantic difference among Spark versions, I assume we need to follow the latest one (Spark 3.5). Does it makes sense? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Rui, thank you for clarifying. I assume this is a general policy that applies to all functions. If so, it would be helpful to document it in https://facebookincubator.github.io/velox/spark_functions.html and clarify which Spark version we match and whether we match ANSI mode on or off.
Furthermore, assuming Spark doesn't guarantee backwards compatibility across versions, we would have to pick a specific version we match and cannot say "latest" or 3.4+. If Spark changes behavior in "latest" version and we change to match, existing users will start seeing different behavior. Is this the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Thanks for the pointer. I'd like to document it.
I assume it is the case, and it makes sense to me to pick a specific version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #10677 for the documentation.
@gaoyangxiaozhu I assume below statement and
Since Spark 3.4
forsplitEmptyDelimiter
function could be removed.