Add Spark mask function #10264
✅ Deploy Preview for meta-velox ready!
Thanks. Added several comments.
Hey @rui-mo, I saw the Spark fuzzer test fail with the user error below. I remember you mentioned that user errors would be caught?
@gaoyangxiaozhu This failure is not about the user error itself, but about different behaviors being detected between the simplified and common eval engines. The bug is probably in the different handling of constant encoding and dictionary encoding; you could try to debug it locally.
I see; I will check later.
@rui-mo, could you help with a new round of review? I ran the fuzzer test locally for 5 minutes and it ran without failures.
Thanks. The runtime code looks good to me. I will take a further look at the tests.
Thanks.
Looks good to me, modulo one point: we may need to confirm Spark's behavior on wide-width characters and invalid UTF-8 characters. Thanks.
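Both concerns come down to how input bytes are decoded into code points: a wide character such as a CJK ideograph occupies three bytes in UTF-8, and a malformed byte has no code point at all. A minimal sketch of a strict decoder, assuming malformed input is reported one byte at a time so the caller can decide whether to copy, replace, or reject it. `DecodedChar` and `decodeUtf8` are hypothetical names, not Velox APIs, and which of those choices matches Spark is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Result of decoding one UTF-8 sequence (hypothetical helper type).
// When `valid` is false, `length` is 1 so the caller can skip, copy through,
// or replace the offending byte; `codePoint` is then U+FFFD.
struct DecodedChar {
  char32_t codePoint;
  std::size_t length;
  bool valid;
};

// Decodes the UTF-8 sequence starting at s[pos]. Rejects stray continuation
// bytes, truncated sequences, overlong encodings, surrogates, and values
// above U+10FFFF.
DecodedChar decodeUtf8(std::string_view s, std::size_t pos) {
  assert(pos < s.size());
  const auto b0 = static_cast<unsigned char>(s[pos]);
  if (b0 < 0x80) {
    return {b0, 1, true};  // ASCII: a single byte.
  }
  std::size_t len;
  char32_t cp;
  char32_t minValue;  // Smallest code point that needs `len` bytes.
  if ((b0 & 0xE0) == 0xC0) {
    len = 2; cp = b0 & 0x1F; minValue = 0x80;
  } else if ((b0 & 0xF0) == 0xE0) {
    len = 3; cp = b0 & 0x0F; minValue = 0x800;
  } else if ((b0 & 0xF8) == 0xF0) {
    len = 4; cp = b0 & 0x07; minValue = 0x10000;
  } else {
    return {0xFFFD, 1, false};  // Stray continuation byte or 0xF8..0xFF.
  }
  if (pos + len > s.size()) {
    return {0xFFFD, 1, false};  // Truncated sequence.
  }
  for (std::size_t i = 1; i < len; ++i) {
    const auto b = static_cast<unsigned char>(s[pos + i]);
    if ((b & 0xC0) != 0x80) {
      return {0xFFFD, 1, false};  // Expected a continuation byte.
    }
    cp = (cp << 6) | (b & 0x3F);
  }
  if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    return {0xFFFD, 1, false};  // Overlong, out of range, or surrogate.
  }
  return {cp, len, true};
}
```

Because a single character may span several bytes, a mask implementation that walks the input byte by byte would split wide characters; walking by `length` keeps them intact.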
@PHILO-HE / @mbasmanova, do you have any other comments?
Some comments.
velox/functions/sparksql/String.h
// If the provided nth argument is NULL, the related original character is
// retained.
template <typename T>
struct MaskFunction {
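The comment in the snippet states the NULL rule for replacement arguments. A minimal, framework-independent sketch of the per-character semantics, assuming the input is already decoded to code points (`std::u32string`) and using Spark's documented defaults: 'X' for uppercase, 'x' for lowercase, 'n' for digits, and other characters retained. `MaskReplacements`, `maskChar`, and `mask` are hypothetical names, not the Velox implementation. Only ASCII letters and digits are classified here, so non-ASCII letters fall into the "other" bucket, which may not match Spark.

```cpp
#include <cassert>
#include <optional>
#include <string>

// One replacement per character class. std::nullopt plays the role of a NULL
// argument: the original character is retained.
struct MaskReplacements {
  std::optional<char32_t> upper = U'X';  // Spark's default upperChar.
  std::optional<char32_t> lower = U'x';  // Spark's default lowerChar.
  std::optional<char32_t> digit = U'n';  // Spark's default digitChar.
  std::optional<char32_t> other = std::nullopt;  // Default: keep as-is.
};

// Maps one code point to its masked form (ASCII classification only).
char32_t maskChar(char32_t c, const MaskReplacements& r) {
  const std::optional<char32_t>* replacement;
  if (c >= U'A' && c <= U'Z') {
    replacement = &r.upper;
  } else if (c >= U'a' && c <= U'z') {
    replacement = &r.lower;
  } else if (c >= U'0' && c <= U'9') {
    replacement = &r.digit;
  } else {
    replacement = &r.other;
  }
  return replacement->value_or(c);  // NULL replacement keeps the original.
}

std::u32string mask(const std::u32string& input,
                    const MaskReplacements& r = {}) {
  std::u32string out;
  out.reserve(input.size());
  for (char32_t c : input) {
    out.push_back(maskChar(c, r));
  }
  assert(out.size() == input.size());  // One code point in, one out.
  return out;
}
```

For example, masking `AbCD123-@$#` with replacements Q, q, d, o yields `QqQQdddoooo`; setting the uppercase replacement to NULL (`std::nullopt`) instead yields `AqCDdddoooo`.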
Would it make sense to provide a fast path for ASCII inputs?
Hmm, it looks like leveraging callAscii would bring some benefit, but not much, since we still need to handle cases where the replacement character arguments are non-ASCII. I created an issue to do this in a separate PR if needed: #10546
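The trade-off can be illustrated with a standalone ASCII fast path. This is a hypothetical sketch, not Velox's `callAscii` mechanism: it accepts nullable replacement arguments and bails out to the general UTF-8 path whenever the input or any non-NULL replacement is not a single ASCII byte. `Replacement`, `fitsFastPath`, and `maskAsciiFastPath` are illustrative names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Returns true if every byte of `s` is 7-bit ASCII.
bool isAscii(std::string_view s) {
  for (char ch : s) {
    if (static_cast<unsigned char>(ch) >= 0x80) {
      return false;
    }
  }
  return true;
}

// A replacement argument as raw UTF-8; std::nullopt stands for NULL
// ("retain the original character").
using Replacement = std::optional<std::string_view>;

// NULL, or exactly one ASCII byte, is safe for byte-wise masking.
bool fitsFastPath(const Replacement& r) {
  return !r.has_value() || (r->size() == 1 && isAscii(*r));
}

// Returns std::nullopt when the caller must fall back to the general path.
std::optional<std::string> maskAsciiFastPath(std::string_view input,
                                             const Replacement& upper,
                                             const Replacement& lower,
                                             const Replacement& digit,
                                             const Replacement& other) {
  if (!isAscii(input) || !fitsFastPath(upper) || !fitsFastPath(lower) ||
      !fitsFastPath(digit) || !fitsFastPath(other)) {
    return std::nullopt;
  }
  std::string out(input);
  for (char& ch : out) {
    const Replacement* r;
    if (ch >= 'A' && ch <= 'Z') {
      r = &upper;
    } else if (ch >= 'a' && ch <= 'z') {
      r = &lower;
    } else if (ch >= '0' && ch <= '9') {
      r = &digit;
    } else {
      r = &other;
    }
    if (r->has_value()) {
      ch = (**r)[0];  // Single ASCII byte replaces a single ASCII byte.
    }
  }
  assert(out.size() == input.size());  // Byte length is preserved.
  return out;
}
```

Because an ASCII character is always one byte, the output has the same length as the input and can be written in place. A non-ASCII replacement breaks that invariant, which is why those cases still need the general path and why the gain from a fast path is limited.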
@mbasmanova It seems only callAscii is provided, while a function like callAsciiNullable is needed here. Do you think we need to add that? Thanks.
@mbasmanova, do you have any new comments?
Pinging @mbasmanova again; could you help speed up the review process?
Looks good to me. Thanks.
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on commit … and found no benchmark performance regressions. 🎉 The full Conbench report has more details.
This PR adds a function that returns a masked version of the input string.
Spark documentation: https://spark.apache.org/docs/latest/api/sql/#mask
Spark implementation: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala#L103
Spark tests: https://github.com/apache/spark/blob/0db5bdecfa6cbfff1be7690bb783a858989776b9/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala#L5677
Fixes #10263