Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Spark mask function #10264

Closed

Conversation

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 20, 2024
Copy link

netlify bot commented Jun 20, 2024

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit d37cb88
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66a782fc72187b00085aa288
😎 Deploy Preview https://deploy-preview-10264--meta-velox.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@gaoyangxiaozhu gaoyangxiaozhu changed the title [VL] [WIP] Spark mask function support [VL] Spark mask function support Jun 26, 2024
@gaoyangxiaozhu
Copy link
Contributor Author

@rui-mo / @PHILO-HE please help review.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added several comments.

velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
@gaoyangxiaozhu gaoyangxiaozhu requested a review from rui-mo June 27, 2024 12:25
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added several comments.

velox/functions/sparksql/CMakeLists.txt Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
@gaoyangxiaozhu
Copy link
Contributor Author

hey @rui-mo i saw the spark fuzz test fail with below user error - i remember you mentioned that the user error would be catched ?
image

@rui-mo
Copy link
Collaborator

rui-mo commented Jul 3, 2024

@gaoyangxiaozhu This failure is not about the user error itself, but about different behaviors being detected for the simplified and common eval engines. Probably the bug is on the different handling of constant encoding and dictionary encoding, and you could try and debug it locally.
https://github.com/facebookincubator/velox/blob/main/velox/expression/fuzzer/FuzzerToolkit.cpp#L95-L96

@gaoyangxiaozhu
Copy link
Contributor Author

@gaoyangxiaozhu This failure is not about the user error itself, but about different behaviors being detected for the simplified and common eval engines. Probably the bug is on the different handling of constant encoding and dictionary encoding, and you could try and debug it locally. https://github.com/facebookincubator/velox/blob/main/velox/expression/fuzzer/FuzzerToolkit.cpp#L95-L96

I see, will check later

@gaoyangxiaozhu
Copy link
Contributor Author

@rui-mo for help new round review. I have run locally fuzz test 5 minutes and it work run.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The runtime code looks good to me. Will take a further look on tests.

velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/Mask.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/MaskTest.cpp Outdated Show resolved Hide resolved
@gaoyangxiaozhu gaoyangxiaozhu requested a review from rui-mo July 10, 2024 06:08
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me % we may need to confirm the behavior of Spark on wide-width characters and invalid UTF-8 characters. Thanks.

velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
@gaoyangxiaozhu
Copy link
Contributor Author

@PHILO-HE / @mbasmanova for any other comments

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
// If the provided nth argument is NULL, the related original character is
// retained.
template <typename T>
struct MaskFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to provide fast path for ASCII inputs?

Copy link
Contributor Author

@gaoyangxiaozhu gaoyangxiaozhu Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm.. looks leverage callASCII have benefit but not much since we still need handle replacement char args non ASCII cases.
Create a issue to do in seperate PR if needed. #10546

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova It seems only callAscii is provided, while a function like callAsciiNullable is needed here. Do you think we need to add that? Thanks.

velox/functions/sparksql/String.h Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/StringTest.cpp Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Show resolved Hide resolved
@gaoyangxiaozhu
Copy link
Contributor Author

@mbasmanova for any new comments ?

@gaoyangxiaozhu
Copy link
Contributor Author

@mbasmanova for any new comments ?

ping @mbasmanova again, let help speed up the review process

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks.

velox/functions/sparksql/MaskFunction.h Outdated Show resolved Hide resolved
@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Jul 29, 2024
@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@Yuhta merged this pull request in 73ca922.

Copy link

Conbench analyzed the 1 benchmark run on commit 73ca9227.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spark mask function support
5 participants