Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50582][SQL][PYTHON] Add quote builtin function #49191

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Dec 15, 2024

What changes were proposed in this pull request?

This PR proposes to add quote builtin function, which encloses the given string by single quotes and prepends a backslash to each instance of single quote in the string.

The implementation and behavior are same as Hive's ones.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27362046#LanguageManualUDF-StringFunctions

https://github.com/apache/hive/blob/3af4517eb8cfd9407ad34ed78a0b48b57dfaa264/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFQuote.java#L54

Why are the changes needed?

For interoperability with Hive.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@xinrong-meng
Copy link
Member

nit: How about adding [PYTHON] to the PR title?

@sarutak sarutak changed the title [SPARK-50582][SQL] Add quote builtin function [SPARK-50582][SQL][PYTHON] Add quote builtin function Dec 16, 2024
| org.apache.spark.sql.catalyst.expressions.Inline | inline | SELECT inline(array(struct(1, 'a'), struct(2, 'b'))) | struct<col1:int,col2:string> |
| org.apache.spark.sql.catalyst.expressions.Inline | inline_outer | SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b'))) | struct<col1:int,col2:string> |
| org.apache.spark.sql.catalyst.expressions.InlineExpressionBuilder | inline | SELECT inline(array(struct(1, 'a'), struct(2, 'b'))) | struct<col1:int,col2:string> |
| org.apache.spark.sql.catalyst.expressions.InlineExpressionBuilder | inline_outer | SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b'))) | struct<col1:int,col2:string> |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to this PR?

Copy link
Member Author

@sarutak sarutak Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun
Hmm, maybe #48503 should have re-generated the golden file right?
If so, I'll remove this irrelevant change manually, and open another PR for followup.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you. Let's handle them independently.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @sarutak . Could you rebase this PR once more since the last test was December.

And, can we remove the irrelevant parts from this file? for example Inline and PodExplode?

  • sql/core/src/test/resources/sql-functions/sql-expression-schema.md

Comment on lines 3733 to 3734
_FUNC_(str) - Returns `str` enclosed by single quotes and
each instance of single quote in it is preceded by a backslash.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to leave one string because the new line character occurs in docs.

with UnaryLike[Expression] {
override def nullIntolerant: Boolean = true

override lazy val replacement: Expression = Invoke(input, "quote", input.dataType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use StaticInvoke?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean for instance, implementing quote in ExpressionImplUtils and call it using StaticInvoke right?

Comment on lines 1463 to 1470
checkAnswer(
df.selectExpr("quote('Spark')"),
Row("'Spark'")
)

checkAnswer(
df.selectExpr("quote(NULL)"),
Row(null))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicates of checks in string-functions.sql?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think string-functions.sql is enough.
How about just removing the duplicates here?

@sarutak sarutak force-pushed the quote-builtin-function branch from 5c7afd9 to 230a967 Compare January 23, 2025 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants