[SPARK-51050] [SQL] Add group by alias tests to the group-by.sql #49750

Open
wants to merge 1 commit into
base: master

mihailoale-db
Contributor

What changes were proposed in this pull request?

I propose that we extend group-by.sql with some cases where we group by aliases.

Why are the changes needed?

Extend the testing coverage.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jan 31, 2025
@@ -64,6 +78,10 @@ set spark.sql.groupByAliases=false;

-- Check analysis exceptions
SELECT a AS k, COUNT(b) FROM testData GROUP BY k;
SELECT 1 GROUP BY `1`;
Contributor

This is a duplicate.

Contributor Author

The idea was to add some tests that should fail (with set spark.sql.groupByAliases=false;). I can remove them if needed.
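
For readers following the thread, the behavior these failing tests rely on can be sketched as below. This is a minimal illustration, not part of the patch: the view definition is a hypothetical stand-in for the testData view in group-by.sql, and the comments describe the expected outcome rather than recorded golden-file output.

```sql
-- Hypothetical stand-in for the testData view used by group-by.sql (assumed data).
CREATE OR REPLACE TEMPORARY VIEW testData AS
SELECT * FROM VALUES (1, 1), (1, 2), (2, 1) AS testData(a, b);

-- Alias resolution enabled: k in GROUP BY resolves to the select-list alias of a.
SET spark.sql.groupByAliases=true;
SELECT a AS k, COUNT(b) FROM testData GROUP BY k;

-- Alias resolution disabled: k is not a column of testData,
-- so the same query is expected to fail during analysis.
SET spark.sql.groupByAliases=false;
SELECT a AS k, COUNT(b) FROM testData GROUP BY k;
```

Running the same query under both settings is what distinguishes a positive alias test from an analysis-exception test.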

SELECT 1 AS a FROM testData GROUP BY `a`;

-- GROUP BY implicit alias
SELECT 1 GROUP BY `1`;
Contributor

Suggested change
SELECT 1 GROUP BY `1`;
SELECT 1 GROUP BY `1`;
-- GROUP BY alias with the subquery name
SELECT (SELECT a FROM testData LIMIT 1) + (SELECT b FROM testData LIMIT 1) FROM VALUES (1, 2) GROUP BY `(SELECTaFROMtestDataLIMIT1)+(SELECTbFROMtestDataLIMIT1)`
-- GROUP BY with expression subqueries
SELECT a, count(*) FROM testData GROUP BY (SELECT b FROM testData)
SELECT a, count(*) FROM testData GROUP BY a, (SELECT b FROM testData)
SELECT a, count(*) FROM testData GROUP BY a, (SELECT b FROM testData LIMIT 1)
SELECT a, count(*) FROM testData GROUP BY a, b IN (SELECT a FROM testData)
SELECT a, count(*) FROM testData GROUP BY a, a IN (SELECT b FROM testData)
SELECT a, count(*) FROM testData GROUP BY a, EXISTS(SELECT b FROM testData)

-- !query analysis
org.apache.spark.sql.catalyst.ExtendedAnalysisException
{
"errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
Contributor

Please address this.

Contributor Author

The idea was to add some tests that should fail (with set spark.sql.groupByAliases=false;). I can remove them if needed.

Contributor

beliefer commented Feb 2, 2025

Why did you add these test cases here?

3 participants