SNOW-859832: DataFrame.agg() duplicates columns #42

Open
martin-frydl opened this issue Jul 10, 2023 · 2 comments

Comments

@martin-frydl

With the following Java code:

Updatable latestIdTable = session.table("test_table");
DataFrame latestIdQuery = latestIdTable.where(latestIdTable.col("status").equal_to(functions.lit("ready")))
                .groupBy(new Column[] {latestIdTable.col("name")})
                .agg(new Column[] {latestIdTable.col("name"), functions.max(latestIdTable.col("id")).as("LATEST_ID")});
latestIdQuery.explain();

I get this query:

SELECT
  "NAME",  -- here the column is doubled
  "NAME",
  max("ID") AS "LATEST_ID"
FROM  ( SELECT * FROM (SELECT * FROM (test_table)) WHERE ("STATUS" = 'ready') )
GROUP BY "NAME"

Snowpark version 1.6.2 works fine: "NAME" is present only once.

github-actions bot changed the title from "DataFrame.agg() duplicates columns" to "SNOW-859832: DataFrame.agg() duplicates columns" on Jul 10, 2023
@sfc-gh-jfreeberg
Collaborator

@martin-frydl -- You mentioned this behavior doesn't occur in v1.6.2. Which version are you using in this code sample? 1.8.0?

@martin-frydl
Author

The code is the same in both versions, but the resulting query above is from 1.8.0. When run in 1.6.2, one of the two "NAME" columns in the query is replaced by an autogenerated name, because the conflicting column name gets renamed.
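A minimal, self-contained Java sketch of the difference between the two versions, assuming (consistently with the 1.8.0 query above) that the SELECT list is the grouping columns followed by every expression passed to agg(). This is a toy model, not Snowpark code: the class SelectListModel, its methods, and the `_RENAMED_n` suffix are hypothetical stand-ins, since the thread does not show which copy 1.6.2 renames or what name it generates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the SELECT list produced by groupBy(...).agg(...).
// This is NOT Snowpark's implementation; it only mirrors the behavior described in this issue.
class SelectListModel {

    // 1.8.0-style (as reported): grouping columns followed by the agg() expressions,
    // with no handling of duplicate names.
    static List<String> naive(List<String> groupBy, List<String> agg) {
        List<String> out = new ArrayList<>(groupBy);
        out.addAll(agg);
        return out;
    }

    // 1.6.2-style (as described above): a name that is already taken gets a generated name.
    // Assumptions: the later copy is the one renamed, and the "_RENAMED_n" format is invented;
    // this sketch also does not guard against a generated name colliding with a user column.
    static List<String> renameConflicts(List<String> groupBy, List<String> agg) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String name : naive(groupBy, agg)) {
            int count = seen.merge(name, 1, Integer::sum); // occurrences so far, including this one
            out.add(count == 1 ? name : name + "_RENAMED_" + (count - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> groupBy = List.of("NAME");
        List<String> agg = List.of("NAME", "LATEST_ID");
        System.out.println(naive(groupBy, agg));           // [NAME, NAME, LATEST_ID]
        System.out.println(renameConflicts(groupBy, agg)); // [NAME, NAME_RENAMED_1, LATEST_ID]
    }
}
```

Under this model, leaving latestIdTable.col("name") out of the agg() array yields [NAME, LATEST_ID], because the grouping column is already emitted. Whether Snowpark 1.8.0 behaves the same way is an assumption that this thread does not confirm.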
