SNOW-1803811: Allow mixed-case field names for struct type columns #2640

Open
wants to merge 5 commits into
base: main
from

Conversation

sfc-gh-jrose
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1803811

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    This PR addresses an issue with StructType columns where names for StructFields are always uppercased. Snowflake always uppercases column names, but fields that are internal to StructType columns do not have casing enforced. Ideally we would not treat StructFields as column objects, but separating out that responsibility would require a BCR.

This change avoids the BCR requirement by additionally storing the unmodified version of a field name inside the ColumnIdentifier class and using it, instead of the column-formatted name, when generating the SQL for StructType columns.
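A minimal sketch of the idea, not the actual Snowpark code: the classes below are simplified, hypothetical stand-ins that reuse the names ColumnIdentifier, StructField, name, and raw_name from this PR. The plain `upper()` normalization, the pre-rendered `datatype` string, the `struct_type_sql` helper, and the `OBJECT(...)` output shape are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


class ColumnIdentifier:
    """Hypothetical, simplified stand-in for the real class."""

    def __init__(self, name: str) -> None:
        # Keep the name exactly as the user supplied it.
        self.raw_name = name
        # Assumption: column-style formatting is reduced to uppercasing here.
        self.name = name.upper()


@dataclass
class StructField:
    column_identifier: ColumnIdentifier
    datatype: str  # already-rendered SQL type, e.g. "DOUBLE" (assumption)

    @property
    def name(self) -> str:
        return self.column_identifier.name

    @property
    def raw_name(self) -> str:
        return self.column_identifier.raw_name


def struct_type_sql(fields: List[StructField], use_raw_name: bool = True) -> str:
    """Render a struct type, choosing which form of the field name to emit."""
    parts = [
        f"{field.raw_name if use_raw_name else field.name} {field.datatype}"
        for field in fields
    ]
    return f"OBJECT({', '.join(parts)})"


fields = [
    StructField(ColumnIdentifier("firstName"), "VARCHAR"),
    StructField(ColumnIdentifier("b"), "DOUBLE"),
]
print(struct_type_sql(fields, use_raw_name=False))  # column-formatted names
print(struct_type_sql(fields))                      # unmodified names
```

In this sketch the `name` property keeps its existing behavior, so code that treats fields as columns is unaffected; only the struct-type SQL generation switches to `raw_name`.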

@sfc-gh-jrose sfc-gh-jrose marked this pull request as ready for review November 16, 2024 00:32
@sfc-gh-jrose sfc-gh-jrose requested review from a team as code owners November 16, 2024 00:32
CHANGELOG.md Outdated
Comment on lines 31 to 32
- Added support for mixed case field names in struct type columns.

Contributor

Does this only affect structured types, or all StructTypes? If it is the latter, then it would be a BCR, no?

Contributor Author

This only affects structured types because currently semi-structured objects don't become StructType columns, but instead get converted to MapType for some reason.

-    f"{field.name} {convert_sp_to_sf_type(field.datatype)}"
+    f"{field.raw_name} {convert_sp_to_sf_type(field.datatype)}"
Contributor

@sfc-gh-aling sfc-gh-aling Nov 22, 2024

I kind of have the same concern as Afroz about the BCR. Previously we uppercased the name, but now we do not. Won't this break users who reference keys in the map object?

Contributor Author

As far as I know, nobody is using StructType columns yet, because they require structured types to be enabled for the account.

@@ -106,7 +106,7 @@ def _create_test_dataframe(s):
         StructType(
             [
                 StructField("A", StringType(16777216), nullable=True),
-                StructField("B", DoubleType(), nullable=True),
+                StructField('"b"', DoubleType(), nullable=True),
Contributor

@sfc-gh-aling sfc-gh-aling Nov 22, 2024

Do we need the extra double quotes here?

@@ -407,6 +407,7 @@ class ColumnIdentifier:
     """Represents a column identifier."""

     def __init__(self, normalized_name: str) -> None:
+        self.raw_name = normalized_name
Contributor

If this is for internal use only, then I'd prefer marking it as private: _raw_name

@@ -487,6 +488,10 @@ def name(self) -> str:
         """Returns the column name."""
         return self.column_identifier.name

+    @property
+    def raw_name(self) -> str:
Contributor

Same as the public vs. private API comment.

3 participants