SNOW-1748140: Modify schema_expression to be structured type aware. #2659
```python
if data_type.structured:
    key = schema_expression(data_type.key_type, is_nullable)
    value = schema_expression(data_type.value_type, is_nullable)
    return f"object_construct_keep_null({key}, {value}) :: {convert_sp_to_sf_type(data_type)}"
```
Should we determine whether to keep null values based on `is_nullable`?
If we don't keep nulls and either the key or the value evaluates to a `NULL :: <type>` expression, that field would be dropped from the schema altogether. For this reason I think we always want to keep nulls.
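The reply's argument turns on how the two Snowflake functions treat NULL values: `OBJECT_CONSTRUCT` omits key-value pairs whose value is NULL, while `OBJECT_CONSTRUCT_KEEP_NULL` retains them. Below is a pure-Python toy model of just that difference. The function names are hypothetical stand-ins, not Snowpark or Snowflake APIs, and the model assumes keys are never NULL.

```python
from typing import Any, Dict


def toy_object_construct(*args: Any) -> Dict[Any, Any]:
    """Toy model of OBJECT_CONSTRUCT: pairs whose value is None are omitted.

    Assumes an even number of arguments (alternating key, value) and non-None keys.
    """
    pairs = zip(args[0::2], args[1::2])
    return {k: v for k, v in pairs if v is not None}


def toy_object_construct_keep_null(*args: Any) -> Dict[Any, Any]:
    """Toy model of OBJECT_CONSTRUCT_KEEP_NULL: pairs with a None value are kept."""
    pairs = zip(args[0::2], args[1::2])
    return {k: v for k, v in pairs}


# A NULL-valued entry disappears in the first case but survives in the second.
print(toy_object_construct("a", None, "b", 1))
print(toy_object_construct_keep_null("a", None, "b", 1))
```

In the dummy schema query, a placeholder entry that vanishes like `"a"` above would leave nothing for that part of the type to be inferred from, which is why the snippet always uses the keep-null variant.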
```python
assert table.union(table).schema == expected_schema
# Functions used in schema generation don't respect nested nullability so compare query string instead
non_null_union = non_null_table.union(non_null_table)
assert non_null_union._plan.schema_query == (
```
Can you also create a test case for nested arrays and objects? For example, `to_array(... to_array(...))`.
Added
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1748140
Please describe how your code solves the related issue.
This pull request modifies the `schema_expression` function to support structured types. `schema_expression` has separate cases for nullable and non-nullable fields. For nullable fields, this change skips the semi-structured branches of the if statement and uses the default case instead, so that `convert_sp_to_sf_type` does the work of specifying the schema. For non-nullable fields, it relies on recursive calls to `schema_expression` to get good default values for the various nested data types, and then casts the relevant data structure to the correct schema. Note that for non-nullable columns, nullability is not respected in their child fields; this is a limitation of the way we infer schemas using the dummy query approach.
While writing some test cases for this, I also found that the nullability of child fields was not set correctly when parsing metadata, so I fixed that in `type_utils` as well.
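The idea behind the metadata fix is to read each child's nullability from its own metadata entry during recursion, instead of letting children fall back to a default. The sketch below illustrates this under stated assumptions: `FieldMeta`, `ParsedField`, and `parse_field` are hypothetical and do not reflect the actual classes or attribute names used by the connector or `type_utils`.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for per-field result metadata; the real connector's
# metadata classes and attribute names may differ.
@dataclass
class FieldMeta:
    name: str
    type_name: str
    is_nullable: bool
    fields: List["FieldMeta"] = field(default_factory=list)


# Hypothetical parsed field, loosely analogous to a StructField with children.
@dataclass
class ParsedField:
    name: str
    type_name: str
    nullable: bool
    children: List["ParsedField"] = field(default_factory=list)


def parse_field(meta: FieldMeta) -> ParsedField:
    """Recursively convert metadata, keeping each child's own nullability."""
    return ParsedField(
        name=meta.name,
        type_name=meta.type_name,
        # Read the flag from this level's metadata rather than assuming True.
        nullable=meta.is_nullable,
        children=[parse_field(child) for child in meta.fields],
    )


# A nullable OBJECT column whose child field is declared NOT NULL.
column = FieldMeta("C", "OBJECT", True, [FieldMeta("A", "NUMBER", False)])
parsed = parse_field(column)
print(parsed.nullable, parsed.children[0].nullable)
```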
I've opened two new bugs as a result of this change:
SNOW-1819531 - Large query breakdown does not appear to work correctly with the structured types.
SNOW-1819428 - When calling `create_dataframe` with a schema that contains a StructType column, the child fields do not have their nullability respected. I suspect this is due to a similar limitation in how we generate schema strings.