-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove cudf._lib.json in favor of inlining pylibcudf #17443
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of questions
python/cudf/cudf/io/json.py
Outdated
def _update_col_struct_field_names( | ||
col: ColumnBase, child_names: dict | ||
) -> ColumnBase: | ||
if col.children: | ||
children = list(col.children) | ||
for i, (child, names) in enumerate( | ||
zip(children, child_names.values()) | ||
): | ||
children[i] = _update_col_struct_field_names(child, names) | ||
col.set_base_children(tuple(children)) | ||
|
||
if isinstance(col.dtype, cudf.StructDtype): | ||
col = col._rename_fields(child_names.keys()) # type: ignore[attr-defined] | ||
|
||
return col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it is OK that we update names in-place, but it is a minor potential footgun for two columns that share data (but in theory could have different names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment and one suggestion. Both are non-blocking
compression="infer", | ||
byte_range=None, | ||
keep_quotes=False, | ||
byte_range: None | list[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nit-picky. Do you have a preference?
byte_range: None | list[int] = None, | |
byte_range: list[int] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a preference in particular. I agree that your suggestion looks better
) -> None: | ||
for name, child_names in child_names_dict.items(): | ||
col = df._data[name] | ||
df._data[name] = _update_col_struct_field_names(col, child_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud based on @wence- review. We probably don't want to eagerly copy, so maybe we can optionally copy if the column and children share data?
/merge |
Description
Contributes to #17317
Checklist