Remove cudf._lib.json in favor of inlining pylibcudf #17443

mroeschke · 2024-11-26T00:20:12Z

Description

Contributes to #17317

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Matt711

Just a couple of questions

python/cudf/cudf/io/json.py

wence- · 2024-11-26T11:09:37Z

python/cudf/cudf/io/json.py

+def _update_col_struct_field_names(
+    col: ColumnBase, child_names: dict
+) -> ColumnBase:
+    if col.children:
+        children = list(col.children)
+        for i, (child, names) in enumerate(
+            zip(children, child_names.values())
+        ):
+            children[i] = _update_col_struct_field_names(child, names)
+        col.set_base_children(tuple(children))
+
+    if isinstance(col.dtype, cudf.StructDtype):
+        col = col._rename_fields(child_names.keys())  # type: ignore[attr-defined]
+
+    return col


I guess it is OK that we update names in place, but it is a minor potential footgun when two columns share data but could, in theory, have different names.
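
To make the concern above concrete, here is a minimal sketch of the aliasing hazard. It does not use cuDF: `ToyStructColumn` and `rename_in_place` are hypothetical stand-ins invented for illustration, and the sketch only assumes that two containers can hold a reference to the same column object.

```python
from dataclasses import dataclass


@dataclass
class ToyStructColumn:
    """Hypothetical stand-in for a struct column; not a cuDF class."""

    field_names: list
    values: list


def rename_in_place(col: ToyStructColumn, new_names: list) -> ToyStructColumn:
    """Overwrite the field names on the object that was passed in."""
    col.field_names[:] = new_names
    return col


shared = ToyStructColumn(["a", "b"], [(1, 2)])
frame_one = {"x": shared}
frame_two = {"y": shared}  # aliases the same column object

# Renaming through one frame is also visible through the other.
rename_in_place(frame_one["x"], ["left", "right"])
print(frame_two["y"].field_names)  # ['left', 'right']
```

Whether this can actually happen in cuDF depends on how columns are shared between frames, which this sketch does not model.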

Matt711

One comment and one suggestion. Both are non-blocking.

Matt711 · 2024-11-28T00:04:48Z

python/cudf/cudf/io/json.py

    compression="infer",
-    byte_range=None,
-    keep_quotes=False,
+    byte_range: None | list[int] = None,


Very nit-picky. Do you have a preference?

Suggested change

-    byte_range: None | list[int] = None,
+    byte_range: list[int] | None = None,

I don't have a particular preference. I agree that your suggestion looks better.

Matt711 · 2024-11-28T00:05:19Z

python/cudf/cudf/utils/ioutils.py

+) -> None:
+    for name, child_names in child_names_dict.items():
+        col = df._data[name]
+        df._data[name] = _update_col_struct_field_names(col, child_names)


Just thinking out loud based on @wence-'s review. We probably don't want to copy eagerly, so maybe we could copy only when the column and its children share data?
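
One way to picture the "copy only if shared" idea is sketched below. This is not cuDF code: `ToyColumn`, its `owners` counter, and `rename_copy_if_shared` are all hypothetical, and a real implementation would need its own way to detect shared data.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a struct column; not a cuDF class."""

    field_names: list
    owners: int = 1  # assumed count of frames that reference this column


def rename_copy_if_shared(col: ToyColumn, new_names: list) -> ToyColumn:
    """Rename in place when exclusively owned, otherwise rename a copy."""
    if col.owners > 1:
        col.owners -= 1  # the caller stops referencing the original
        target = copy.deepcopy(col)
        target.owners = 1
    else:
        target = col
    target.field_names = list(new_names)
    return target


exclusive = ToyColumn(["a", "b"])
shared = ToyColumn(["a", "b"], owners=2)

renamed_exclusive = rename_copy_if_shared(exclusive, ["x", "y"])
renamed_shared = rename_copy_if_shared(shared, ["x", "y"])

print(renamed_exclusive is exclusive)  # True: no copy was needed
print(renamed_shared is shared, shared.field_names)  # False ['a', 'b']
```

The exclusive column avoids the copy entirely, while the shared one keeps its original names for whoever else still references it.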

mroeschke · 2024-11-28T01:50:30Z

/merge

Remove cudf._lib.json in favor of inlining pylibcudf

ff78b1e

mroeschke added the Python, improvement, and non-breaking labels Nov 26, 2024

mroeschke self-assigned this Nov 26, 2024

mroeschke requested a review from a team as a code owner November 26, 2024 00:20

mroeschke requested review from wence- and brandon-b-miller November 26, 2024 00:20

github-actions bot added the CMake label Nov 26, 2024

Matt711 reviewed Nov 26, 2024

wence- reviewed Nov 26, 2024

mroeschke added 3 commits November 26, 2024 12:16

Merge remote-tracking branch 'upstream/branch-25.02' into cudf/_lib/json

b71b021

Move helper function to ioutils

4cd0772

Merge branch 'branch-25.02' into cudf/_lib/json

2d6c2ee

Matt711 approved these changes Nov 28, 2024

rapids-bot bot merged commit 9b88794 into rapidsai:branch-25.02 Nov 28, 2024
105 checks passed

mroeschke deleted the cudf/_lib/json branch November 28, 2024 01:50
