feat: Add Spark from_json function #11709
Conversation
Force-pushed from 006efc5 to 89d888e.
Thanks. Added some initial comments.
Force-pushed from d1c7d69 to d74a262.
The JSON input is variable; how can we make sure the implementation matches Spark? Maybe we need to search for from_json usages in Spark and verify that the results are correct.
The current implementation supports only Spark's default behavior, and we fall back to Spark's implementation when specific unsupported cases arise. These include situations where user-provided options are non-empty, schemas contain unsupported types, or schemas include a column with the same name as …. The only existing unit tests in Spark related to this function are found in ….
Force-pushed from a284e49 to 2762885.
Thanks.
Thanks for the update! Added some comments.
Force-pushed from 68dab93 to 5bdc4c2.
Force-pushed from 5bdc4c2 to 4ef69ed.
Force-pushed from 4ef69ed to 1ef2f81.
Why I Need to Reimplement JSON Parsing Logic Instead of Using CAST(JSON):
Failure Handling:
On failure, from_json(JSON) returns NULL. For instance, parsing {"a 1} would result in {NULL}.
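To illustrate the NULL-on-failure contract, here is a minimal Python sketch (not the Velox C++ implementation; the helper name is hypothetical):

```python
import json

def from_json_sketch(s):
    # Sketch: from_json returns NULL (None here) on any parse failure
    # instead of raising, e.g. for the malformed input '{"a 1}'.
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None

from_json_sketch('{"a": 1}')  # parses normally
from_json_sketch('{"a 1}')    # malformed input yields None, not an error
```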
Root Type Restrictions:
Only ROW, ARRAY, and MAP types are allowed as root types.
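A sketch of the root-type check (a simplified stand-in for the real type validation, with a hypothetical function name):

```python
def is_valid_root_type(kind):
    # Sketch: only complex types may be the root of the from_json schema;
    # scalar roots such as INTEGER or VARCHAR are rejected.
    return kind in {"ROW", "ARRAY", "MAP"}

is_valid_root_type("ROW")      # accepted
is_valid_root_type("INTEGER")  # rejected
```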
Boolean Handling:
Only true and false are considered valid boolean values. Numeric values or strings will result in NULL.
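A sketch of this strict boolean rule (hypothetical helper; assumes the value has already been decoded from JSON):

```python
def parse_boolean(v):
    # Sketch: only the JSON literals true/false are valid booleans;
    # numbers (1) and strings ("true") yield NULL (None).
    if isinstance(v, bool):
        return v
    return None

parse_boolean(True)    # True
parse_boolean(1)       # None: numeric value is not a valid boolean
parse_boolean("true")  # None: string is not a valid boolean
```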
Integral Type Handling:
Only integral values are valid for integral types. Floating-point values and strings will produce NULL.
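The integral rule can be sketched the same way (hypothetical helper, operating on an already-decoded JSON value):

```python
def parse_integral(v):
    # Sketch: only integral JSON numbers are valid for integral types.
    if isinstance(v, bool):
        return None  # booleans are not integral values
    if isinstance(v, int):
        return v
    return None      # floats (1.5) and strings ("1") yield NULL
```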
Float/Double Handling:
All numeric values are valid for float/double types. However, for strings, only specific values like "NaN" or "INF" are valid.
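A sketch of the float/double rule; the exact set of accepted special strings below is an assumption for illustration, not taken from this PR:

```python
import math

# Assumed set of special string spellings accepted for float/double.
SPECIAL_FLOATS = {"NaN": math.nan, "INF": math.inf, "-INF": -math.inf,
                  "Infinity": math.inf, "-Infinity": -math.inf}

def parse_double(v):
    # Sketch: any JSON number is valid for float/double; strings are
    # accepted only when they spell a special value such as "NaN" or "INF".
    if isinstance(v, bool):
        return None
    if isinstance(v, (int, float)):
        return float(v)
    if isinstance(v, str):
        return SPECIAL_FLOATS.get(v)  # None for any other string
    return None
```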
Array Handling:
Spark allows a JSON object as input for an array schema only if the array is the root type and its child type is a ROW.
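A sketch of this single-object-to-array wrapping, assuming the root schema is ARRAY(ROW(...)) as the text requires (hypothetical helper):

```python
import json

def parse_root_array_of_rows(s):
    # Sketch: with a root schema of ARRAY whose element is a ROW, a single
    # JSON object is wrapped into a one-element array; a JSON array is
    # taken as-is; any other input yields NULL.
    obj = json.loads(s)
    if isinstance(obj, dict):
        return [obj]
    if isinstance(obj, list):
        return obj
    return None
```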
Map Handling:
Keys in a MAP can only be of VARCHAR type. For example, parsing {"3": 3} results in {"3": 3} instead of {3: 3}.
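Since JSON object keys are always text, a MAP parsed from JSON keeps VARCHAR keys. A sketch (hypothetical helper):

```python
import json

def parse_string_keyed_map(s):
    # Sketch: map keys stay strings, so '{"3": 3}' parses to {"3": 3},
    # never {3: 3}.
    obj = json.loads(s)
    return {str(k): v for k, v in obj.items()}
```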
Row Handling:
Spark supports partial output mode. However, it does not allow an input JSON array when parsing a ROW.
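The two ROW behaviors above, partial output and rejection of a top-level JSON array, can be sketched as (hypothetical helper; field-level type checking omitted):

```python
import json

def parse_row(s, fields):
    # Sketch of partial output: schema fields missing from the input map
    # to None while present fields are still populated; a top-level JSON
    # array is rejected entirely when the schema is a ROW.
    try:
        obj = json.loads(s)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None  # e.g. a JSON array input for a ROW schema
    return {f: obj.get(f) for f in fields}
```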