Skip to content

Commit

Permalink
Fix(json_parse): Fix json_parse to ignore invalid unicode escape sequ…
Browse files Browse the repository at this point in the history
…ences

Summary:
Certain improperly escaped unicode points like \uDE2Dau throw in json_parse. This happens in simdjson since these are invalid unicode escape sequences (i.e without the surrogate pairs). As Presto java ignores this, we add similar support in Velox. 

```
SELECT json_parse(x) from (values '"\uDE2Dau"') t(x);
-- Returns "\uDE2Dau" in java but throws in Velox
```

Differential Revision: D68176965
  • Loading branch information
Krishna Pai authored and facebook-github-bot committed Jan 14, 2025
1 parent 8817013 commit feb8a27
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 1 deletion.
2 changes: 1 addition & 1 deletion velox/functions/prestosql/JsonFunctions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -320,7 +320,7 @@ class JsonParseFunction : public exec::VectorFunction {
return value.get_double().error();
case simdjson::ondemand::json_type::string:
addOrMergeViews(views_, trimToken(value.raw_json_token()));
return value.get_string().error();
return value.get_string(true).error();
case simdjson::ondemand::json_type::boolean:
addOrMergeViews(views_, trimToken(value.raw_json_token()));
return value.get_bool().error();
Expand Down
6 changes: 6 additions & 0 deletions velox/functions/prestosql/tests/JsonFunctionsTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -237,6 +237,12 @@ TEST_F(JsonFunctionsTest, jsonParse) {

// Test bad unicode characters
testJsonParse("\"Hello \xc0\xaf World\"", "\"Hello �� World\"");
// The below tests fail if simdjson.doc.get_string() is called
// without specifying replacement for bad characters in simdjson.
testJsonParse(R"("\uDE2Dau")", R"("\uDE2Dau")");
testJsonParse(
R"([{"response": "[\"fusil a peinture\",\"\ufffduD83E\\uDE2Dau bois\"]"}])",
R"([{"response":"[\"fusil a peinture\",\"�uD83E\\uDE2Dau bois\"]"}])");

VELOX_ASSERT_THROW(
jsonParse(R"({"k1":})"), "The JSON document has an improper structure");
Expand Down

0 comments on commit feb8a27

Please sign in to comment.