-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize cast(JSON as ROW) #9449
Conversation
This pull request was exported from Phabricator. Differential Revision: D56014538 |
✅ Deploy Preview for meta-velox canceled.
|
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. Differential Revision: D56014538
76f1202
to
0c8f7a3
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
// boost::algorithm::to_lower is very slow. Use much faster | ||
// folly::toLowerAscii if possible. | ||
if (allFieldsAreAscii) { | ||
folly::toLowerAscii(lowerCaseKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is toLowerAscii
safe to be called on non-ascii input? If not we may need to check if lowerCaseKey
is ascii as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is: "Leaves all other characters unchanged, including those with the 0x80 bit set."
/**
* Convert ascii to lowercase, in-place.
*
* Leaves all other characters unchanged, including those with the 0x80
* bit set.
* @param str String to convert
* @param length Length of str, in bytes
*/
void toLowerAscii(char* str, size_t length);
inline void toLowerAscii(std::string& str) {
// str[0] is legal also if the string is empty.
toLowerAscii(&str[0], str.size());
}
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Differential Revision: D56014538
0c8f7a3
to
65e0e6e
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Differential Revision: D56014538
65e0e6e
to
8aaf584
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova nice optimization. Thanks!
@@ -38,6 +38,17 @@ namespace facebook::velox { | |||
|
|||
namespace { | |||
|
|||
bool isAscii(const std::string& str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shall we consider to put this into some common utility like common/base/String.h or StrUtil.h. This seems to be general function. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me switch to existing functions::stringCore::isAscii from velox/functions/lib/string/StringCore.h
bool allFieldsAreAscii = true; | ||
const auto size = rowType.size(); | ||
for (auto i = 0; i < size; ++i) { | ||
allFieldsAreAscii &= isAscii(rowType.nameOf(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we break out of the loop on the first non-ascii field name?
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56014538
8aaf584
to
4608fb4
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56014538
4608fb4
to
f0fe19c
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56014538
f0fe19c
to
1a29eab
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
Summary: CAST(JSON as ROW(ARRAY()) used to fail with ``` OUT_OF_ORDER_ITERATION: Objects and arrays can only be iterated when they are first encountered. ``` According to simdjson documentation, https://github.com/simdjson/simdjson/blob/master/doc/basics.md, it is not allowed to store object values for later processing. These must be consumed or copied before proceeding. Also, fixed behavior when JSON object contains duplicate keys. Presto throws, but previous implementation used to allow duplicates. Also, fix the test to actually verify JSON objects with mixed case keys. Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56013293
Summary: boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56014538
1a29eab
to
9124213
Compare
This pull request was exported from Phabricator. Differential Revision: D56014538 |
This pull request has been merged in efb0213. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: Pull Request resolved: facebookincubator#9449 boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56014538 fbshipit-source-id: cedfb5f58b59f29ce02344d12fdd79b2fe8fbb21
Summary: Pull Request resolved: facebookincubator#9449 boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii. With this optimization a query that's heavy on this cast runs 3x faster. {F1484171823} Reviewed By: xiaoxmeng, Yuhta Differential Revision: D56014538 fbshipit-source-id: cedfb5f58b59f29ce02344d12fdd79b2fe8fbb21
Summary:
boost::algorithm::to_lower is very high on the profile. In most cases field names are all-ASCII, hence, we can use more efficient folly::toLowerAscii.
With this optimization a query that's heavy on this cast runs 3x faster.
Differential Revision: D56014538