Add support for ZZZ in parse_datetime (#11312)
Summary: This diff adds support for JODA's ZZZ pattern in Presto's parse_datetime function. The pattern parses time zone IDs (the tz library calls these "time zone names", but that term means something else in JODA).

I borrowed the algorithm from JODA so that the behavior matches Presto Java. The idea is to greedily consume the longest substring that matches a known time zone. JODA breaks the set of known time zones into a list of IDs without a prefix (no '/' character) and lists of suffixes for IDs with a prefix. This limits the number of strings that need to be compared. I modified it slightly to pre-sort these lists by size descending, so we don't necessarily have to compare every string and can stop at the first match.

One other change: I added a get_time_zone_names function to our copy of the tz library. I tried calling get_tzdb() from DateTimeFormatter directly and reading its zones member to get the names, but after get_tzdb() returns, every time_zone in zones except the first has a name_ string whose data is nullptr. I spent a good amount of time trying to figure out why without success, so I added this helper (for whatever reason, everything is fine as long as the access happens in tz.cpp). If anyone has pointers as to what's going on, I'm happy to investigate further; I'd much rather use the existing get_tzdb function if I can.

Reviewed By: bikramSingh91

Differential Revision: D64708598
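The greedy longest-match lookup described in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit or from JODA: the names ZoneIdIndex, buildIndex, and matchZoneId are hypothetical, and grouping the suffix lists by the text before the first '/' is an assumption about how the lists are organized. In the real change the input list would presumably come from get_time_zone_names().

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical index (not the commit's data structure): IDs without '/' go in
// one list; IDs with '/' are stored as suffixes keyed by the text before the
// first '/'. Keying by prefix is an assumption made for this sketch.
struct ZoneIdIndex {
  std::vector<std::string> noPrefix;
  std::unordered_map<std::string, std::vector<std::string>> suffixesByPrefix;
};

// Sort by size descending so the first hit is also the longest match.
inline void sortLongestFirst(std::vector<std::string>& values) {
  std::sort(values.begin(), values.end(),
            [](const std::string& a, const std::string& b) {
              return a.size() > b.size();
            });
}

inline ZoneIdIndex buildIndex(const std::vector<std::string>& ids) {
  ZoneIdIndex index;
  for (const auto& id : ids) {
    const auto slash = id.find('/');
    if (slash == std::string::npos) {
      index.noPrefix.push_back(id);
    } else {
      index.suffixesByPrefix[id.substr(0, slash)].push_back(
          id.substr(slash + 1));
    }
  }
  sortLongestFirst(index.noPrefix);
  for (auto& entry : index.suffixesByPrefix) {
    sortLongestFirst(entry.second);
  }
  return index;
}

// Returns how many leading characters of `input` form the longest known
// zone ID, or 0 if no known ID starts the input.
inline std::size_t matchZoneId(const ZoneIdIndex& index,
                               std::string_view input) {
  const auto slash = input.find('/');
  if (slash != std::string_view::npos) {
    const auto it =
        index.suffixesByPrefix.find(std::string(input.substr(0, slash)));
    if (it != index.suffixesByPrefix.end()) {
      const std::string_view rest = input.substr(slash + 1);
      for (const auto& suffix : it->second) {
        // Lists are longest-first, so stop at the first match.
        if (rest.substr(0, suffix.size()) == suffix) {
          return slash + 1 + suffix.size();
        }
      }
    }
  }
  for (const auto& id : index.noPrefix) {
    if (input.substr(0, id.size()) == id) {
      return id.size();
    }
  }
  return 0;
}
```

With an index built from "Etc/GMT", "Etc/GMT+1", and "Etc/GMT+10", an input starting with "Etc/GMT+10" consumes all ten characters rather than stopping at the shorter "Etc/GMT", because the longer suffix is compared first.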
1 parent dc6a7a0, commit 62b4516
Showing 5 changed files with 217 additions and 2 deletions.
velox/external/date/patches/0006-add_get_time_zone_names.patch (30 changes: 30 additions & 0 deletions)
@@ -0,0 +1,30 @@
diff --git a/velox/external/date/tz.cpp b/velox/external/date/tz.cpp
--- a/velox/external/date/tz.cpp
+++ b/velox/external/date/tz.cpp
@@ -3538,6 +3538,14 @@
     return get_tzdb_list().front();
 }
 
+std::vector<std::string> get_time_zone_names() {
+  std::vector<std::string> result;
+  for (const auto& z : get_tzdb().zones) {
+    result.push_back(z.name());
+  }
+  return result;
+}
+
 const time_zone*
 #if HAS_STRING_VIEW
 tzdb::locate_zone(std::string_view tz_name) const
diff --git a/velox/external/date/tz.h b/velox/external/date/tz.h
--- a/velox/external/date/tz.h
+++ b/velox/external/date/tz.h
@@ -1258,6 +1258,8 @@
 
 DATE_API const tzdb& get_tzdb();
 
+std::vector<std::string> get_time_zone_names();
+
 class tzdb_list
 {
     std::atomic<tzdb*> head_{nullptr};