Introduce simple date time formatter #10966
@rui-mo Could you help review this PR? Thanks.
Thanks!
    }
  }
  DateTimeFormatterType type = lenient ? DateTimeFormatterType::LENIENT_SIMPLE
                                       : DateTimeFormatterType::STRICT_SIMPLE;
Wondering where the LENIENT_SIMPLE and STRICT_SIMPLE will be used. I only find their definitions in this PR but no usage.
They will be used where the lenient and non-lenient modes take different code branches. For example, if the DateTimeFormatterType is LENIENT_SIMPLE, the helper function "daysSinceEpochFromWeekOfMonthDate" will be called with "lenient=true"; otherwise it will be called with "lenient=false".
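A minimal sketch of that branching, under stated assumptions: the enum below is a stand-in that lists only the values mentioned in this thread plus an assumed JODA value, and the helper is reduced to a hypothetical stub that merely echoes the flag it receives. Neither reflects the actual Velox signatures.

```cpp
// Stand-in for the formatter type enum (assumption: JODA marks the existing path).
enum class DateTimeFormatterType { JODA, LENIENT_SIMPLE, STRICT_SIMPLE };

// Hypothetical stub for "daysSinceEpochFromWeekOfMonthDate": the real helper
// computes a day count; this one only returns the lenient flag it was given
// so that the branching is observable.
inline bool daysSinceEpochFromWeekOfMonthDateStub(bool lenient) {
  return lenient;
}

// Derives the lenient flag from the formatter type, so call sites do not need
// to carry a separate boolean.
inline bool isLenient(DateTimeFormatterType type) {
  return type == DateTimeFormatterType::LENIENT_SIMPLE;
}

// Hypothetical parse entry point: forwards the derived flag to the helper.
inline bool parseWithType(DateTimeFormatterType type) {
  return daysSinceEpochFromWeekOfMonthDateStub(isLenient(type));
}
```

With this shape, the formatter type is the single source of truth for leniency, and STRICT_SIMPLE and JODA both reach the helper with lenient=false.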
@rui-mo I've updated the code according to the review comments. Could you take a look? Thanks.
Thanks.
@@ -156,7 +168,8 @@ struct UnixTimestampParseFunction {
       const std::vector<TypePtr>& /*inputTypes*/,
       const core::QueryConfig& config,
       const arg_type<Varchar>* /*input*/) {
-    format_ = buildJodaDateTimeFormatter(kDefaultFormat_);
+    format_ = getDateTimeFormatter(
+        config.sparkLegacyTimeParser(), kDefaultFormat_, false);
How does Spark decide lenient or not? Is it through another configuration?
There is no configuration that decides whether parsing is lenient. Spark simply uses either lenient mode or strict mode, depending on the call site. See this Spark issue:
SimpleDateFormat - is used in JDBC datasource, in partitions parsing.
SimpleDateFormat in strong mode (lenient = false). It is used by the date_format, from_unixtime, unix_timestamp and to_unix_timestamp functions.
FYI:
Functions using strict mode(lenient=false):
'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'
Functions using lenient mode:
cast date to string
FYI: Functions using strict mode(lenient=false): 'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'
Functions using lenient mode: cast date to string
Thanks for the clarification. This is much clearer to me. Would you add this comment to the PR description?
Sure. Updated.
Thanks. Just several nits.
@mbasmanova Could you help review this PR? Thanks.
The implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is just a copy of the Joda implementation in this PR, and a further PR will change the behavior to align with Spark.
Does it mean that after this PR, some queries would return incorrect results? What happens if "further PR" doesn't materialize? Will we be left in a broken state?
@mbasmanova Currently, the Spark functions in Velox use the Joda date formatter for date parsing and formatting, which does not align with Spark's legacy date format behavior. This issue highlights the main differences. As a result, some queries already return incorrect results. After this PR, these incorrect behaviors will still exist, and the "further PRs" will address and correct them. If the "further PRs" do not materialize, the state will remain the same as it is now.
@NEUpanning Some follow-up comments.
@mbasmanova Could you take a look again please? Thanks.
@NEUpanning Thank you for iterating. Hopefully, one last question.
Can you remind me again where LENIENT_SIMPLE will be used? Somehow, I don't see any usages in this PR.
I'd expect this PR to introduce two modes, LENIENT_SIMPLE and STRICT_SIMPLE, and to start specifying them correctly at the call sites, while the implementation of LENIENT_SIMPLE would still be equal to STRICT_SIMPLE and a follow-up PR would change that.
Hence, I'd expect to see LENIENT_SIMPLE being requested in some places and STRICT_SIMPLE in others. I see call sites for STRICT_SIMPLE, but not for LENIENT_SIMPLE.
@mbasmanova LENIENT_SIMPLE will be used in 'casting date(Timestamp) to string'. However, the current implementation of that cast does not use the Joda DateFormatter, so it cannot be switched to LENIENT_SIMPLE until LENIENT_SIMPLE is fully implemented; otherwise, the behavior of 'casting date(Timestamp) to string' would differ from its current behavior. Therefore, I am in favor of changing 'casting date(Timestamp) to string' to use LENIENT_SIMPLE only after its behavior is aligned with Spark.
Got it. Would you update the PR description to add this context?
CI is red.
@mbasmanova Sure. I added this context and the expected call sites of new date formatter types.
cd07fca to ca71412
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Introduce new DateTimeFormatterType values 'LENIENT_SIMPLE' and 'STRICT_SIMPLE', which are used when the Spark legacy time parser policy is enabled and correspond to java.text.SimpleDateFormat in lenient and non-lenient mode. In this PR, the implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is just a copy of the Joda implementation; a further PR will change the behavior to align with Spark.
Spark functions using strict mode (lenient=false): 'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'.
Spark functions using lenient mode: cast timestamp to string.
'Casting timestamp to string' will use LENIENT_SIMPLE only after the behavior of LENIENT_SIMPLE is aligned with Spark, since the cast does not currently use the Joda DateFormatter.
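The selection described above could be sketched roughly as follows. This is an illustration only: `chooseFormatterType` and `usesStrictMode` are hypothetical names, the enum is a stand-in, and falling back to JODA when the legacy policy is off is an assumption inferred from the diff that replaced `buildJodaDateTimeFormatter`, not a statement of the actual Velox implementation.

```cpp
#include <array>
#include <string>

// Stand-in enum; JODA is assumed to represent the non-legacy path.
enum class DateTimeFormatterType { JODA, LENIENT_SIMPLE, STRICT_SIMPLE };

// Hypothetical helper: maps the legacy-parser flag and the lenient flag to a
// formatter type. When the legacy policy is off, the lenient flag is ignored.
inline DateTimeFormatterType chooseFormatterType(
    bool legacyTimeParser,
    bool lenient) {
  if (!legacyTimeParser) {
    return DateTimeFormatterType::JODA;
  }
  return lenient ? DateTimeFormatterType::LENIENT_SIMPLE
                 : DateTimeFormatterType::STRICT_SIMPLE;
}

// Spark functions listed in the description as using strict mode.
const std::array<const char*, 5> kStrictModeFunctions = {
    {"from_unixtime",
     "unix_timestamp",
     "make_date",
     "to_unix_timestamp",
     "date_format"}};

// Returns true if `name` is one of the strict-mode functions listed above.
inline bool usesStrictMode(const std::string& name) {
  for (const char* fn : kStrictModeFunctions) {
    if (name == fn) {
      return true;
    }
  }
  return false;
}
```

Under these assumptions, a strict-mode call site such as unix_timestamp would request `chooseFormatterType(legacy, false)`, which matches the `false` argument in the diff hunk for UnixTimestampParseFunction.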
Relates #10354