Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce simple date time formatter #10966

Conversation

NEUpanning
Copy link
Contributor

@NEUpanning NEUpanning commented Sep 11, 2024

Introduce new DateTimeFormatterType called 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' that are used when Spark legacy time parser policy is enabled for java.text.SimpleDateFormat in lenient and non-lenient mode. The implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is just copy from Joda in this PR and further PR will change the behavior to align with Spark.
Spark functions using strict mode(lenient=false): 'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'.
Spark functions using lenient mode: cast timestamp to string.
'casting timestamp to string' will use LENIENT_SIMPLE only after the behavior of LENIENT_SIMPLE is aligned with Spark since it does not use Joda DateFormatter to do cast.

Relates #10354

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 11, 2024
Copy link

netlify bot commented Sep 11, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ca71412
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66f16dc4dea58200087340ed

@NEUpanning
Copy link
Contributor Author

@rui-mo Could you help to review this PR please? Thanks.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/flags.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
}
}
DateTimeFormatterType type = lenient ? DateTimeFormatterType::LENIENT_SIMPLE
: DateTimeFormatterType::STRICT_SIMPLE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering where the LENIENT_SIMPLE and STRICT_SIMPLE will be used. I only find their definitions in this PR but no usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be used when lenient and non-lenient modes have different code branches. For example, if DateTimeFormatterType is LENIENT_SIMPLE, the helper function "daysSinceEpochFromWeekOfMonthDate" will be called with "lenient=true" otherwise with "lenient=false"

@NEUpanning
Copy link
Contributor Author

@rui-mo I've updated code according to review comments. Could you take a look? Thanks.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

velox/core/QueryConfig.h Outdated Show resolved Hide resolved
velox/core/QueryConfig.h Show resolved Hide resolved
velox/docs/configs.rst Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.cpp Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.h Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.h Show resolved Hide resolved
@@ -156,7 +168,8 @@ struct UnixTimestampParseFunction {
const std::vector<TypePtr>& /*inputTypes*/,
const core::QueryConfig& config,
const arg_type<Varchar>* /*input*/) {
format_ = buildJodaDateTimeFormatter(kDefaultFormat_);
format_ = getDateTimeFormatter(
config.sparkLegacyTimeParser(), kDefaultFormat_, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does Spark decide lenient or not? Is it through another configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no configuration to decide lenient or not. It's just uses lenient mode or strict mode . See this Spark issue

SimpleDateFormat - is used in JDBC datasource, in partitions parsing.
SimpleDateFormat in strong mode (lenient = false). It is used by the date_format, from_unixtime, unix_timestamp and to_unix_timestamp functions.

Copy link
Contributor Author

@NEUpanning NEUpanning Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI:
Functions using strict mode(lenient=false):
'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'

Functions using lenient mode:
cast date to string

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Functions using strict mode(lenient=false): 'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'

Functions using lenient mode: cast date to string

Thanks for the clarify. This is much clearer to me. Would you add this comment to the PR description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Updated.

@NEUpanning NEUpanning requested a review from rui-mo September 13, 2024 05:21
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just several nits.

velox/core/QueryConfig.h Outdated Show resolved Hide resolved
velox/docs/configs.rst Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.h Outdated Show resolved Hide resolved
velox/functions/sparksql/DateTimeFunctions.h Outdated Show resolved Hide resolved
velox/functions/sparksql/DateTimeFunctions.h Outdated Show resolved Hide resolved
@NEUpanning NEUpanning requested a review from rui-mo September 18, 2024 10:33
@NEUpanning
Copy link
Contributor Author

@mbasmanova Could you help to review this PR please? Thanks.

@mbasmanova mbasmanova requested a review from pedroerp September 19, 2024 14:31
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is just copy from Joda in this PR and further PR will change the behavior to align with Spark.

Does it mean that after this PR, some queries would return incorrect results? What happens if "further PR" doesn't materialize? Will we be left in a broken state?

velox/docs/configs.rst Outdated Show resolved Hide resolved
velox/functions/sparksql/DateTimeFunctions.h Outdated Show resolved Hide resolved
@NEUpanning
Copy link
Contributor Author

Does it mean that after this PR, some queries would return incorrect results? What happens if "further PR" doesn't materialize? Will we be left in a broken state?

@mbasmanova Currently, Spark is using the Joda date formatter for date parsing/formatting, which does not align with Spark's legacy date format behavior. This issue highlights the main differences. Therefore, some queries will return incorrect results. After this PR, these incorrect behaviors will still exist and the "further PRs" will address and correct these behaviors. If the "further PRs" do not materialize, the state will remain the same as it is now.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NEUpanning Some follow-up comments.

velox/functions/sparksql/DateTimeFunctions.h Show resolved Hide resolved
velox/functions/sparksql/DateTimeFunctions.h Outdated Show resolved Hide resolved
velox/docs/configs.rst Outdated Show resolved Hide resolved
velox/docs/configs.rst Outdated Show resolved Hide resolved
velox/docs/configs.rst Outdated Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.h Show resolved Hide resolved
@NEUpanning
Copy link
Contributor Author

@mbasmanova Could you take a look again please? Thanks.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NEUpanning Thank you for iterating. Hopefully, one last question.

Can you remind me again where LENIENT_SIMPLE will be used. Somehow, I don't see any usages in that PR.

I'd expect that this PR would introduce 2 modes: LENIENT_SIMPLE and STRICT_SIMPLE and start specifying these correctly, but implementation of LENIENT_SIMPLE would still be equal to STRICT_SIMPLE and follow-up PR would change that.

Hence, I'd expect to see LENIENT_SIMPLE being requested in some places and STRICT_SIMPLE in others. I see call sites for STRICT_SIMPLE, but not for LENIENT_SIMPLE.

velox/docs/functions/spark/datetime.rst Show resolved Hide resolved
velox/functions/lib/DateTimeFormatter.h Show resolved Hide resolved
@NEUpanning
Copy link
Contributor Author

@mbasmanova LENIENT_SIMPLE will be used in 'casting date(Timestamp) to string'. However, the current implementation does not use Joda DateFormatter to do cast, so it cannot be changed to use LENIENT_SIMPLE without fully implementing LENIENT_SIMPLE. Otherwise, the behavior of 'casting date(Timestamp) to string' would be different from its current behavior. Therefore, I am in favor of changing 'casting date(Timestamp) to string' to use LENIENT_SIMPLE only after its behavior is aligned with Spark.

@mbasmanova
Copy link
Contributor

@mbasmanova LENIENT_SIMPLE will be used in 'casting date(Timestamp) to string'. However, the current implementation does not use Joda DateFormatter to do cast, so it cannot be changed to use LENIENT_SIMPLE without fully implementing LENIENT_SIMPLE. Otherwise, the behavior of 'casting date(Timestamp) to string' would be different from its current behavior. Therefore, I am in favor of changing 'casting date(Timestamp) to string' to use LENIENT_SIMPLE only after its behavior is aligned with Spark.

Got it. Would you update PR description to add this context?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is red.

velox/functions/sparksql/DateTimeFunctions.h Outdated Show resolved Hide resolved
@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Sep 20, 2024
@NEUpanning
Copy link
Contributor Author

NEUpanning commented Sep 21, 2024

Would you update PR description to add this context?

@mbasmanova Sure. I added this context and the expected call sites of new date formatter types.

@NEUpanning NEUpanning force-pushed the introduce_simple_date_format branch from cd07fca to ca71412 Compare September 23, 2024 13:31
@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@Yuhta merged this pull request in 35b79eb.

Copy link

Conbench analyzed the 1 benchmark run on commit 35b79eb5.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants