Introduce simple date time formatter #10966

NEUpanning · 2024-09-11T03:54:25Z

Introduce two new DateTimeFormatterType values, 'LENIENT_SIMPLE' and 'STRICT_SIMPLE', which are used when the Spark legacy time parser policy is enabled. They correspond to java.text.SimpleDateFormat in lenient and non-lenient mode respectively. In this PR, the implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is simply copied from Joda; follow-up PRs will change the behavior to align with Spark.
Spark functions using strict mode (lenient=false): 'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'.
Spark functions using lenient mode: cast timestamp to string.
'Casting timestamp to string' will switch to LENIENT_SIMPLE only after the behavior of LENIENT_SIMPLE is aligned with Spark, since the cast does not currently use the Joda DateFormatter.
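
As a rough illustration of the selection described above, the following sketch maps the legacy-policy flag and the lenient flag to a formatter type. The enum here is a trimmed-down, hypothetical stand-in for the one in velox/functions/lib/DateTimeFormatter.h, `chooseFormatterType` is a hypothetical name, and the assumption that the Joda formatter is kept when the legacy policy is off is inferred from the description rather than taken from the actual code.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for the formatter kinds discussed in
// this PR. The real enum in Velox may contain other members.
enum class DateTimeFormatterType { JODA, LENIENT_SIMPLE, STRICT_SIMPLE };

// Hypothetical helper. `legacyTimeParser` models the Spark legacy time parser
// policy; `lenient` models the lenient switch of java.text.SimpleDateFormat.
inline DateTimeFormatterType chooseFormatterType(
    bool legacyTimeParser,
    bool lenient) {
  if (!legacyTimeParser) {
    // Assumption: with the legacy policy off, the Joda formatter is kept.
    return DateTimeFormatterType::JODA;
  }
  return lenient ? DateTimeFormatterType::LENIENT_SIMPLE
                 : DateTimeFormatterType::STRICT_SIMPLE;
}

// Usage example: a strict call site such as 'unix_timestamp' passes
// lenient=false.
inline void chooseFormatterTypeExample() {
  assert(
      chooseFormatterType(true, false) == DateTimeFormatterType::STRICT_SIMPLE);
}
```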

Relates #10354

NEUpanning · 2024-09-11T03:55:57Z

@rui-mo Could you help review this PR, please? Thanks.

rui-mo

Thanks!

velox/functions/lib/DateTimeFormatter.cpp

velox/functions/sparksql/flags.cpp

velox/functions/lib/DateTimeFormatter.cpp

rui-mo · 2024-09-11T07:55:13Z

velox/functions/lib/DateTimeFormatter.cpp

+    }
+  }
+  DateTimeFormatterType type = lenient ? DateTimeFormatterType::LENIENT_SIMPLE
+                                       : DateTimeFormatterType::STRICT_SIMPLE;


Wondering where the LENIENT_SIMPLE and STRICT_SIMPLE will be used. I only find their definitions in this PR but no usage.

They will be used where lenient and non-lenient modes take different code branches. For example, if the DateTimeFormatterType is LENIENT_SIMPLE, the helper function "daysSinceEpochFromWeekOfMonthDate" will be called with "lenient=true"; otherwise it will be called with "lenient=false".
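
To make the branching concrete, here is a minimal sketch of how a formatter type could be turned into a `lenient` flag and passed to a date helper. Everything in it is an assumption for illustration: `tryDaysSinceEpochFromDate` is a hypothetical, simplified helper (not Velox's "daysSinceEpochFromWeekOfMonthDate"), and the rollover rule follows the general behavior of java.text.SimpleDateFormat, where lenient mode lets an out-of-range day spill into the next month and strict mode rejects it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two new formatter types.
enum class DateTimeFormatterType { LENIENT_SIMPLE, STRICT_SIMPLE };

// Days from 1970-01-01 to a valid proleptic Gregorian date.
inline int64_t daysFromCivil(int64_t y, int m, int d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const int64_t yoe = y - era * 400; // [0, 399]
  const int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + doe - 719468;
}

inline int daysInMonth(int64_t y, int m) {
  static const int kDays[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
  return (m == 2 && leap) ? 29 : kDays[m - 1];
}

// Hypothetical helper. Strict mode rejects an out-of-range day; lenient mode
// counts from the first of the month so the excess rolls into later months.
inline bool tryDaysSinceEpochFromDate(
    int64_t year, int month, int day, bool lenient, int64_t& out) {
  assert(month >= 1 && month <= 12);
  if (!lenient && (day < 1 || day > daysInMonth(year, month))) {
    return false;
  }
  out = daysFromCivil(year, month, 1) + (day - 1);
  return true;
}

// The formatter type only decides which value of `lenient` is passed on.
inline bool tryResolveDays(
    DateTimeFormatterType type, int64_t y, int m, int d, int64_t& out) {
  const bool lenient = type == DateTimeFormatterType::LENIENT_SIMPLE;
  return tryDaysSinceEpochFromDate(y, m, d, lenient, out);
}
```

With this sketch, resolving 2024-02-30 under LENIENT_SIMPLE yields the same day count as 2024-03-01, while STRICT_SIMPLE reports failure.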

NEUpanning · 2024-09-11T11:44:32Z

@rui-mo I've updated the code according to the review comments. Could you take a look? Thanks.

rui-mo

Thanks.

velox/core/QueryConfig.h

velox/docs/configs.rst

velox/functions/lib/DateTimeFormatter.cpp

velox/functions/lib/DateTimeFormatter.h

rui-mo · 2024-09-12T13:36:33Z

velox/functions/sparksql/DateTimeFunctions.h

@@ -156,7 +168,8 @@ struct UnixTimestampParseFunction {
      const std::vector<TypePtr>& /*inputTypes*/,
      const core::QueryConfig& config,
      const arg_type<Varchar>* /*input*/) {
-    format_ = buildJodaDateTimeFormatter(kDefaultFormat_);
+    format_ = getDateTimeFormatter(
+        config.sparkLegacyTimeParser(), kDefaultFormat_, false);


How does Spark decide lenient or not? Is it through another configuration?

There is no configuration that decides whether lenient mode is used. Each function simply uses either lenient mode or strict mode. See this Spark issue:

SimpleDateFormat - is used in JDBC datasource, in partitions parsing.
SimpleDateFormat in strong mode (lenient = false). It is used by the date_format, from_unixtime, unix_timestamp and to_unix_timestamp functions.

FYI:
Functions using strict mode(lenient=false):
'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'

Functions using lenient mode:
cast date to string
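
The list above can be captured as a small lookup, sketched below. `usesStrictMode` is a hypothetical name that does not exist in Velox; the function names are the ones listed in this comment, and anything not in the set is simply reported as not strict.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical lookup: returns true when `functionName` is one of the Spark
// functions listed above as using strict mode (lenient=false).
inline bool usesStrictMode(const std::string& functionName) {
  static const std::unordered_set<std::string> kStrictFunctions = {
      "from_unixtime",
      "unix_timestamp",
      "make_date",
      "to_unix_timestamp",
      "date_format"};
  return kStrictFunctions.count(functionName) > 0;
}

// Usage example: 'date_format' is on the strict list.
inline void usesStrictModeExample() {
  assert(usesStrictMode("date_format"));
}
```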

Thanks for the clarification. This is much clearer to me. Would you add this comment to the PR description?

Sure. Updated.

rui-mo

Thanks. Just several nits.

velox/core/QueryConfig.h

velox/docs/configs.rst

velox/functions/lib/DateTimeFormatter.h

velox/functions/sparksql/DateTimeFunctions.h

NEUpanning · 2024-09-19T11:48:48Z

@mbasmanova Could you help review this PR, please? Thanks.

mbasmanova

The implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is just copy from Joda in this PR and further PR will change the behavior to align with Spark.

Does it mean that after this PR, some queries would return incorrect results? What happens if "further PR" doesn't materialize? Will we be left in a broken state?

velox/docs/configs.rst

velox/functions/sparksql/DateTimeFunctions.h

NEUpanning · 2024-09-20T03:31:06Z

Does it mean that after this PR, some queries would return incorrect results? What happens if "further PR" doesn't materialize? Will we be left in a broken state?

@mbasmanova Currently, the Spark functions in Velox use the Joda date formatter for date parsing and formatting, which does not align with Spark's legacy date format behavior. This issue highlights the main differences. As a result, some queries already return incorrect results. After this PR, these incorrect behaviors will still exist, and the follow-up PRs will address and correct them. If the follow-up PRs do not materialize, the state will remain the same as it is now.

mbasmanova

@NEUpanning Some follow-up comments.

velox/functions/sparksql/DateTimeFunctions.h

velox/docs/configs.rst

velox/functions/lib/DateTimeFormatter.h

NEUpanning · 2024-09-20T10:11:55Z

@mbasmanova Could you take a look again please? Thanks.

mbasmanova

@NEUpanning Thank you for iterating. Hopefully, one last question.

Can you remind me again where LENIENT_SIMPLE will be used? Somehow, I don't see any usages in this PR.

I'd expect that this PR would introduce 2 modes: LENIENT_SIMPLE and STRICT_SIMPLE and start specifying these correctly, but implementation of LENIENT_SIMPLE would still be equal to STRICT_SIMPLE and follow-up PR would change that.

Hence, I'd expect to see LENIENT_SIMPLE being requested in some places and STRICT_SIMPLE in others. I see call sites for STRICT_SIMPLE, but not for LENIENT_SIMPLE.

velox/docs/functions/spark/datetime.rst

velox/functions/lib/DateTimeFormatter.h

velox/functions/sparksql/DateTimeFunctions.h

NEUpanning · 2024-09-20T12:02:59Z

@mbasmanova LENIENT_SIMPLE will be used in 'casting date(Timestamp) to string'. However, the current implementation does not use the Joda DateFormatter for the cast, so it cannot be switched to LENIENT_SIMPLE before LENIENT_SIMPLE is fully implemented. Otherwise, the behavior of 'casting date(Timestamp) to string' would differ from its current behavior. Therefore, I am in favor of changing 'casting date(Timestamp) to string' to use LENIENT_SIMPLE only after its behavior is aligned with Spark.

mbasmanova · 2024-09-20T15:24:51Z

Got it. Would you update PR description to add this context?

mbasmanova

CI is red.

velox/functions/sparksql/DateTimeFunctions.h

NEUpanning · 2024-09-21T15:44:02Z

Would you update PR description to add this context?

@mbasmanova Sure. I added this context and the expected call sites of new date formatter types.

facebook-github-bot · 2024-09-23T15:09:43Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-24T17:11:10Z

@Yuhta merged this pull request in 35b79eb.

conbench-facebook · 2024-09-24T17:58:54Z

Conbench analyzed the 1 benchmark run on commit 35b79eb5.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

introduce simple date format

a7ed64c

facebook-github-bot added the CLA Signed label Sep 11, 2024

reformat

212bf50

rui-mo reviewed Sep 11, 2024

NEUpanning added 2 commits September 11, 2024 17:53

update

41d77f2

update

298d52c

rui-mo reviewed Sep 12, 2024

update

ab30f2f

NEUpanning requested a review from rui-mo September 13, 2024 05:21

rui-mo reviewed Sep 18, 2024

NEUpanning added 2 commits September 18, 2024 17:57

add doc

4923d21

update

899c835

NEUpanning requested a review from rui-mo September 18, 2024 10:33

mbasmanova requested a review from pedroerp September 19, 2024 14:31

mbasmanova reviewed Sep 19, 2024

updated

d5ba5c9

NEUpanning requested a review from mbasmanova September 20, 2024 06:20

mbasmanova reviewed Sep 20, 2024

NEUpanning added 2 commits September 20, 2024 17:57

updated

00cb140

updated

b67a6da

mbasmanova reviewed Sep 20, 2024

mbasmanova reviewed Sep 20, 2024

updated

1b5b2c8

reformat

e370169

mbasmanova approved these changes Sep 20, 2024

mbasmanova added the ready-to-merge label Sep 20, 2024

fix ci

ca71412

NEUpanning force-pushed the introduce_simple_date_format branch from cd07fca to ca71412 Compare September 23, 2024 13:31

facebook-github-bot closed this in 35b79eb Sep 24, 2024

facebook-github-bot added the Merged label Sep 24, 2024

NEUpanning deleted the introduce_simple_date_format branch September 25, 2024 02:07

This was referenced Sep 25, 2024

Velox doesn't support legacy date format behavior in Spark SQL #10354

Open

Support WEEK_YEAR for date time formatter #10930

Open

[VL] Enable Spark legacy date formatter if spark.sql.legacy.timeParserPolicy is set to 'LEGACY' apache/incubator-gluten#7375

Merged
