Allow partial date parsing when simple datetime formatter is used #11386
Changes from 9 commits
b4c44da
cbd6364
154063e
2f674ab
4b2e9aa
1944caf
83d3154
096b068
f981c2d
1121126
945864e
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -82,12 +82,9 @@ These functions support TIMESTAMP and DATE input types. | |
|
||
Adjusts ``unixTime`` (elapsed seconds since UNIX epoch) to configured session timezone, then | ||
converts it to a formatted time string according to ``format``. Only supports BIGINT type for | ||
``unixTime``. Using `Simple <https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html>`_ | ||
date formatter in lenient mode that is align with Spark legacy date parser behavior or | ||
`Joda <https://www.joda.org/joda-time/>`_ date formatter depends on ``spark.legacy_date_formatter`` configuration. | ||
``unixTime``. | ||
`Valid patterns for date format | ||
<https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_. When `Simple` date formatter is used, | ||
null is returned for invalid ``format``; otherwise, exception is thrown. This function will convert input to | ||
<https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_. This function will convert input to | ||
milliseconds, and integer overflow is allowed in the conversion, which aligns with Spark. See the below third | ||
example where INT64_MAX is used, -1000 milliseconds are produced by INT64_MAX * 1000 due to integer overflow. :: | ||
|
||
|
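The overflow described for ``from_unixtime`` can be reproduced with plain 64-bit arithmetic. The following is a minimal Java sketch, not Velox code: the class and method names are hypothetical, and it only assumes that the seconds-to-milliseconds conversion is an ordinary wrapping multiplication by 1000, as the text above states.

```java
final class UnixTimeOverflowSketch {
    // Hypothetical helper: converts seconds to milliseconds using Java's
    // wrapping (two's-complement) 64-bit multiplication, with no overflow check.
    public static long secondsToMillis(long seconds) {
        return seconds * 1000L;
    }

    public static void main(String[] args) {
        // A small value converts as expected.
        System.out.println(secondsToMillis(100L));           // 100000
        // INT64_MAX * 1000 wraps around: (2^63 - 1) * 1000 mod 2^64 == -1000.
        System.out.println(secondsToMillis(Long.MAX_VALUE)); // -1000
    }
}
```

The wraparound follows from 1000 * 2^63 being a multiple of 2^64, which leaves only the -1000 term after truncation to 64 bits.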
@@ -112,17 +109,11 @@ These functions support TIMESTAMP and DATE input types. | |
Returns timestamp by parsing ``string`` according to the specified ``dateFormat``. | ||
The format follows Spark's | ||
`Datetime patterns | ||
<https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_. | ||
Using `Simple <https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html>`_ | ||
date formatter in lenient mode that is align with Spark legacy date parser behavior or | ||
`Joda <https://www.joda.org/joda-time/>`_ date formatter depends on ``spark.legacy_date_formatter`` configuration. | ||
Returns NULL for parsing error or NULL input. When `Simple` date formatter is used, null is returned for invalid | ||
``dateFormat``; otherwise, exception is thrown. :: | ||
<https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_. :: | ||
|
||
SELECT get_timestamp('1970-01-01', 'yyyy-MM-dd'); -- timestamp `1970-01-01` | |
SELECT get_timestamp('1970-01-01', 'yyyy-MM'); -- NULL (parsing error) | ||
SELECT get_timestamp('1970-01-01', null); -- NULL | ||
SELECT get_timestamp('2020-06-10', 'A'); -- (throws exception) | ||
|
||
.. spark:function:: hour(timestamp) -> integer | ||
|
||
|
@@ -291,10 +282,7 @@ These functions support TIMESTAMP and DATE input types. | |
|
||
.. spark:function:: unix_timestamp() -> integer | ||
|
||
Returns the current UNIX timestamp in seconds. Using | ||
`Simple <https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html>`_ date formatter in lenient mode | ||
that is align with Spark legacy date parser behavior or `Joda <https://www.joda.org/joda-time/>`_ date formatter | ||
depends on the ``spark.legacy_date_formatter`` configuration. | ||
Returns the current UNIX timestamp in seconds. | ||
|
||
.. spark:function:: unix_timestamp(string) -> integer | ||
:noindex: | ||
|
@@ -337,3 +325,33 @@ These functions support TIMESTAMP and DATE input types. | |
part of the 53rd week of year 2004, so the result is 2004. Only supports DATE type. | ||
|
||
SELECT year_of_week('2005-01-02'); -- 2004 | ||
|
||
Different Behaviors Between Simple And Joda Date Formatters | |
----------------------------------------------------------- | |
I think --- needs to go all the way to cover the length of the title.
Thanks. Fixed
||
|
||
To align with Spark, Velox supports both `Simple <https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html>`_ | ||
and `Joda <https://www.joda.org/joda-time/>`_ date formatters to parse/format timestamp/date strings | |
formmater -> formatters
||
on the functions `from_unixtime`, `unix_timestamp`, `make_date` and `to_unix_timestamp`. | ||
on the functions -> used in functions
for functions, use references
Thanks. Fixed
||
If the configuration setting :doc:`spark.legacy_date_formatter <../../configs>` is true, | ||
the `Simple` date formatter in lenient mode is used; otherwise, `Joda` is used. Note that | |
the two formatters behave differently in some cases, as described below. | |
|
||
For `unix_timestamp` and `get_timestamp`, the `Simple` date formatter permits partial date parsing, | |
which means that the format may match only the leading part of the input string. For example, the input string | |
2015-07-22 10:00:00 can be parsed using the format yyyy-MM-dd because the parser does not require the entire | |
input to be consumed. In contrast, the `Joda` date formatter performs strict checks to ensure that the | |
format completely matches the string. If there is any mismatch, an exception is thrown. :: | |
|
||
SELECT get_timestamp('2015-07-22 10:00:00', 'yyyy-MM-dd'); -- timestamp `2015-07-22` (for Simple date formatter) | ||
SELECT get_timestamp('2015-07-22 10:00:00', 'yyyy-MM-dd'); -- (throws exception) (for Joda date formatter) | ||
SELECT unix_timestamp('2016-04-08 00:00:00', 'yyyy-MM-dd'); -- 1460041200 (for Simple date formatter) | ||
SELECT unix_timestamp('2016-04-08 00:00:00', 'yyyy-MM-dd'); -- (throws exception) (for Joda date formatter) | ||
|
||
For `from_unixtime` and `get_timestamp`, when the `Simple` date formatter is used, NULL is returned for an invalid | |
format; otherwise, an exception is thrown. :: | |
|
||
SELECT from_unixtime(100, '!@#$%^&*'); -- NULL (parsing error) (for Simple date formatter) | |
SELECT from_unixtime(100, '!@#$%^&*'); -- (throws exception) (for Joda date formatter) | |
SELECT get_timestamp('1970-01-01', '!@#$%^&*'); -- NULL (parsing error) (for Simple date formatter) | |
SELECT get_timestamp('1970-01-01', '!@#$%^&*'); -- (throws exception) (for Joda date formatter) | |
|
nit: Simple vs. Joda Date Formatter
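The partial-parsing difference documented in this change mirrors how Java's own ``SimpleDateFormat`` behaves. The sketch below is an illustration under assumptions, not Velox or Spark code: the class and method names are hypothetical, and because Joda-Time is not part of the JDK, ``java.time.format.DateTimeFormatter`` stands in here for a strict formatter that rejects unconsumed input.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class PartialParseSketch {
    // Lenient SimpleDateFormat parse. Returns how many input characters the
    // pattern consumed, or -1 if the input does not match the pattern at all.
    public static int lenientConsumed(String input, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setLenient(true);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        ParsePosition pos = new ParsePosition(0);
        Date parsed = fmt.parse(input, pos);
        return parsed == null ? -1 : pos.getIndex();
    }

    // Strict stand-in (java.time, not Joda): succeeds only when the pattern
    // accounts for the whole input string.
    public static boolean strictMatches(String input, String pattern) {
        try {
            LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "2015-07-22 10:00:00";
        // The lenient parser stops after "2015-07-22" and ignores the rest.
        System.out.println(lenientConsumed(input, "yyyy-MM-dd"));     // 10
        // The strict parser rejects the trailing " 10:00:00".
        System.out.println(strictMatches(input, "yyyy-MM-dd"));       // false
        System.out.println(strictMatches("2015-07-22", "yyyy-MM-dd")); // true
    }
}
```

The ``ParsePosition`` overload is used so the sketch can report how much of the input was consumed; the single-argument ``parse(String)`` accepts the same trailing text but signals failure with a ``ParseException`` instead of returning null.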