Describe the bug
In arrow 34.0.0, values such as '2021-1-1T05:11:10.432' were parsed as a timestamp, but they now produce an error
To Reproduce
Try to parse this value as a timestamp: '2021-1-1T05:11:10.432'
Expected behavior
The value '2021-1-1T05:11:10.432' parses the same as '2021-01-01T05:11:10.432'
Workaround is to change it to (add leading zeros) '2021-01-01T05:11:10.432'
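For callers that cannot fix the data at the source, the workaround can be applied in code before handing the string to the parser. The following is only a minimal sketch using the standard library: `pad_date_components` is a hypothetical helper name, not part of arrow or DataFusion, and it assumes the date part is `year-month-day` with a 4 digit year, optionally followed by `T` or a space and the time.

```rust
/// Hypothetical helper (not an arrow API): left-pads a single-digit month
/// and day with a zero, e.g. "2021-1-1T05:11:10.432" becomes
/// "2021-01-01T05:11:10.432". Returns None if the date part does not look
/// like year-month-day with a 4 digit year.
fn pad_date_components(input: &str) -> Option<String> {
    // Split the date from the rest at the first 'T' or space, if present.
    let split_at = input
        .find(|c: char| c == 'T' || c == ' ')
        .unwrap_or(input.len());
    let (date, rest) = input.split_at(split_at);

    // Expect exactly three '-' separated components.
    let mut parts = date.split('-');
    let year = parts.next()?;
    let month = parts.next()?;
    let day = parts.next()?;
    if parts.next().is_some() {
        return None;
    }

    let all_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if year.len() != 4 || !all_digits(year) {
        return None;
    }
    if month.len() > 2 || !all_digits(month) || day.len() > 2 || !all_digits(day) {
        return None;
    }

    // `{:0>2}` right-aligns the component in a width of 2, filling with '0'.
    Some(format!("{year}-{month:0>2}-{day:0>2}{rest}"))
}
```

Values that are already zero-padded pass through unchanged, so the helper can be applied to every input without first checking which form it is in.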
Additional context
We discovered this as part of the upgrade in DataFusion apache/datafusion#5685
Per @tustvold it appears that 2021-1-1T05:11:10.432 is supported by chrono even though it is not strictly valid and arrow was not documented to support this format: apache/datafusion#5685 (comment)
It may be that we simply choose not to fix this regression as the previous implementation was working "by accident" but I wanted to file this ticket to track the change in behavior
It may be that we simply choose not to fix this regression as the previous implementation was working "by accident"
Indeed, chrono's implementation of strptime is incredibly permissive, to the point where some of the undocumented behaviour is actually considered a bug - chronotope/chrono#332. In particular, the docs state that %m and similar expect zero-padded inputs; however, the implementation is more permissive.
We strive to support RFC3339 and reasonable variants thereupon. We therefore added additional cases based on chrono's strptime to supplement chrono's very rigid RFC3339 support, which in turn led to chrono's permissive strptime implementation unintentionally leaking through arrow's parsing abstraction.
Given RFC3339 requires that days and months should be two digits and years must be 4 digits, and we have never claimed support for such timestamps, I do not personally consider this a regression and do not plan to change it.
That being said, if someone wants to contribute additional functionality for this and is able to do so in a way that doesn't regress parsing performance, I would be fine with it. The major motivation for not supporting it is that knowing the character layout ahead of time is what allows the parser to be more efficient than it otherwise could be.
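To illustrate the point about character layout, here is a rough sketch of a fixed-layout date parser. It is not arrow's actual implementation, and `parse_fixed_date` is a hypothetical name. It assumes a strict `YYYY-MM-DD` prefix, so every digit sits at a known byte offset and no scanning for separators is needed.

```rust
/// Hypothetical sketch (not arrow's parser): reads a strict "YYYY-MM-DD"
/// prefix from fixed byte offsets and returns (year, month, day).
fn parse_fixed_date(s: &str) -> Option<(u32, u32, u32)> {
    let b = s.as_bytes();
    // The layout is known up front: 4 digits, '-', 2 digits, '-', 2 digits.
    if b.len() < 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    // Convert the ASCII digit at a fixed position, rejecting non-digits.
    let digit = |i: usize| -> Option<u32> {
        let d = b[i].wrapping_sub(b'0');
        if d < 10 {
            Some(d as u32)
        } else {
            None
        }
    };
    let year = digit(0)? * 1000 + digit(1)? * 100 + digit(2)? * 10 + digit(3)?;
    let month = digit(5)? * 10 + digit(6)?;
    let day = digit(8)? * 10 + digit(9)?;
    // Coarse range check only; days per month are not validated here.
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}
```

With this approach '2021-01-01T05:11:10.432' is accepted, while '2021-1-1T05:11:10.432' is rejected at the separator check because the second '-' is not at byte offset 7. Accepting variable-width months and days would mean locating the separators first, which is the extra work described above.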
As per RFC3339 we should not support anything other than 4 digits for years, as this leads to ambiguity.