-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix time zone handling in index values and metadata #1866
base: master
Are you sure you want to change the base?
Changes from all commits
b117618
b5776b5
e1c3887
8d03e9f
e33b93a
fb8baed
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -183,3 +183,15 @@ The handling of `NaN` in ArcticDB depends on the type of the column under consid | |
* For string columns, `NaN`, as well as Python `None`, are fully supported. | ||
* For floating-point numeric columns, `NaN` is also fully supported. | ||
* For integer numeric columns `NaN` is not supported. A column that otherwise contains only integers will be treated as a floating point column if a `NaN` is encountered by ArcticDB, at which point [the usual rules](api/arctic.md#LibraryOptions) around type promotion for libraries configured with or without dynamic schema all apply as usual. | ||
|
||
### How does ArcticDB handle `time zone` information? | ||
|
||
ArcticDB takes a more strict approach to time zone handling than Pandas. While Pandas support multiple types for storing time zone information, ArcticDB coerces all time zone information to `pytz.tzfile` as this is the default representation for Pandas as well. This way we can ensure that all time zone information is stored in a consistent manner, and that all time zone information is preserved when data is written to and read from ArcticDB. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is It seems to have changed a lot in Pandas 2 https://github.com/pandas-dev/pandas/blob/2a10e04a099d5f1633abcdfbb2dd9fdf09142f8d/doc/source/whatsnew/v2.0.0.rst#L555 |
||
|
||
When writing data with time zone information to ArcticDB, we preserve the time name as string (e.g. "Europe/Amsterdam"). When reading data with time zone information from ArcticDB, this data is used to create a `pytz.tzfile` object. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if the writer uses a timezone file that is not in the reader's zone database? How does this work across operating systems? I assume the database files are quite different across Linux and Windows, there are quite complex mappings between them. I think this makes it fundamentally impossible to round-trip timezones correctly - at least without relying on some standard database and validating that what is written is in there 🤔 Ah I see - this is all handled by |
||
|
||
!!! note | ||
|
||
This means that regardles of the timezone types being written in ArcticDB, all time zone types are normalised to `pytz.tzfile`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's important that we round-trip correctly - this could be quite a disruptive change otherwise. At least requiring a new major version - unless I'm misunderstanding? I notice the PR description says that this |
||
If you would like a another time-zone type back then string representation can be used to recreate it, e.g. `str(dt.tzinfo)`: | ||
- For `ZoneInfo`, after Python 3.9+, you can use: `ZoneInfo(str(dt.tzinfo))` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't repro the crash in issue #1406 anymore