Relax nanosecond datetime restriction in CF time decoding #9618
base: main
Conversation
Nice, mypy 1.12 is out and breaks our typing 😭.
Can we pin it in the CI temporarily?
Yes, 1.11.2 was the last version.
Force-pushed from ca5050d to f7396cf
This is now ready for a first round of review. I think this is already in a quite usable state. But no rush, this should be thoroughly tested.
Sounds good @kmuehlbauer! I'll try and take an initial look this weekend.
…ore/variable.py to use any-precision datetime/timedelta with automatic inferring of resolution
…ocessing, raise now early
…t resolution, fix code and tests to allow this
Sure -- where would be a good home for that?
Not sure, but https://docs.xarray.dev/en/stable/user-guide/time-series.html could have a dedicated floating point date section.
I've added a kwarg. But instead of adding that kwarg we could slightly overload the existing argument. This would have the positive effect that we wouldn't need the additional kwarg and wouldn't have to distribute it through the backends.

We could guard the change accordingly. This methodology would be fully backwards compatible. It advertises the change via DeprecationWarning in normal operation and also if issues appear in the decoding steps. If this is something which makes sense, @shoyer, @dcherian, @spencerkclark, I'd add the needed changes to this PR.
Alternatively, we could make small progress on #4490 and have:

```python
from xarray.coding import DatetimeCoder

ds = xr.open_mfdataset(..., decode_times=DatetimeCoder(units="ms"))
```

In the long term, it seems nice to have the default use the "natural" units, i.e.
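The coder-object idea above can be made concrete with a minimal sketch. Note this is illustrative only: the `DatetimeCoder` class, its `units` parameter, and the `decode` method below are hypothetical stand-ins for whatever API lands in xarray, not the library's actual implementation.

```python
import numpy as np


class DatetimeCoder:
    """Hypothetical coder object bundling decoding options (sketch only)."""

    def __init__(self, units="s"):
        # target numpy datetime64 resolution, e.g. "s", "ms", "ns"
        self.units = units

    def decode(self, values, epoch):
        # interpret integer offsets as timedeltas at the requested resolution
        offsets = values.astype(f"timedelta64[{self.units}]")
        return np.datetime64(epoch, self.units) + offsets


# envisioned call site: xr.open_mfdataset(..., decode_times=DatetimeCoder(units="ms"))
coder = DatetimeCoder(units="ms")
decoded = coder.decode(np.array([0, 1500]), "1970-01-01")
print(decoded)  # 1500 ms past the epoch is 1970-01-01T00:00:01.500
```

The appeal is that all decoding options travel in one object instead of a growing list of `open_dataset` kwargs that every backend has to forward.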
This took a while to sink in 😉 Yes, that's a neat move. I'll incorporate this suggestion.
As long as we use
+1 for fewer arguments. Indeed, for those practical reasons I do not think it is worth trying to match the on-disk units of integer data any more closely. Second precision already allows for a time span of roughly +/- 290 billion years (many times older than the Earth), which I think is plenty for most applications :). Monthly or yearly units are also somewhat awkward to deal with due to their different (albeit often violated) definition in the CF conventions.
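The "+/- 290 billion years" figure is easy to verify: an int64 count of seconds spans 2**63 ticks either side of the epoch. A quick back-of-the-envelope check in plain numpy (no xarray involved):

```python
import numpy as np

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# int64 nanoseconds (the old pandas-only restriction): roughly +/- 292 years,
# which is where the year-2262 limit comes from
ns_span_years = 2**63 / 1e9 / SECONDS_PER_YEAR

# int64 seconds: roughly +/- 292 billion years
s_span_years = 2**63 / SECONDS_PER_YEAR

# so datetime64[s] has no trouble with post-2262 dates such as CMIP6's 2300
far_future = np.datetime64("2300-01-01", "s")
print(ns_span_years, s_span_years, far_future)
```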
Same question: the CMIP6 data include datasets out to the year 2300, but right now xarray can only read data before 2262. So if you can fix this problem, it will be very helpful to us. Thanks a lot!
While appealing, I think this is not a good idea. A couple of points (NOTE: I got a bit lost in the discussion; you all may have already come to these same conclusions, but I thought I'd capture it here in one big post ;-) ).

"months" and "years" are NOT recommended by CF, as they are not clearly defined timespans (though UDUNITS does have a definition for them, which is the average, e.g. 365.25 days to the year, or thereabouts).

As for days (or even hours): a sequence of whole-day offsets looks like it's expressing Jan 1, 2, 3, 4, .... But what that actually means is:

the zeroth time of each day, i.e. a specific point in the time continuum. And this maps to what all (that I know of) the datetime objects do too, Python's included. If a user does want a way to express the "day" itself, they might encode:

noon of each day. But then we can't use days as the unit with the fixed epoch of numpy datetime64. Anyway, all this to say: I don't think that there is ever a use case for using numpy datetime units longer than a second, certainly not by assuming something from the units of the time variable. Using seconds as a default for any encoding of seconds or longer seems reasonable to me, though. But is there any real loss to using milliseconds? [*] The way to express, e.g., a daily average, is to use "cell bounds", specifically defining the bounds of the average.
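The "noon of each day" point can be made concrete: fractional day offsets cannot be stored in `datetime64[D]`, but decoding at a finer unit recovers the intended instants. A sketch with made-up values (the epoch string and offsets below are illustrative, not from any real dataset):

```python
import numpy as np

# hypothetical raw values for "days since 2000-01-01": noon of Jan 1, 2, 3
values = np.array([0.5, 1.5, 2.5])
epoch = np.datetime64("2000-01-01", "s")

# datetime64[D] would silently truncate the 0.5 day; converting the offsets
# to whole seconds first keeps the sub-day information
decoded = epoch + (values * 86400).astype("timedelta64[s]")
print(decoded)
```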
Thanks @ChrisBarker-NOAA, I think we should move all these valuable comments in this PR into the docs somehow. I can take a look when this one is finalized.
Properly supporting datetime intervals (rather than just instants) feels like it would solve so many semantic problems. We've been discussing that for years. I hope that it's now feasible post custom indexes refactor. But that's probably off topic for this thread... |
Absolutely -- but yes, a whole other topic :-)
whats-new.rst
This is another attempt to resolve #7493. It goes a step further than #9580.

The idea of this PR is to automatically infer the needed resolution for decoding/encoding, and to only keep the constraints pandas imposes ("s" as lowest resolution, "ns" as highest resolution). There is still the idea of a default resolution, but this should only take precedence if it doesn't clash with the automatic inference. This can be discussed, though.

Update: I've implemented a time-unit kwarg as a first try at a default resolution on decode, which will override the currently inferred resolution only towards higher resolution (e.g. 's' -> 'ns').

For sanity checking, and also for my own good, I've created a documentation page on time-coding in the internal dev section. Any suggestions (especially grammar) or ideas for enhancements are much appreciated.
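The inference described above — pick the coarsest pandas-supported unit in "s".."ns" that represents the offsets exactly, falling back to "ns" — can be sketched roughly as follows. `infer_resolution` is a hypothetical helper for illustration, not the PR's actual code, and the exactness test can be defeated by binary-float offsets such as 0.1 s:

```python
import numpy as np

def infer_resolution(offsets_in_seconds):
    """Hypothetical sketch: return the coarsest unit among 's'..'ns' that
    represents all offsets exactly; fall back to 'ns', the old fixed default."""
    for unit, scale in [("s", 1), ("ms", 1e3), ("us", 1e6)]:
        scaled = offsets_in_seconds * scale
        # whole numbers at this scale mean the unit is fine enough
        if np.all(scaled == np.trunc(scaled)):
            return unit
    return "ns"

print(infer_resolution(np.array([0.0, 60.0])))   # whole seconds
print(infer_resolution(np.array([0.5, 1.5])))    # half-second offsets
```

A user-supplied default unit would then only be honoured when it is finer than the inferred one, so no precision is ever silently dropped.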
There still might be room for consolidating functions/methods (mostly in coding/times.py), but I have to leave it alone for some days. I went down that rabbit hole and need to relax, too 😬.
Looking forward to getting your insights here, @spencerkclark, @ChrisBarker-NOAA, @pydata/xarray.
Todo:
- time_units (where appropriate)