-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read time from filename #48
base: develop
Are you sure you want to change the base?
Read time from filename #48
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Json for the pull request.
I have a change request which is not mandatory but which will make me use this new feature immediatly: configuration of the datetime pattern. I don't think it is to difficult to add.
Otherwise, I believe it is quite important to make the datetime timezone aware and force it to UTC.
You'll see me comments in the pull request code changes.
days = int(days) | ||
|
||
# since datetime is set to 1/1 (+1 day), timedelta needs to -1 to cancel it out | ||
return int((datetime.datetime(year, 1, 1) + datetime.timedelta(days-1)).timestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to be careful about the UTC timezone.
For example with code
import pytz
aware = datetime.datetime(2011,8,15,8,15,12,0,pytz.UTC)
Or with the example in my comment above:
import pytz
datetime.strptime(filename, 'A%Y%j.L3m_CHL_chlor_a_4km.nc').replace(tzinfo=pytz.utc)
That would be more simple than the current version of the code and spare some lines of code.
I am not sure if the hardcoded -1 days is also something we would like to have for all the datasets. Maybe an additional optional configuration variable for the collection should give an offset for the min and offset for the max time.
In this case that would be -1 and -1 for both.
I guess the most difficult change in my request is to read from the collection configmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonmlkang did you have a chance to look at my comments. We're having similar need for AQACF project but of course the file naming convention is different so we would need to discuss how that could be configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Examples of file names from AQACF project:
V5GL01.HybridPM25.NorthAmerica.200807-200807.nc
V4NA03_PM25_NA_200401_200412-RH35.nc
@tloubrieu-jpl I had implemented a more generic way to parse date time from file name. The commit is 92eb7dc. Please take a look ParseDatetimeUtilsTests.py for examples of how to use the parser. Notice that this is only the first take and it is not the complete work. There are additional works to fine tune the time zone and integrate this into sdap ingester. |
@tloubrieu-jpl Updated the code to use strptime instead of regex |
Hi @jasonmlkang , are you still working on this pull request ? From reading the code, I believe something is missing around https://github.com/apache/incubator-sdap-ingester/blob/7ee17fdf16201c499f7bd35cf398844f2c70f046/granule_ingester/granule_ingester/processors/reading_processors/GridReadingProcessor.py#L43 so that the time read in the file is used a dimension in the dataset. I am coming back to this pull request now because we would need it for AQACF project for the GMU dataset. |
Some datasets lack a time variable and attributes and only indicate the date and/or time in the filename. In these cases, the ingester needs to be able to extract the time stamp from the filenames according to a new regular expression setting in the collections-config ConfigMap.