

Read time from filename #48

Open
wants to merge 3 commits into base: develop

Conversation

jasonmlkang
Some datasets lack a time variable and attributes and only indicate the date and/or time in the filename. In these cases, the ingester needs to be able to extract the time stamp from the filenames according to a new regular expression setting in the collections-config ConfigMap.
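As a rough sketch of what the description proposes, a regex-based extraction might look like the following. The pattern, group names, and function are illustrative assumptions, not the actual collections-config schema:

```python
import datetime
import re

# Hypothetical pattern for filenames like "A2011227...." where 2011 is the
# year and 227 is the day of year; the real pattern would come from the
# collections-config ConfigMap.
FILENAME_PATTERN = r"A(?P<year>\d{4})(?P<doy>\d{3})\."

def timestamp_from_filename(filename: str) -> int:
    match = re.search(FILENAME_PATTERN, filename)
    year = int(match.group("year"))
    doy = int(match.group("doy"))
    # Jan 1 is day 1, so offset by doy - 1; force UTC to keep the epoch
    # timestamp timezone-independent.
    dt = datetime.datetime(year, 1, 1, tzinfo=datetime.timezone.utc) \
        + datetime.timedelta(days=doy - 1)
    return int(dt.timestamp())

print(timestamp_from_filename("A2011227.L3m_CHL_chlor_a_4km.nc"))
```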

Contributor

@tloubrieu-jpl tloubrieu-jpl left a comment


Thanks Jason for the pull request.

I have a change request which is not mandatory but which would let me use this new feature immediately: making the datetime pattern configurable. I don't think it is too difficult to add.

Otherwise, I believe it is quite important to make the datetime timezone-aware and force it to UTC.

You'll see my comments in the pull request code changes.

days = int(days)

# datetime(year, 1, 1) already represents day 1 of the year, so subtract 1
# from the day-of-year offset to cancel it out
return int((datetime.datetime(year, 1, 1) + datetime.timedelta(days - 1)).timestamp())
Contributor


I think we want to be careful about the UTC timezone.

For example, with code:

import datetime
import pytz

aware = datetime.datetime(2011, 8, 15, 8, 15, 12, 0, pytz.UTC)

Or with the example in my comment above:

import datetime
import pytz

datetime.datetime.strptime(filename, 'A%Y%j.L3m_CHL_chlor_a_4km.nc').replace(tzinfo=pytz.utc)

That would be simpler than the current version of the code and would save a few lines.

I am not sure the hardcoded -1 days is something we want for all datasets. Maybe an additional optional configuration variable for the collection could give an offset for the min time and an offset for the max time.
In this case it would be -1 for both.
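The suggested optional offsets might be sketched like this; the `time-min-offset-days` / `time-max-offset-days` keys are hypothetical names, not an existing SDAP setting:

```python
import datetime

# Sketch: apply optional per-collection offsets (in days) to the datetime
# parsed from the filename, yielding a min and max time for the granule.
# Both config keys default to 0 when absent.
def apply_offsets(parsed: datetime.datetime, config: dict):
    min_offset = config.get("time-min-offset-days", 0)
    max_offset = config.get("time-max-offset-days", 0)
    time_min = parsed + datetime.timedelta(days=min_offset)
    time_max = parsed + datetime.timedelta(days=max_offset)
    return time_min, time_max
```

With `-1` for both keys, this reproduces the hardcoded one-day shift discussed above.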

I guess the most difficult change in my request is to read from the collection configmap.

Contributor


@jasonmlkang did you have a chance to look at my comments? We have a similar need for the AQACF project, but of course the file naming convention is different, so we would need to discuss how that could be configured.

Contributor


Examples of file names from AQACF project:
V5GL01.HybridPM25.NorthAmerica.200807-200807.nc
V4NA03_PM25_NA_200401_200412-RH35.nc
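For illustration only, the start/end months in these names could be parsed as follows; the pattern is an assumption about the naming convention, not project configuration:

```python
import datetime
import re

# Assumed convention: two YYYYMM tokens separated by "-" or "_" mark the
# start and end of the covered time range.
RANGE_PATTERN = re.compile(r"(\d{6})[-_](\d{6})")

def parse_time_range(filename: str):
    start_str, end_str = RANGE_PATTERN.search(filename).groups()
    parse = lambda s: datetime.datetime.strptime(s, "%Y%m").replace(
        tzinfo=datetime.timezone.utc)
    return parse(start_str), parse(end_str)

start, end = parse_time_range("V4NA03_PM25_NA_200401_200412-RH35.nc")
print(start.isoformat(), end.isoformat())
```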

@jasonmlkang
Author

@tloubrieu-jpl I have implemented a more generic way to parse the datetime from the filename; the commit is 92eb7dc. Please take a look at ParseDatetimeUtilsTests.py for examples of how to use the parser. Note that this is only a first take, not the complete work: there is additional work to fine-tune the time zone handling and to integrate this into the SDAP ingester.

@jasonmlkang
Author

@tloubrieu-jpl Updated the code to use strptime instead of regex
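A strptime-based, per-collection setup could look roughly like the following sketch; the key name `filename-datetime-pattern` is a hypothetical placeholder, not the actual collections-config schema:

```python
import datetime

# Sketch of a config-driven parser: each collection supplies a strptime
# pattern matching its filenames, and the result is forced to UTC as
# requested in the review.
def parse_filename_datetime(filename: str, collection_config: dict) -> datetime.datetime:
    pattern = collection_config["filename-datetime-pattern"]
    return datetime.datetime.strptime(filename, pattern).replace(
        tzinfo=datetime.timezone.utc)

dt = parse_filename_datetime(
    "A2011227.L3m_CHL_chlor_a_4km.nc",
    {"filename-datetime-pattern": "A%Y%j.L3m_CHL_chlor_a_4km.nc"},
)
```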

@tloubrieu-jpl
Contributor

Hi @jasonmlkang, are you still working on this pull request?
I think you were trying to deploy it on your laptop to fully test the new feature.

From reading the code, I believe something is missing around https://github.com/apache/incubator-sdap-ingester/blob/7ee17fdf16201c499f7bd35cf398844f2c70f046/granule_ingester/granule_ingester/processors/reading_processors/GridReadingProcessor.py#L43 so that the time read from the filename is used as a dimension in the dataset.

I am coming back to this pull request now because we need it for the AQACF project, for the GMU dataset.
