add /timeseries endpoints #33

hrodmn · 2024-10-22T19:07:30Z

The line count looks scary but the big changes here include ~600 lines of code in the new titiler/cmr/timeseries.py file. The rest is in a notebook, a VCR cassette, and tests.

Based on spec and discussion from https://github.com/NASA-IMPACT/veda-architecture/pull/503
Add /timeseries/statistics and possibly a few other endpoints

The goal is to start refining the timeseries API spec by testing out a bunch of CMR collections and seeing what kind of parameters make sense for most cases.

To see some examples of the /timeseries endpoints in action and some descriptions of the API for specifying a timeseries, check out the new timeseries notebook

I am really interested in getting some feedback on the specification of the timeseries parameters. It feels like there are a lot of knobs so maybe there are ways to simplify it.

hrodmn · 2024-10-31T12:19:40Z

titiler/cmr/timeseries.py

+        Optional[str],
+        Query(
+            description="Start datetime for timeseries request",
+        ),
+    ] = None
+    end_datetime: Annotated[
+        Optional[str],
+        Query(
+            description="End datetime for timeseries request",
+        ),
+    ] = None
+    step: Annotated[
+        Optional[str],
+        Query(
+            description="Time step between timeseries intervals, expressed as [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations)"
+        ),
+    ] = None
+    step_idx: Annotated[
+        Optional[int],
+        Query(description="Optional (zero-indexed) index of the desired time step"),
+    ] = None
+    exact: Annotated[
+        Optional[bool],
+        Query(
+            description="If true, queries will be made for a point-in-time at each step. If false, queries will be made for the entire interval between steps"
+        ),
+    ] = None
+    datetimes: Annotated[
+        Optional[str],
+        Query(
+            description="Optional list of comma-separated specific time points or time intervals to summarize over"
+        ),
+    ] = None
+
+
+def parse_duration(duration: str) -> relativedelta:
+    """Parse ISO 8601 duration string to relativedelta."""
+    match = re.match(
+        r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?",
+        duration,
+    )
+    if not match or not any(m for m in match.groups()):
+        raise ValueError(f"{duration} is an invalid duration format")
+
+    years, months, weeks, days, hours, minutes, seconds = [
+        float(g) if g else 0 for g in match.groups()
+    ]
+    return relativedelta(
+        years=int(years),
+        months=int(months),
+        weeks=int(weeks),
+        days=int(days),
+        hours=int(hours),
+        minutes=int(minutes),
+        seconds=int(seconds),
+        microseconds=int((seconds % 1) * 1e6),
+    )
+
+
+def generate_datetime_ranges(
+    start_datetime: datetime, end_datetime: datetime, step: str, exact: bool = False
+) -> List[Union[Tuple[datetime], Tuple[datetime, datetime]]]:
+    """Generate datetime ranges"""
+    start = start_datetime
+    end = end_datetime
+    step_delta = parse_duration(step)
+
+    ranges: List[Union[Tuple[datetime], Tuple[datetime, datetime]]] = []
+    current = start
+
+    step_timedelta = (current + step_delta) - current
+    is_small_timestep = step_timedelta <= timedelta(seconds=1)
+
+    while current < end:
+        if exact:
+            # For exact case, return a tuple with just one exact datetime
+            next_step = current + step_delta
+            ranges.append((current,))
+        else:
+            next_step = min(current + step_delta, end)
+            if next_step == end:
+                ranges.append((current, next_step))
+                break
+
+            if is_small_timestep:
+                # Subtract 1 millisecond for small timesteps
+                ranges.append((current, next_step - timedelta(microseconds=1)))
+            else:
+                # Subtract 1 second for larger timesteps
+                ranges.append((current, next_step - timedelta(seconds=1)))
+
+        current = next_step
+
+        if current == end:
+            ranges.append((end,))
+
+    if not ranges:
+        return [(start, end)]
+
+    return ranges


All of this would eventually move to titiler.extensions.timeseries but I wanted to do the development all in one place to facilitate easier iteration during early stages of development.

hrodmn · 2024-10-31T12:25:48Z

titiler/cmr/timeseries.py

A few of the classes and models defined at the top of this module will eventually go to titiler.extensions.timeseries so they can be re-used by other applications.

review-notebook-app · 2024-11-01T16:36:52Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

abarciauskas-bgse

This all looks good to me, I just made comments about clarifying some of the function and dependency descriptions. I focused on the notebook and titiler/cmr/timeseries.py. If time allows, I would like to understand how the VCR and data/podaac... files are being used (presumably in the tests).

titiler/cmr/timeseries.py

abarciauskas-bgse · 2024-11-01T17:05:24Z

titiler/cmr/timeseries.py

+                base_url=str(factory.url_for(request, "geojson_statistics")),
+                request=request,
+                param_list=query,
+            )


Just to make sure I understand how this works: the timeseries params are generated as part of the dependency timeseries_query_no_bbox - which generates a query object that is list of objects each containing a concept id and datetime. Build request urls returns a list of urls, each of which is a statistics endpoint, and then in the line below a request is made to each of those statistics endpoints. Is that understanding correct?

@abarciauskas-bgse yes that is correct! The code in the /timeseries endpoints just converts the timeseries query parameters into a list of datetime parameters, then a fires off a lower-level request for each unique datetime parameter. Then there is a little bit of code to organize the results (e.g. list of PNGs to a GIF, list of statistics geojson output into a single geojson).

maxrjones

I'm totally blown away by the GIF possibilities 🎉 🌍 I unfortunately wasn't able to fully dive into a review today, so I will try it out more on Monday but wanted to share a couple questions/comments so far.

My main question is about the pros/cons of having a new /timeseries/bbox url rather than adding support for returning image/gif to the images/bbox url. They share a ton of parameters and to me seem conceptually very similar with the return type just dependent on whether you have discrete mosaicing in time. If it remains under /timeseries and will be specific to gif return types, would it help for clarity to just name if timeseries/gif rather than timeseries/bbox?

titiler/cmr/timeseries.py

hrodmn · 2024-11-02T00:50:25Z

My main question is about the pros/cons of having a new /timeseries/bbox url rather than adding support for returning image/gif to the images/bbox url. They share a ton of parameters and to me seem conceptually very similar with the return type just dependent on whether you have discrete mosaicing in time.

I think we could just extend the original /bbox endpoint but there would wind up being a big if "gif" then ... else: ... control flow since the timeseries functionality is just farming out the requests to the /bbox endpoint again. Maybe there would be an elegant way to write a recursive approach.

If it remains under /timeseries and will be specific to gif return types, would it help for clarity to just name if timeseries/gif rather than timeseries/bbox?

This would definitely make sense if we are just going to stick with the gif format but I want to add support for mp4 export, too, so I kept the format argument in there.

I think the /bbox endpoint label does a bad job describing what it actually does, so maybe it would make more sense to do /timeseries/{format} where format could be gif or mp4.

vincentsarago · 2024-11-04T07:30:35Z

Dockerfile

@@ -26,5 +26,5 @@ RUN if [ -z "$EARTHDATA_USERNAME" ] || [ -z "$EARTHDATA_PASSWORD" ]; then \
 # http://www.uvicorn.org/settings/
 ENV HOST 0.0.0.0
 ENV PORT 80
-CMD uv run uvicorn titiler.cmr.main:app --host ${HOST} --port ${PORT} --log-level debug
+CMD uv run uvicorn titiler.cmr.main:app --host ${HOST} --port ${PORT} --log-level debug --reload


you don't really need the reload here because the code is copied not mounted

I added it because I mount the titiler/ directory in docker-compose.yml. It is redundant to COPY and mount the volume but it is convenient for local development to have the reload functionality.

pyproject.toml

tests/test_app.py

tests/test_timeseries.py

tests/test_timeseries_extension.py

titiler/cmr/timeseries.py

maxrjones · 2024-11-04T16:49:14Z

My main question is about the pros/cons of having a new /timeseries/bbox url rather than adding support for returning image/gif to the images/bbox url. They share a ton of parameters and to me seem conceptually very similar with the return type just dependent on whether you have discrete mosaicing in time.

I think we could just extend the original /bbox endpoint but there would wind up being a big if "gif" then ... else: ... control flow since the timeseries functionality is just farming out the requests to the /bbox endpoint again. Maybe there would be an elegant way to write a recursive approach.

If it remains under /timeseries and will be specific to gif return types, would it help for clarity to just name if timeseries/gif rather than timeseries/bbox?

This would definitely make sense if we are just going to stick with the gif format but I want to add support for mp4 export, too, so I kept the format argument in there.

I think the /bbox endpoint label does a bad job describing what it actually does, so maybe it would make more sense to do /timeseries/{format} where format could be gif or mp4.

The timeseries/bbox url calls the images/bbox url to get the individual images to combine into a gif, right? If so, it might be safer to keep them separate to avoid a potential for recursive lambda calls. I would lean towards a more descriptive label like /timeseries/{format} or /timeseries/movie with two output types.

hrodmn · 2024-11-06T17:17:56Z

I would lean towards a more descriptive label like /timeseries/{format} or /timeseries/movie with two output types.

I went with /timeseries/bbox/{xmin},{ymin},{xmax},{ymax}.{format} to be consistent with the non-timeseries /bbox endpoint, but I agree with @maxrjones that /timeseries/bbox path makes it less than obvious what it actually does. @vincentsarago do you have an opinion on staying consistent with /bbox or more creating a more descriptive with /timeseries/{format} (-> /timeseries/gif, /timeseries/mp4)?

maxrjones · 2024-11-06T17:24:36Z

I would lean towards a more descriptive label like /timeseries/{format} or /timeseries/movie with two output types.

I went with /timeseries/bbox/{xmin},{ymin},{xmax},{ymax}.{format} to be consistent with the non-timeseries /bbox endpoint, but I agree with @maxrjones that /timeseries/bbox path makes it less than obvious what it actually does. @vincentsarago do you have an opinion on staying consistent with /bbox or more creating a more descriptive with /timeseries/{format} (-> /timeseries/gif, /timeseries/mp4)?

sounds good, I think this could be considered a documentation issue. the self-describing APIs are nice for maintenance, but a downside is new people may not find it sufficient for finding features.

hrodmn · 2024-11-07T02:32:51Z

Thank you all for your thoughtful reviews of these changes!!

After thinking it over some more I want to stick with /timeseries/bbox for the sake of consistency with other titiler APIs. I added some more verbose descriptions to the openapi documentation to make the purpose of each /timeseries endpoint more clear. I also opened up an issue to actually publish the docs to the web! #38

hrodmn self-assigned this Oct 22, 2024

hrodmn force-pushed the feature/timeseries branch 3 times, most recently from 8a664f0 to a827171 Compare October 28, 2024 18:48

hrodmn force-pushed the feature/timeseries branch from 3cbccc1 to 265351d Compare October 31, 2024 02:38

hrodmn changed the title ~~DRAFT: add /timeseries endpoints~~ add /timeseries endpoints Oct 31, 2024

hrodmn requested review from vincentsarago, maxrjones and abarciauskas-bgse October 31, 2024 02:45

hrodmn marked this pull request as ready for review October 31, 2024 02:50

hrodmn force-pushed the feature/timeseries branch 2 times, most recently from 3a4253e to 96c28d6 Compare October 31, 2024 12:16

hrodmn commented Oct 31, 2024

View reviewed changes

hrodmn force-pushed the feature/timeseries branch from 96c28d6 to 7b82fa4 Compare October 31, 2024 12:24

hrodmn commented Oct 31, 2024

View reviewed changes

hrodmn force-pushed the feature/timeseries branch from 7b82fa4 to 5d2998c Compare October 31, 2024 16:58

add timeseries endpoints

72f6e94

hrodmn force-pushed the feature/timeseries branch from 5d2998c to 72f6e94 Compare November 1, 2024 11:40

abarciauskas-bgse approved these changes Nov 1, 2024

View reviewed changes

maxrjones reviewed Nov 1, 2024

View reviewed changes

titiler/cmr/timeseries.py Outdated Show resolved Hide resolved