From 1bcea0aa5ef5f6d740b23938b3ee365929c649c3 Mon Sep 17 00:00:00 2001
From: hrodmn
Date: Sat, 21 Dec 2024 06:45:03 -0600
Subject: [PATCH] replace benchmark report with usage limits doc

---
 docs/deployment/time_series_api_limits.ipynb  | 312 ++++++++++++++++++
 docs/time_series_performance_benchmarks.ipynb | 163 ---------
 mkdocs.yml                                    |   2 +
 3 files changed, 314 insertions(+), 163 deletions(-)
 create mode 100644 docs/deployment/time_series_api_limits.ipynb
 delete mode 100644 docs/time_series_performance_benchmarks.ipynb

diff --git a/docs/deployment/time_series_api_limits.ipynb b/docs/deployment/time_series_api_limits.ipynb
new file mode 100644
index 0000000..c30ae2d
--- /dev/null
+++ b/docs/deployment/time_series_api_limits.ipynb
@@ -0,0 +1,312 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0644081c-161f-43fa-8a61-7dc0efb26d08",
+   "metadata": {},
+   "source": [
+    "# Time series API limits\n",
+    "\n",
+    "The `titiler-cmr` API can be deployed as a Lambda function in AWS. Because requests to the time series endpoints fan out into recursive requests to the Lambda function for the lower-level time step operations, limits are in place to avoid large requests that are likely to overwhelm the API.\n",
+    "\n",
+    "## Highlights\n",
+    "- Maximum of 995 discrete points or intervals in a time series request (due to Lambda concurrency limits)\n",
+    "- You can use the length of the time series, the AOI size, and the resolution of the dataset to calculate the total pixel count (`x_pixels * y_pixels * n_time`), which is helpful for determining whether a request will succeed (see the sketch at the end of this section)\n",
+    "- The `/timeseries/bbox` endpoint for generating GIFs for a bounding box will struggle on requests for a large AOI and/or a lengthy time series for high spatial resolution datasets. Based on a coarse evaluation of the API, requests that read **less than 100,000,000 total pixels** from the raw data will tend to succeed. A guard is in place that makes requests above this threshold fail fast rather than firing hundreds of doomed Lambda invocations.\n",
+    "- The `/timeseries/statistics` endpoint can handle larger requests than the `/timeseries/bbox` endpoint. Based on a coarse evaluation of the API, requests that read **less than 15,000,000,000 total pixels** from the raw data will tend to succeed; however, any individual time step is limited to reading fewer than 56,000,000 pixels.\n",
+    "\n",
+    "## Background\n",
+    "The time series API provides rapid access to time series analysis and visualization of collections in the CMR catalog, but the deployment has some limitations that require care when making large requests.\n",
+    "\n",
+    "Several factors must be considered in order to make a successful time series request:\n",
+    "- Spatial resolution of the dataset (especially for the xarray backend)\n",
+    "- Request AOI size\n",
+    "- Number of points/intervals in the time series\n",
+    "\n",
+    "These factors all influence the runtime and memory footprint of the initial `/timeseries` request, and requests that are too large in any of these dimensions can result in an API failure. Here are a few guidelines to help you craft successful time series requests.\n",
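+    "\n",
+    "As a first check, you can estimate the total pixel count before sending a request. Here is a minimal sketch of the arithmetic (the helper function is illustrative, not part of `titiler-cmr`, and the thresholds it is compared against may change between deployments):\n",
+    "\n",
+    "```python\n",
+    "def estimate_total_pixels(bounds, resolution_degrees, n_time):\n",
+    "    \"\"\"Estimate x_pixels * y_pixels * n_time for a time series request.\n",
+    "\n",
+    "    bounds is (west, south, east, north) in degrees.\n",
+    "    \"\"\"\n",
+    "    west, south, east, north = bounds\n",
+    "    x_pixels = (east - west) / resolution_degrees\n",
+    "    y_pixels = (north - south) / resolution_degrees\n",
+    "    return x_pixels * y_pixels * n_time\n",
+    "\n",
+    "# 5x5 degree AOI of a 0.01 degree dataset with 180 daily time steps:\n",
+    "# 500 * 500 * 180 = 4.5e7, comfortably under the 1e8 /timeseries/bbox cap\n",
+    "print(estimate_total_pixels((-5, -5, 0, 0), 0.01, 180))\n",
+    "```"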
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7e436954-d115-4c6a-8aee-40e276532aa0",
+   "metadata": {},
+   "source": [
+    "## Details\n",
+    "\n",
+    "### Number of points/intervals in the time series\n",
+    "\n",
+    "The most important factor in whether a request succeeds or fails is the number of points in the time series. In the default deployment there is a hard cap of 995 time points in any time series request, because the Lambda function that executes the API requests has a concurrency limit of 1000."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4da8f9c2-fe34-4d16-a72a-04b50decd2c5",
+   "metadata": {},
+   "source": [
+    "### Spatial resolution and AOI size\n",
+    "\n",
+    "For datasets that use the `rasterio` backend, there are very few limitations on maximum array size as long as the data are COGs and you specify a reasonable output image size (or use the `max_size` parameter) in your request.\n",
+    "\n",
+    "For datasets without overviews/pyramids, `titiler-cmr` must read all of the bytes that overlap the request AOI even if the resulting image is going to be downsampled for a GIF. Therefore, if the area of interest for a `/timeseries/statistics` or `/timeseries/bbox` request would create an array large enough to exceed the capacity of the Lambda function, the request fails fast.\n",
+    "\n",
+    "The limits for the `xarray` backend are:\n",
+    "- `/timeseries/bbox`\n",
+    "  - individual image size: `5.6e7` pixels (~7500x7500)\n",
+    "  - total image size (`x_pixels * y_pixels * n_time`): `1e8` pixels\n",
+    "- `/timeseries/statistics`\n",
+    "  - individual image size: `5.6e7` pixels (~7500x7500)\n",
+    "  - total image size: `1.5e10` pixels\n",
+    "\n",
+    "For low-resolution datasets (e.g. 28 km or 0.25 degree) you will not run into any issues (unless you request too many time points!) because a request for the full dataset reads arrays that are only ~1440x720 pixels.\n",
+    "\n",
+    "For higher-resolution datasets (e.g. 1 km or 0.01 degree), you will start to run into problems as the raw arrays that `titiler-cmr` has to process grow larger and the number of discrete points or intervals increases."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b1ed1102-5c43-45d5-99c2-1f7591f8225f",
+   "metadata": {},
+   "source": [
+    "### Examples\n",
+    "\n",
+    "The MUR-SST dataset is good for demonstrating the limits of the time series endpoints with the `xarray` backend: it provides daily global sea surface temperature observations at high resolution (1 km, 0.01 degree), so it is easy to craft a request that will break the `/timeseries` endpoints. Here are some examples of how to manipulate the time series parameters to achieve success with the `/timeseries/bbox` endpoint.\n",
+    "\n",
+    "```python\n",
+    "from datetime import datetime, timedelta\n",
+    "\n",
+    "import httpx\n",
+    "```\n",
+    "\n",
+    "Here is a request that will succeed (if the Lambda is warmed up):\n",
+    "- 5x5 degree bounding box (500 x 500 pixels)\n",
+    "- 180 daily observations (`180 / P1D`)\n",
+    "- total size: `500 * 500 * 180 = 4.5e7`\n",
+    "\n",
+    "```python\n",
+    "bounds = (-5, -5, 0, 0)\n",
+    "bbox_str = \",\".join(str(x) for x in bounds)\n",
+    "\n",
+    "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n",
+    "end_datetime = start_datetime + timedelta(days=180)\n",
+    "\n",
+    "response = httpx.get(\n",
+    "    f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n",
+    "    params={\n",
+    "        \"concept_id\": \"C1996881146-POCLOUD\",\n",
+    "        \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n",
+    "        \"step\": \"P1D\",\n",
+    "        \"variable\": \"analysed_sst\",\n",
+    "        \"backend\": \"xarray\",\n",
+    "        \"rescale\": \"273,315\",\n",
+    "        \"colormap_name\": \"viridis\",\n",
+    "        \"temporal_mode\": \"point\",\n",
+    "    },\n",
+    "    timeout=None,\n",
+    ")\n",
+    "```\n",
+    "\n",
+    "That request is about half of the maximum request size for the `/timeseries/bbox` endpoint. We can push it to the limit by doubling the length of the time series:\n",
+    "- 5x5 degree bounding box (500 x 500 pixels)\n",
+    "- 360 daily observations (`360 / P1D`)\n",
+    "- total size: `500 * 500 * 360 = 9.0e7`\n",
+    "\n",
+    "```python\n",
+    "bounds = (-5, -5, 0, 0)\n",
+    "bbox_str = \",\".join(str(x) for x in bounds)\n",
+    "\n",
+    "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n",
+    "end_datetime = start_datetime + timedelta(days=360)\n",
+    "\n",
+    "response = httpx.get(\n",
+    "    f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n",
+    "    params={\n",
+    "        \"concept_id\": \"C1996881146-POCLOUD\",\n",
+    "        \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n",
+    "        \"step\": \"P1D\",\n",
+    "        \"variable\": \"analysed_sst\",\n",
+    "        \"backend\": \"xarray\",\n",
+    "        \"rescale\": \"273,315\",\n",
+    "        \"colormap_name\": \"viridis\",\n",
+    "        \"temporal_mode\": \"point\",\n",
+    "    },\n",
+    "    timeout=None,\n",
+    ")\n",
+    "```\n",
+    "\n",
+    "If we increase the length of the time series such that the request exceeds the maximum size, the API will return an error:\n",
+    "- 5x5 degree bounding box (500 x 500 pixels)\n",
+    "- 540 daily observations (`540 / P1D`)\n",
+    "- total size: `500 * 500 * 540 = 1.35e8` (greater than maximum of `1.0e8`!)\n",
+    "\n",
+    "```python\n",
+    "bounds = (-5, -5, 0, 0)\n",
+    "bbox_str = \",\".join(str(x) for x in bounds)\n",
+    "\n",
+    "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n",
+    "end_datetime = start_datetime + timedelta(days=540)\n",
+    "\n",
+    "response = httpx.get(\n",
+    "    f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n",
+    "    params={\n",
+    "        \"concept_id\": \"C1996881146-POCLOUD\",\n",
+    "        \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n",
+    "        \"step\": \"P1D\",\n",
+    "        \"variable\": \"analysed_sst\",\n",
+    "        \"backend\": \"xarray\",\n",
+    "        \"rescale\": \"273,315\",\n",
+    "        \"colormap_name\": \"viridis\",\n",
+    "        \"temporal_mode\": \"point\",\n",
+    "    },\n",
+    "    timeout=None,\n",
+    ")\n",
+    "```\n",
+    "\n",
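+    "Because the guard rejects oversized requests before any time step processing begins, the failure shows up immediately in the response. A quick way to inspect it is sketched below (the exact status code and error message depend on the deployment):\n",
+    "\n",
+    "```python\n",
+    "# 500 * 500 * 540 = 1.35e8 total pixels exceeds the 1e8 cap, so the API\n",
+    "# rejects the request up front instead of launching Lambda invocations.\n",
+    "if response.status_code != 200:\n",
+    "    print(response.status_code, response.text)\n",
+    "```\n",
+    "\n",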
"We can get get a successful response for the larger time window if we reduce the temporal resolution:\n", + "- 5x5 degree bounding box (500 x 500 pixels)\n", + "- 77 weekly observations (`540 / P7D`)\n", + "- total size: `500 * 500 * 77 = 1.925e7`\n", + "\n", + "```python\n", + "bounds = (-5, -5, 0, 0)\n", + "bbox_str = \",\".join(str(x) for x in bounds)\n", + "\n", + "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n", + "end_datetime = start_datetime + timedelta(days=540)\n", + "\n", + "response = httpx.get(\n", + " f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n", + " params={\n", + " \"concept_id\": \"C1996881146-POCLOUD\",\n", + " \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n", + " \"step\": \"P7D\",\n", + " \"variable\": \"analysed_sst\",\n", + " \"backend\": \"xarray\",\n", + " \"rescale\": \"273,315\",\n", + " \"colormap_name\": \"viridis\",\n", + " \"temporal_mode\": \"point\",\n", + " },\n", + " timeout=None,\n", + ")\n", + "```\n", + "\n", + "With the weekly temporal resolution we have some room to increase the size of the bounding box!\n", + "- 10x10 degree bounding box (1000 x 1000 pixels)\n", + "- 77 weekly observations (`540 / P7D`)\n", + "- total size: `1000 * 1000 * 77 = 7.7e7`\n", + "\n", + "```python\n", + "bounds = (-10, -10, 0, 0)\n", + "bbox_str = \",\".join(str(x) for x in bounds)\n", + "\n", + "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n", + "end_datetime = start_datetime + timedelta(days=540)\n", + "\n", + "response = httpx.get(\n", + " f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n", + " params={\n", + " \"concept_id\": \"C1996881146-POCLOUD\",\n", + " \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n", + " \"step\": \"P7D\",\n", + " \"variable\": \"analysed_sst\",\n", + " \"backend\": \"xarray\",\n", + " \"rescale\": \"273,315\",\n", + " \"colormap_name\": \"viridis\",\n", + " \"temporal_mode\": \"point\",\n", + " },\n", + " timeout=None,\n", + ")\n", + "```\n", + "\n", + "If we double the AOI size again, we will break exceed the request size limit:\n", + "- 20x20 degree bounding box (1000 x 1000 pixels)\n", + "- 77 weekly observations (`540 / P7D`)\n", + "- total size: `2000 * 2000 * 77 = 3.08e8` (greater than maximum of `1e8`\n", + "\n", + "```python\n", + "bounds = (-20, -20, 0, 0)\n", + "bbox_str = \",\".join(str(x) for x in bounds)\n", + "\n", + "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n", + "end_datetime = start_datetime + timedelta(days=540)\n", + "\n", + "response = httpx.get(\n", + " f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n", + " params={\n", + " \"concept_id\": \"C1996881146-POCLOUD\",\n", + " \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n", + " \"step\": \"P7D\",\n", + " \"variable\": \"analysed_sst\",\n", + " \"backend\": \"xarray\",\n", + " \"rescale\": \"273,315\",\n", + " \"colormap_name\": \"viridis\",\n", + " \"temporal_mode\": \"point\",\n", + " },\n", + " timeout=None,\n", + ")\n", + "```\n", + "\n", + "But if we reduce the temporal resolution from weekly to monthly, it will work!\n", + "- 20x20 degree bounding box (1000 x 1000 pixels)\n", + "- 18 monthly observations (`540 / P1M`)\n", + "- total size: `2000 * 2000 * 18 = 3.08e8`\n", + "\n", + "```python\n", + "bounds = (-20, -20, 0, 0)\n", + "bbox_str = 
\",\".join(str(x) for x in bounds)\n", + "\n", + "start_datetime = datetime(year=2011, month=1, day=1, hour=0, minute=0, second=1)\n", + "end_datetime = start_datetime + timedelta(days=540)\n", + "\n", + "response = httpx.get(\n", + " f\"https://dev-titiler-cmr.delta-backend.com/timeseries/bbox/{bbox_str}.gif\",\n", + " params={\n", + " \"concept_id\": \"C1996881146-POCLOUD\",\n", + " \"datetime\": \"/\".join(dt.isoformat() for dt in [start_datetime, end_datetime]),\n", + " \"step\": \"P1M\",\n", + " \"variable\": \"analysed_sst\",\n", + " \"backend\": \"xarray\",\n", + " \"rescale\": \"273,315\",\n", + " \"colormap_name\": \"viridis\",\n", + " \"temporal_mode\": \"point\",\n", + " },\n", + " timeout=None,\n", + ")\n", + "```\n", + "\n", + "However, there is a maximum image size that we can read with the `xarray` backend, so we cannot increase the bounding box indefinitely. The limit imposed on the API at this time is `5.6e7` pixels (7500 x 7500 pixels). In the case of MUR-SST, that is a bounding box of roughly 75 x 75 degrees." + ] + }, + { + "cell_type": "markdown", + "id": "4033af4c-6c85-45d5-9e5e-f2a15af471ab", + "metadata": {}, + "source": [ + "## Tips\n", + "\n", + "- If you hit an error because the total size of the request is too large, try reducing the temporal resolution of the time series, e.g. from daily (`P1D`) to weekly (`P7D`) or greater (`P10D`)\n", + "- If you need higher temporal resolution but the full request is not able handle it, split the request into multiple smaller requests and merge the results yourself!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/time_series_performance_benchmarks.ipynb b/docs/time_series_performance_benchmarks.ipynb deleted file mode 100644 index 37d955a..0000000 --- a/docs/time_series_performance_benchmarks.ipynb +++ /dev/null @@ -1,163 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "0644081c-161f-43fa-8a61-7dc0efb26d08", - "metadata": {}, - "source": [ - "# Time series performance benchmarks\n", - "\n", - "The `titiler-cmr` API is deployed as a Lambda function in the SMCE VEDA AWS account. For small time series requests (<500 time points) you can expect a response from any of the endpoints within ~20 seconds. For larger time series requests, you run the risk of bumping into Lambda concurrency or timeout limits. This report shows some results from the `test_timeseries_benchmarks.py` script that sends many requests with varying time series lengths as well as several other parameters that affect runtime." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5678099f-dcd7-4c09-86e6-a1c2b4845d51", - "metadata": {}, - "outputs": [], - "source": [ - "import benchmark_analysis as ba" - ] - }, - { - "cell_type": "markdown", - "id": "a335bfb6-bdd9-4bfd-84a2-dd805f44ac63", - "metadata": {}, - "source": [ - "## xarray backend\n", - "The following tests use the following datasets to evaluate the limits of the `/timeseries` endpoints for the `xarray` backend \n", - "- [GAMSSA 28km SST](https://podaac.jpl.nasa.gov/dataset/GAMSSA_28km-ABOM-L4-GLOB-v01): a daily 0.25 degree (~28 km) resolution dataset with sea surface temperature and sea ice fraction variables\n", - "- [MUR SST](https://cmr.earthdata.nasa.gov/search/concepts/C1996881146-POCLOUD.html): a daily 0.01 degree (~1km) resolution dataset with sea surface temperature variables" - ] - }, - { - "cell_type": "markdown", - "id": "7e436954-d115-4c6a-8aee-40e276532aa0", - "metadata": {}, - "source": [ - "### statistics\n", - "\n", - "Under the current deployment configuration `statistics` endpoint can process time series requests with up to ~1000 points. Requests that involve more than 1000 points are likely to fail." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7777354a-8ff9-4e5a-a518-5e72969aeefe", - "metadata": {}, - "outputs": [], - "source": [ - "for dataset, df in ba.dfs[\"statistics\"].items():\n", - " fig = ba.plot_error_rate_heatmap(\n", - " df=df,\n", - " x=\"num_timepoints\",\n", - " y=\"bbox_dims\",\n", - " z=\"error_rate\",\n", - " labels={\"x\": \"number of time points\", \"y\": \"bbox dimensions\", \"color\": \"error rate\"},\n", - " title=f\"{dataset}: error rate by bbox size and number of time points\",\n", - " )\n", - " fig.show()" - ] - }, - { - "cell_type": "markdown", - "id": "4da8f9c2-fe34-4d16-a72a-04b50decd2c5", - "metadata": {}, - "source": [ - "In general, the size of the area you want to analyze will have minimal impact on the runtime! This is because `titiler.xarray` has to read the entire granule into memory before subsetting, so reducing the size of the AOI does not reduce the overall footprint of the computation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f4f2e7fd-c6ac-4ddc-8226-757867a4e083", - "metadata": {}, - "outputs": [], - "source": [ - "for dataset, df in ba.dfs[\"statistics\"].items():\n", - " ba.plot_line_with_error_bars(\n", - " df=df.sort_values([\"bbox_size\", \"num_timepoints\"]),\n", - " color=\"bbox_dims\",\n", - " title=f\"{dataset}: statistics runtime\",\n", - " ).show()\n" - ] - }, - { - "cell_type": "markdown", - "id": "e2d45d53-ebbc-4979-8d53-c8e904b0830a", - "metadata": {}, - "source": [ - "### bbox (animations)\n", - "\n", - "Under the current deployment configuration the `bbox` endpoint can reliably process time series requests with up to ~500 points. Requests that involve more than 500 points may fail if the area of interest is very large." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "68f76925-c623-4614-b36b-6a797b387ca9", - "metadata": {}, - "outputs": [], - "source": [ - "for dataset, df in ba.dfs[\"bbox\"].items():\n", - " for img_size in sorted(df[\"img_size\"].unique()):\n", - " img_size_df = df[df[\"img_size\"] == img_size]\n", - " img_dims = img_size_df[\"img_dims\"].unique()[0]\n", - " ba.plot_error_rate_heatmap(\n", - " df=img_size_df,\n", - " x=\"num_timepoints\",\n", - " y=\"bbox_dims\",\n", - " z=\"error_rate\",\n", - " labels={\"x\": \"number of time points\", \"y\": \"bbox dimensions\", \"color\": \"error rate\"},\n", - " title=f\"{dataset}: image size {img_dims}\",\n", - " ).show()" - ] - }, - { - "cell_type": "markdown", - "id": "ccfd3480-0092-4845-9b4a-c688dbfd4aa6", - "metadata": {}, - "source": [ - "The size of the area of interest increases the response time, especially for requests for higher resolution images." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "abd9e612-51f6-4ef5-8499-18b42a76ab28", - "metadata": {}, - "outputs": [], - "source": [ - "for dataset, df in ba.dfs[\"bbox\"].items():\n", - " ba.plot_line_with_error_bars(\n", - " df=df.sort_values([\"bbox_size\", \"num_timepoints\"]),\n", - " color=\"bbox_dims\",\n", - " facet_row=\"img_dims\",\n", - " title=f\"{dataset}: runtime by bbox size and image dimensions\"\n", - " ).show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.9" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/mkdocs.yml b/mkdocs.yml index 07e9f93..d0612d8 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -29,6 +29,8 @@ nav: - examples/xarray_backend_example.ipynb - examples/rasterio_backend_example.ipynb - examples/time_series_example.ipynb + - Deployment: + - deployment/time_series_api_limits.ipynb - Development - Contributing: contributing.md - Release notes: release-notes.md