
Issue #5/#7 some initial docs
soxofaan committed Aug 21, 2024
1 parent d9b96cb commit 107364e
Showing 3 changed files with 160 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -1,2 +1,3 @@
# apex_algorithms
Hosted APEx algorithms

APEx Algorithm Propagation Service
1 change: 1 addition & 0 deletions docs/README.md
@@ -0,0 +1 @@
# APEx Algorithms
157 changes: 157 additions & 0 deletions docs/benchmarking.md
@@ -0,0 +1,157 @@


# APEx Algorithm Benchmarking

The hosted APEx Algorithms are executed automatically and regularly
following certain benchmark scenarios in order to:

- verify that they still complete successfully
  and that the output results are as expected (e.g. within a certain tolerance of reference data)
- keep track of certain metrics like execution time,
resource consumption, credit consumption, etc.


## Algorithm and benchmark definitions

### Algorithm definitions

An APEx algorithm is defined as an openEO process definition,
stored as a JSON file in the [`openeo_udp`](../openeo_udp/) folder.
These JSON files follow the standard openEO process definition schema,
as used for example by the `GET /process_graphs/{process_graph_id}` endpoint of the openEO API.

> [!NOTE]
> These openEO process definitions are commonly referred to
> as "UDPs" or "user-defined processes",
> a term that stems from the original openEO API specification
> with its isolated "user" concept.
> The scope of algorithm publishing and hosting in APEx
> goes well beyond isolated, individual users.
> As such, it can be confusing to read too much into the "user-defined" part;
> it might be better to simply think of these as "openEO process definitions".

Example process definition:

```json
{
  "id": "max_ndvi",
  "parameters": [
    {
      "name": "temporal_extent",
      "schema": {
        "type": "array",
        "subtype": "temporal-interval",
        ...
      },
      ...
    },
    ...
  ],
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      ...
    },
    "reducedimension1": {
      "process_id": "reduce_dimension",
      ...
    },
    ...
  }
}
```

Alongside the JSON files, there might be additional resources,
like Markdown files with documentation or descriptions,
Python scripts to (re)generate the JSON files, etc.
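
As an illustration of such a (re)generation script: the following is a minimal sketch
using the openeo Python client, roughly reproducing the `max_ndvi` example above.
The collection id and band names are illustrative assumptions, not taken from the actual repository.

```python
# Minimal sketch of a script to (re)generate a UDP JSON file with the openeo Python client.
# The collection id and bands are placeholders, not taken from the actual repository.
import json

import openeo
from openeo.api.process import Parameter
from openeo.rest.udp import build_process_dict

# Parameter to expose in the process definition.
temporal_extent = Parameter(
    name="temporal_extent",
    description="Temporal extent to process.",
    schema={"type": "array", "subtype": "temporal-interval"},
)

# Build the process graph client-side against an openEO backend.
connection = openeo.connect("openeofed.dataspace.copernicus.eu")
cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=temporal_extent,
    bands=["B04", "B08"],
)
result = cube.ndvi().reduce_dimension(dimension="t", reducer="max")

# Assemble a standard openEO process definition and write it as JSON.
spec = build_process_dict(
    process_graph=result,
    process_id="max_ndvi",
    parameters=[temporal_extent],
)
with open("openeo_udp/examples/max_ndvi/max_ndvi.json", "w") as f:
    json.dump(spec, f, indent=2)
```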

### Benchmark definitions

The benchmark scenarios are defined as JSON files
in the [`benchmark_scenarios`](../benchmark_scenarios/) folder.
The schema of these files is defined (as JSON Schema)
in the [`schema/benchmark_scenario.json`](../schema/benchmark_scenario.json) file.

Example benchmark definition:

```json
[
  {
    "id": "max_ndvi",
    "type": "openeo",
    "backend": "openeofed.dataspace.copernicus.eu",
    "process_graph": {
      "maxndvi1": {
        "process_id": "max_ndvi",
        "namespace": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/f99f351d74d291d628e3aaa07fd078527a0cb631/openeo_udp/examples/max_ndvi/max_ndvi.json",
        "arguments": {
          "temporal_extent": ["2023-08-01", "2023-09-30"],
          ...
        },
        "result": true
      }
    },
    "reference_data": {
      "job-results.json": "https://s3.example/max_ndvi.json:max_ndvi:reference:job-results.json",
      "openEO.tif": "https://s3.example/max_ndvi.json:max_ndvi:reference:openEO.tif"
    }
  },
  ...
]
```

Note how each benchmark scenario references:

- the target openEO backend to use;
- an openEO process graph to execute.
  The process graph will typically just contain a single node
  whose `namespace` field points to the desired process definition at a URL,
  following the [remote process definition extension](https://github.com/Open-EO/openeo-api/tree/draft/extensions/remote-process-definition).
  The URL will typically be a raw GitHub URL to the JSON file in the `openeo_udp` folder,
  but it can also point to a different location;
- reference data against which the actual results should be compared.
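
For example, loading and validating these scenario files could look roughly like the sketch below,
assuming the standard `jsonschema` package (the actual tooling in `apex_algorithm_qa_tools` may differ in details).

```python
# Sketch: load all benchmark scenarios and validate them against the JSON Schema.
import json
from pathlib import Path

import jsonschema

SCENARIOS_DIR = Path("benchmark_scenarios")
SCHEMA_PATH = Path("schema/benchmark_scenario.json")


def load_benchmark_scenarios() -> list:
    """Load all benchmark scenario files, validated against the JSON Schema."""
    schema = json.loads(SCHEMA_PATH.read_text())
    scenarios = []
    for path in sorted(SCENARIOS_DIR.glob("*.json")):
        data = json.loads(path.read_text())
        # Each file contains a list of scenario objects (see the example above).
        jsonschema.validate(instance=data, schema=schema)
        scenarios.extend(data)
    return scenarios
```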


## Benchmarking test suite

The execution of the benchmarks is currently driven by
a [pytest](https://pytest.org/) test suite
defined in [`qa/benchmarks/`](../qa/benchmarks/).
See that project's [README](../qa/benchmarks/README.md) for more details.

The test suite code itself is not very complex:
it basically boils down to a single test function,
parametrized to run over all benchmark scenarios, as sketched below.
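
Schematically, that central test function looks roughly like this.
The sketch is illustrative only: authentication, fixture handling and the result comparison
in the actual test suite are more elaborate,
and `load_benchmark_scenarios` refers to the hypothetical helper sketched earlier.

```python
# Simplified sketch of the single, parametrized benchmark test (not the actual test code).
import openeo
import pytest


@pytest.mark.parametrize(
    "scenario",
    load_benchmark_scenarios(),  # hypothetical helper sketched earlier
    ids=lambda scenario: scenario["id"],
)
def test_run_benchmark(scenario, tmp_path):
    connection = openeo.connect(scenario["backend"]).authenticate_oidc()
    job = connection.create_job(process_graph=scenario["process_graph"])
    job.start_and_wait()
    results = job.get_results()
    # Download the actual results and compare them against the files
    # listed under "reference_data" (comparison logic omitted here).
    results.download_files(target=tmp_path)
```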

There is, however, additional tooling around this test,
implemented as pytest plugins.


### Randomly pick a single benchmark scenario

A simple plugin, defined in the test suite's `conftest.py`,
allows running just a random subset of the benchmark scenarios.
It leverages the `pytest_collection_modifyitems` hook and is exposed
through the command line option `--random-subset`.
With `--random-subset=1`, only a single, randomly picked benchmark scenario is run.
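
A minimal sketch of how such a plugin can be implemented
(simplified; the actual `conftest.py` may differ in details):

```python
# conftest.py sketch: only run a random subset of the collected benchmark tests.
import random


def pytest_addoption(parser):
    parser.addoption(
        "--random-subset",
        type=int,
        default=-1,
        help="Only run this number of randomly selected benchmark scenarios.",
    )


def pytest_collection_modifyitems(config, items):
    subset_size = config.getoption("--random-subset")
    if 0 <= subset_size < len(items):
        selected = random.sample(items, k=subset_size)
        deselected = [item for item in items if item not in selected]
        # Report the deselected tests and keep only the random subset.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```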

### Automatically upload generated results

The `apex_algorithm_qa_tools` package includes the
[`pytest_upload_assets`](../qa/tools/apex_algorithm_qa_tools/pytest_upload_assets.py) plugin,
which defines an `upload_assets_on_fail` fixture to automatically upload
openEO batch job results to an S3 bucket when a test fails.
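
Conceptually, such a fixture can be built on the standard pytest recipe
of making the test outcome available during fixture teardown.
The following is a rough sketch only; the actual plugin is more elaborate
(e.g. around S3 configuration and key naming),
and the bucket environment variable used here is a made-up example.

```python
# Rough sketch of an "upload assets on failure" fixture (not the actual plugin code).
import os

import boto3
import pytest


@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
    # Standard pytest recipe: store the test report on the item,
    # so fixtures can inspect the test outcome during teardown.
    outcome = yield
    report = outcome.get_result()
    setattr(item, "report_" + report.when, report)


@pytest.fixture
def upload_assets_on_fail(request):
    """Register local result files; upload them to S3 only if the test failed."""
    collected = []
    yield collected.append
    report = getattr(request.node, "report_call", None)
    if report is not None and report.failed:
        s3 = boto3.client("s3")
        bucket = os.environ["APEX_UPLOAD_ASSETS_BUCKET"]  # made-up configuration variable
        for path in collected:
            s3.upload_file(Filename=str(path), Bucket=bucket, Key=os.path.basename(str(path)))
```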

### Track benchmark metrics

The `apex_algorithm_qa_tools` package includes the
[`pytest_track_metrics`](../qa/tools/apex_algorithm_qa_tools/pytest_track_metrics.py) plugin,
which defines a `track_metrics` fixture to record metrics during the benchmark run
and report them at the end of the test run.
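
Schematically, the mechanism can be thought of along these lines
(a simplified sketch; the actual plugin also persists the metrics,
so they can be tracked over time):

```python
# Simplified sketch of a metric-tracking fixture plus an end-of-run summary.
import pytest

_collected_metrics = []


@pytest.fixture
def track_metrics(request):
    def track(name, value):
        # Record a (test, metric name, value) triple for later reporting.
        _collected_metrics.append((request.node.nodeid, name, value))

    return track


def pytest_terminal_summary(terminalreporter):
    terminalreporter.section("Benchmark metrics")
    for nodeid, name, value in _collected_metrics:
        terminalreporter.line(f"{nodeid}: {name} = {value}")
```

In a benchmark test, metrics can then be recorded with calls like
`track_metrics("costs", job.describe().get("costs"))` (the exact metric names here are illustrative).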



## GitHub Actions

The benchmarking test suite is executed automatically
with GitHub Actions; the runs can be followed
at https://github.com/ESA-APEx/apex_algorithms/actions/workflows/benchmarks.yaml
