
Issue #5/#7 some initial docs
soxofaan committed Aug 21, 2024
1 parent d9b96cb commit 107364e
Showing 3 changed files with 160 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -1,2 +1,3 @@
# apex_algorithms
Hosted APEx algorithms

APEx Algorithm Propagation Service
1 change: 1 addition & 0 deletions docs/README.md
@@ -0,0 +1 @@
# APEx Algorithms
157 changes: 157 additions & 0 deletions docs/benchmarking.md
@@ -0,0 +1,157 @@


# APEx Algorithm Benchmarking

The hosted APEx Algorithms are executed automatically and regularly
following certain benchmark scenarios in order to:

- verify that they still complete successfully
  and that the output results are as expected (e.g. within a certain tolerance of reference data)
- keep track of certain metrics like execution time,
resource consumption, credit consumption, etc.


## Algorithm and benchmark definitions

### Algorithm definitions

An APEx algorithm is defined as an openEO process definition,
stored as a JSON file in the [`openeo_udp`](../openeo_udp/) folder.
These JSON files follow the standard openEO process definition schema,
as used for example by the `GET /process_graphs/{process_graph_id}` endpoint of the openEO API.

> [!NOTE]
> These openEO process definitions are commonly referred to
> as "UDPs" or "user-defined processes",
> a term that stems from the original openEO API specification
> with its isolated "user" concept.
> The scope of algorithm publishing and hosting in APEx
> goes well beyond isolated, individual users.
> As such, it can be confusing to read too much into the "user-defined" part;
> it might be better to simply think of these as "openEO process definitions".

Example process definition:

```json
{
  "id": "max_ndvi",
  "parameters": [
    {
      "name": "temporal_extent",
      "schema": {
        "type": "array",
        "subtype": "temporal-interval",
        ...
      },
      ...
    },
    ...
  ],
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      ...
    },
    "reducedimension1": {
      "process_id": "reduce_dimension",
      ...
    },
    ...
  }
}
```

Alongside the JSON files, there might be additional resources,
like Markdown files with documentation or descriptions,
Python scripts to (re)generate the JSON files, etc.
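
As an illustration of such a (re)generation script: the following is a minimal sketch
using the openeo Python client, roughly reproducing the `max_ndvi` example above.
The collection id and band names are illustrative assumptions, not taken from the actual repository.

```python
# Minimal sketch of a script to (re)generate a UDP JSON file with the openeo Python client.
# The collection id and bands are placeholders, not taken from the actual repository.
import json

import openeo
from openeo.api.process import Parameter
from openeo.rest.udp import build_process_dict

# Parameter to expose in the process definition.
temporal_extent = Parameter(
    name="temporal_extent",
    description="Temporal extent to process.",
    schema={"type": "array", "subtype": "temporal-interval"},
)

# Build the process graph client-side against an openEO backend.
connection = openeo.connect("openeofed.dataspace.copernicus.eu")
cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=temporal_extent,
    bands=["B04", "B08"],
)
result = cube.ndvi().reduce_dimension(dimension="t", reducer="max")

# Assemble a standard openEO process definition and write it as JSON.
spec = build_process_dict(
    process_graph=result,
    process_id="max_ndvi",
    parameters=[temporal_extent],
)
with open("openeo_udp/examples/max_ndvi/max_ndvi.json", "w") as f:
    json.dump(spec, f, indent=2)
```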

### Benchmark definitions

The benchmark scenarios are defined as JSON files
in the [`benchmark_scenarios`](../benchmark_scenarios/) folder.
The schema of these files is defined (as JSON Schema)
in the [`schema/benchmark_scenario.json`](../schema/benchmark_scenario.json) file.

Example benchmark definition:

```json
[
  {
    "id": "max_ndvi",
    "type": "openeo",
    "backend": "openeofed.dataspace.copernicus.eu",
    "process_graph": {
      "maxndvi1": {
        "process_id": "max_ndvi",
        "namespace": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/f99f351d74d291d628e3aaa07fd078527a0cb631/openeo_udp/examples/max_ndvi/max_ndvi.json",
        "arguments": {
          "temporal_extent": ["2023-08-01", "2023-09-30"],
          ...
        },
        "result": true
      }
    },
    "reference_data": {
      "job-results.json": "https://s3.example/max_ndvi.json:max_ndvi:reference:job-results.json",
      "openEO.tif": "https://s3.example/max_ndvi.json:max_ndvi:reference:openEO.tif"
    }
  },
  ...
]
```

Note how each benchmark scenario references:

- the target openEO backend to use;
- an openEO process graph to execute.
  The process graph will typically just contain a single node
  whose `namespace` field points to the desired process definition at a URL,
  following the [remote process definition extension](https://github.com/Open-EO/openeo-api/tree/draft/extensions/remote-process-definition).
  The URL will typically be a raw GitHub URL to the JSON file in the `openeo_udp` folder,
  but it can also point to a different location;
- reference data against which the actual results should be compared.
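
For example, loading and validating these scenario files could look roughly like the sketch below,
assuming the standard `jsonschema` package (the actual tooling in `apex_algorithm_qa_tools` may differ in details).

```python
# Sketch: load all benchmark scenarios and validate them against the JSON Schema.
import json
from pathlib import Path

import jsonschema

SCENARIOS_DIR = Path("benchmark_scenarios")
SCHEMA_PATH = Path("schema/benchmark_scenario.json")


def load_benchmark_scenarios() -> list:
    """Load all benchmark scenario files, validated against the JSON Schema."""
    schema = json.loads(SCHEMA_PATH.read_text())
    scenarios = []
    for path in sorted(SCENARIOS_DIR.glob("*.json")):
        data = json.loads(path.read_text())
        # Each file contains a list of scenario objects (see the example above).
        jsonschema.validate(instance=data, schema=schema)
        scenarios.extend(data)
    return scenarios
```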


## Benchmarking test suite

The execution of the benchmarks is currently driven by
a [pytest](https://pytest.org/) test suite
defined in [`qa/benchmarks/`](../qa/benchmarks/).
See that project's [README](../qa/benchmarks/README.md) for more details.

The test suite code itself is not very complex:
it basically boils down to a single test function,
parametrized to run over all benchmark scenarios, as sketched below.
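
Schematically, that central test function looks roughly like this.
The sketch is illustrative only: authentication, fixture handling and the result comparison
in the actual test suite are more elaborate,
and `load_benchmark_scenarios` refers to the hypothetical helper sketched earlier.

```python
# Simplified sketch of the single, parametrized benchmark test (not the actual test code).
import openeo
import pytest


@pytest.mark.parametrize(
    "scenario",
    load_benchmark_scenarios(),  # hypothetical helper sketched earlier
    ids=lambda scenario: scenario["id"],
)
def test_run_benchmark(scenario, tmp_path):
    connection = openeo.connect(scenario["backend"]).authenticate_oidc()
    job = connection.create_job(process_graph=scenario["process_graph"])
    job.start_and_wait()
    results = job.get_results()
    # Download the actual results and compare them against the files
    # listed under "reference_data" (comparison logic omitted here).
    results.download_files(target=tmp_path)
```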

There is, however, additional tooling around this test,
implemented as pytest plugins.


### Randomly pick a single benchmark scenario

A simple plugin, defined in the test suite's `conftest.py`,
allows running just a random subset of the benchmark scenarios.
It leverages the `pytest_collection_modifyitems` hook and is exposed
through the command line option `--random-subset`.
With `--random-subset=1`, only a single, randomly picked benchmark scenario is run.
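
A minimal sketch of how such a plugin can be implemented
(simplified; the actual `conftest.py` may differ in details):

```python
# conftest.py sketch: only run a random subset of the collected benchmark tests.
import random


def pytest_addoption(parser):
    parser.addoption(
        "--random-subset",
        type=int,
        default=-1,
        help="Only run this number of randomly selected benchmark scenarios.",
    )


def pytest_collection_modifyitems(config, items):
    subset_size = config.getoption("--random-subset")
    if 0 <= subset_size < len(items):
        selected = random.sample(items, k=subset_size)
        deselected = [item for item in items if item not in selected]
        # Report the deselected tests and keep only the random subset.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```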

### Automatically upload generated results

The `apex_algorithm_qa_tools` package includes the
[`pytest_upload_assets`](../qa/tools/apex_algorithm_qa_tools/pytest_upload_assets.py) plugin,
which defines an `upload_assets_on_fail` fixture to automatically upload
openEO batch job results to an S3 bucket when a test fails.
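
Conceptually, such a fixture can be built on the standard pytest recipe
of making the test outcome available during fixture teardown.
The following is a rough sketch only; the actual plugin is more elaborate
(e.g. around S3 configuration and key naming),
and the bucket environment variable used here is a made-up example.

```python
# Rough sketch of an "upload assets on failure" fixture (not the actual plugin code).
import os

import boto3
import pytest


@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
    # Standard pytest recipe: store the test report on the item,
    # so fixtures can inspect the test outcome during teardown.
    outcome = yield
    report = outcome.get_result()
    setattr(item, "report_" + report.when, report)


@pytest.fixture
def upload_assets_on_fail(request):
    """Register local result files; upload them to S3 only if the test failed."""
    collected = []
    yield collected.append
    report = getattr(request.node, "report_call", None)
    if report is not None and report.failed:
        s3 = boto3.client("s3")
        bucket = os.environ["APEX_UPLOAD_ASSETS_BUCKET"]  # made-up configuration variable
        for path in collected:
            s3.upload_file(Filename=str(path), Bucket=bucket, Key=os.path.basename(str(path)))
```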

### Track benchmark metrics

The `apex_algorithm_qa_tools` package includes the
[`pytest_track_metrics`](../qa/tools/apex_algorithm_qa_tools/pytest_track_metrics.py) plugin,
which defines a `track_metrics` fixture to record metrics during the benchmark run
and report them at the end of the test run.
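
Schematically, the mechanism can be thought of along these lines
(a simplified sketch; the actual plugin also persists the metrics,
so they can be tracked over time):

```python
# Simplified sketch of a metric-tracking fixture plus an end-of-run summary.
import pytest

_collected_metrics = []


@pytest.fixture
def track_metrics(request):
    def track(name, value):
        # Record a (test, metric name, value) triple for later reporting.
        _collected_metrics.append((request.node.nodeid, name, value))

    return track


def pytest_terminal_summary(terminalreporter):
    terminalreporter.section("Benchmark metrics")
    for nodeid, name, value in _collected_metrics:
        terminalreporter.line(f"{nodeid}: {name} = {value}")
```

In a benchmark test, metrics can then be recorded with calls like
`track_metrics("costs", job.describe().get("costs"))` (the exact metric names here are illustrative).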



## GitHub Actions

The benchmarking test suite is executed automatically
with GitHub Actions; the runs can be followed
at https://github.com/ESA-APEx/apex_algorithms/actions/workflows/benchmarks.yaml
