
prototype of udp based job manager #1

Closed · wants to merge 1 commit

Conversation

@jdries (Contributor) commented Jun 28, 2024

Proposal for a UDP-based variant of the job manager, created as part of the APEx upscaling service:
https://jdries-vito.quarto.pub/apex-design/upscaling.html

A related issue is to support output to GeoParquet: Open-EO/openeo-gfmap#107

The currently used CSV format is limited in the sense that complex parameter types fail to deserialize correctly, requiring custom handling in this class. GeoParquet might improve this:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
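
A minimal sketch of the difference (not part of this PR, assuming pandas with pyarrow installed): a dict-valued parameter survives a parquet round trip, but comes back from CSV as a plain string that needs custom parsing.

```python
import pandas as pd

# A dataframe with one complex (dict-valued) parameter column, e.g. a geometry.
df = pd.DataFrame({
    "udp_id": ["my_udp"],
    "aoi": [{"type": "Point", "coordinates": [5.0, 51.0]}],
})

df.to_csv("jobs.csv", index=False)
print(type(pd.read_csv("jobs.csv")["aoi"][0]))          # <class 'str'>: needs custom parsing

df.to_parquet("jobs.parquet")
print(type(pd.read_parquet("jobs.parquet")["aoi"][0]))  # dict-like: deserializes correctly
```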

@soxofaan (Contributor) left a comment

a couple of notes

If I understand correctly, this PR adds two separate features to the existing job manager:

  • producing jobs from a fixed (but parameterized) UDP and a user-provided dataframe of parameters
  • running the job manager in a thread

These features seem to be totally unrelated, so I wonder if they can't be separated.

For example (a rough sketch of both ideas follows below):

  • the producing of the jobs could be a factory for a standard job manager
  • the threaded running could be a method on the standard job manager
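
A rough sketch of what that split could look like. The helper names create_udp_job_manager and run_jobs_in_thread are made up for illustration; the MultiBackendJobManager calls reflect the client API as I understand it, so treat the details as assumptions.

```python
import threading
import pandas as pd
import openeo
from openeo.extra.job_management import MultiBackendJobManager


def create_udp_job_manager(connection: openeo.Connection, udp_id: str, udp_namespace: str):
    """Factory: a plain MultiBackendJobManager plus a start_job callback
    that instantiates the parameterized UDP for each dataframe row."""

    def start_job(row: pd.Series, connection: openeo.Connection, **kwargs):
        # Assumes the dataframe columns are exactly the UDP parameter names.
        parameters = row.to_dict()
        cube = connection.datacube_from_process(udp_id, namespace=udp_namespace, **parameters)
        # Return an unstarted batch job; the manager launches and tracks it.
        return cube.create_job(title=f"Subjob {udp_id} - row {row.name}")

    manager = MultiBackendJobManager()
    manager.add_backend("backend", connection=connection)
    return manager, start_job


def run_jobs_in_thread(manager: MultiBackendJobManager, df: pd.DataFrame, start_job, output_file="jobs.csv"):
    """Threaded running, which could equally be a method on the standard job manager."""
    thread = threading.Thread(
        target=manager.run_jobs,
        kwargs=dict(df=df, start_job=start_job, output_file=output_file),
        daemon=True,
    )
    thread.start()
    return thread
```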

if self.dataframe is None:
    self.dataframe = jobs_dataframe
else:
    raise ValueError("Jobs already added to the job manager.")

this if/else/raise pattern looks like it could have been a constructor argument
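
A minimal sketch of that alternative (hypothetical, not code from this PR): pass the jobs dataframe up front so no separate setter with an if/else/raise guard is needed.

```python
from typing import Optional
import pandas as pd

class UDPJobManager:
    # Hypothetical constructor-based variant: the jobs dataframe is provided
    # at construction time, so there is no separate "add jobs" step to guard.
    def __init__(self, dataframe: Optional[pd.DataFrame] = None):
        self.dataframe = dataframe
```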

p.get("schema", {}).get("subtype", "") == "geojson"]


output_file = Path("jobs.csv")

This static file reference should be an argument I guess
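
A small sketch of that suggestion (hypothetical function name, not code from this PR):

```python
from pathlib import Path
from typing import Union

# Hypothetical sketch: accept the job tracking file as an argument, with
# "jobs.csv" only as a default, instead of hardcoding Path("jobs.csv").
def resolve_output_file(output_file: Union[str, Path] = "jobs.csv") -> Path:
    return Path(output_file)
```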


cube = connection.datacube_from_process(row.udp_id, row.udp_namespace, **parameters)

title = row.get("title", f"Subjob {row.udp_id} - {str(parameters)}")

use row index instead of str(parameters) in title to avoid extremely large titles?
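
A small sketch of that alternative (hypothetical helper, assuming a pandas row with a udp_id column):

```python
import pandas as pd

# Hypothetical helper: build the default title from the dataframe row index
# (row.name) instead of str(parameters), which can get extremely long.
def job_title(row: pd.Series) -> str:
    return row.get("title", f"Subjob {row.udp_id} - row {row.name}")
```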




import multiprocessing, time

these imports can be top-level I think

@soxofaan (Contributor):
Because of various changes to the "official" MultiBackendJobManager in the client (e.g. built-in threaded run_jobs and new job db initialization features), I think this PR is a dead end and is better closed.
However, it served as inspiration to implement UDP-based job management in the Python client itself:

@soxofaan closed this Oct 14, 2024
@jdries deleted the udp_job_manager branch October 16, 2024 08:57
@soxofaan (Contributor):
just merged Open-EO/openeo-python-client#644
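
For context, a rough sketch of the client-side pattern. The names ProcessBasedJobCreator, CsvJobDatabase.initialize_from_df and the job_db argument are taken from later client versions as I understand them; treat them as assumptions rather than a description of exactly what #644 merged.

```python
import pandas as pd
import openeo
from openeo.extra.job_management import (
    MultiBackendJobManager,
    CsvJobDatabase,
    ProcessBasedJobCreator,
)

connection = openeo.connect("openeo.example").authenticate_oidc()

# One row per job; column names are expected to match the UDP parameter names.
df = pd.DataFrame({"date": ["2024-06-01", "2024-06-02"]})

# start_job callable that instantiates the (possibly remote) UDP per row.
start_job = ProcessBasedJobCreator(
    process_id="my_udp",
    namespace="https://example.com/my_udp.json",
)

# Job database initialized from the dataframe (CSV here, parquet also possible).
job_db = CsvJobDatabase("jobs.csv").initialize_from_df(df)

manager = MultiBackendJobManager()
manager.add_backend("example", connection=connection)
manager.run_jobs(start_job=start_job, job_db=job_db)
```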
