Efficiently watching for changes discussion issue #483

Closed · 2 of 5 tasks
machi1990 opened this issue May 29, 2024 · 3 comments

Labels
enhancement New feature or request

Comments

@machi1990

machi1990 commented May 29, 2024

Context

Maestro provides a .Subscribe grpc method to watch for changes happening on the system.

On the client side, this is implemented via https://github.com/open-cluster-management-io/sdk-go, which requires a Lister[T] function to be implemented. The integrator then has to:

  • Implement a lister that returns status from their persistence layer, e.g. a DB; this way only status changes that have truly happened on the system are processed (a rough sketch follows).
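
For illustration only, here is a minimal Go sketch of what such a DB-backed lister could look like. The Lister, StatusStore and Resource names below are hypothetical; the actual Lister[T] interface in sdk-go may have a different shape and signature.

```go
package lister

import "context"

// Resource is a hypothetical stand-in for the resource type handled by the
// SDK (e.g. a ManifestWork plus its status); the real type comes from sdk-go.
type Resource struct {
	ID         string
	StatusHash string
}

// Lister mirrors the shape of the Lister[T] the SDK asks integrators to
// supply: return the resources (and their last known status) this source
// already knows about. The actual sdk-go signature may differ.
type Lister[T any] interface {
	List(ctx context.Context) ([]T, error)
}

// StatusStore is a hypothetical DAO over the integrator's own database.
type StatusStore interface {
	ListResources(ctx context.Context) ([]Resource, error)
}

// dbLister backs the Lister with the persistence layer, so only status
// changes that have truly happened on the system end up being processed.
type dbLister struct {
	store StatusStore
}

func (l *dbLister) List(ctx context.Context) ([]Resource, error) {
	return l.store.ListResources(ctx)
}

// Compile-time check that dbLister satisfies the hypothetical interface.
var _ Lister[Resource] = (*dbLister)(nil)
```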

Implementing a persistence layer to track all resources adds complexity on the integrator side. To work around the need for a persistence store (e.g. a DB), the lister could instead be implemented by fetching the resources from the Maestro server (via the RESTful API) and keeping them in memory. This alternative is what #476 aims to solve, however it presents the following challenges:

  1. Tracking a lot of resources in memory, since the in-memory store is unbounded.

  2. To repopulate the store, the client will need to fetch resources from the Maestro API, which currently does not support resources with the Bundle type (see openshift-online/maestro#96, "Create a restful API to retrieve resource bundle").

  3. When a .Resync() is performed, the status changes that happened while the client was offline won't be sent, since the status hash in the memory store is the same as what's on the Maestro server (the client has fetched fresh objects from Maestro and re-populated the cache). To work around this, clients will then have to watch for the ADDED event of the ManifestWork informer as well and process it. This means clients have to re-process events that have already been processed. It can be argued that this is the same as a normal k8s informer, so it is less of a challenge since the same pattern used in the k8s space could be used to solve it (see the idempotent-handling sketch after this list).

  4. Some integrators deploy N replicas of their component where only one pod is the leader that watches for status changes, while all N pods can create ManifestWorks. With store.List being crucial to how event processing works in the SDK, ManifestWorks created on a non-leader pod won't have their status changes processed by the leader pod, since a ManifestWork is only present in the in-memory store of the pod that created it.
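
As a rough illustration of the idempotent handling point 3 refers to, here is a minimal sketch assuming the integrator records the last processed status hash per resource. All names here (processedTracker, shouldProcess) are hypothetical and not part of sdk-go.

```go
package handler

import "sync"

// processedTracker remembers the last status hash handled per resource so
// that re-delivered ADDED/MODIFIED events (for example after a resync)
// become no-ops instead of duplicate work.
type processedTracker struct {
	mu   sync.Mutex
	seen map[string]string // resource ID -> last processed status hash
}

func newProcessedTracker() *processedTracker {
	return &processedTracker{seen: make(map[string]string)}
}

// shouldProcess reports whether this (id, statusHash) pair is new; it records
// the hash so calling the event handler repeatedly stays safe.
func (t *processedTracker) shouldProcess(id, statusHash string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.seen[id] == statusHash {
		return false
	}
	t.seen[id] = statusHash
	return true
}
```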

Discussion points

  • How do we envision solving the above challenges?

  • How do we avoid memory issues when processing status changes, most notably when there are many events to process? I also noticed that the source client seems to perform a List every time and load the resources into memory. Looking at the piece of code, the intent is to perform a GET(obj.ID); is this something that can be done instead?

  • How can a client avoid a storm of events when resyncing for missed events? .Resync currently sends the status hash; can we consider an alternative implementation based on a sequenceID, which might be less expensive to track than the status object? The same pattern used in k8s could be used to solve this (a rough sketch of a sequence-based resync follows this list).

  • Is the List needed, and if so, why?

  • @machi1990 to confirm whether re-processing the ADDED event could be an issue at the moment in their component
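
To make the sequenceID idea from the bullet above concrete, here is a minimal hedged sketch of what a sequence-based resync could look like. This is not the current Maestro or sdk-go API; ResyncRequest, StatusEvent and eventsSince are hypothetical names used only for illustration.

```go
package resync

// ResyncRequest is a hypothetical alternative to sending per-resource status
// hashes on .Resync(): the client only reports the highest sequence ID it has
// processed, and the server replays everything newer.
type ResyncRequest struct {
	SourceID       string
	LastSequenceID int64
}

// StatusEvent is a hypothetical status-change record the server keeps in
// sequence order.
type StatusEvent struct {
	SequenceID int64
	ResourceID string
	Payload    []byte
}

// eventsSince returns the events a client missed while it was offline.
// Tracking a single int64 per client replaces hashing every status object.
func eventsSince(log []StatusEvent, req ResyncRequest) []StatusEvent {
	var missed []StatusEvent
	for _, e := range log {
		if e.SequenceID > req.LastSequenceID {
			missed = append(missed, e)
		}
	}
	return missed
}
```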

@qiujian16 @skeeey This is a placeholder issue following our discussion this morning. Please feel free to edit it with anything I've missed, and I'll go through it again later to review and update it.

@clyang82
Contributor

clyang82 commented May 30, 2024

I drew a diagram to understand how cluster service will use Maestro.
[diagram: cluster service integration with Maestro]

@machi1990 Correct me if my understanding is wrong.

@machi1990
Author

Thanks @clyang82 for preparing this. @clyang82 and I synced to give more context for a better understanding of the integration path CS is looking to achieve.

@machi1990
Author

Child issues were created. See referenced issues above. I'll close this issue and use the referenced card for tracking.
