Efficiently watching for changes discussion issue #483

Closed · 2 of 5 tasks
machi1990 opened this issue May 29, 2024 · 3 comments

Labels
enhancement New feature or request

Comments

@machi1990

machi1990 commented May 29, 2024

Context

Maestro provides a .Subscribe grpc method to watch for changes happening on the system.

On the client side, this is implemented via https://github.com/open-cluster-management-io/sdk-go, which requires a Lister[T] function to be implemented. The integrator then has to:

  • Implement a lister that returns status from their persistence layer, e.g. a DB; this way only status changes that have truly happened on the system are processed (a rough sketch follows).
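
For illustration only, here is a minimal Go sketch of what such a DB-backed lister could look like. The Lister, StatusStore and Resource names below are hypothetical; the actual Lister[T] interface in sdk-go may have a different shape and signature.

```go
package lister

import "context"

// Resource is a hypothetical stand-in for the resource type handled by the
// SDK (e.g. a ManifestWork plus its status); the real type comes from sdk-go.
type Resource struct {
	ID         string
	StatusHash string
}

// Lister mirrors the shape of the Lister[T] the SDK asks integrators to
// supply: return the resources (and their last known status) this source
// already knows about. The actual sdk-go signature may differ.
type Lister[T any] interface {
	List(ctx context.Context) ([]T, error)
}

// StatusStore is a hypothetical DAO over the integrator's own database.
type StatusStore interface {
	ListResources(ctx context.Context) ([]Resource, error)
}

// dbLister backs the Lister with the persistence layer, so only status
// changes that have truly happened on the system end up being processed.
type dbLister struct {
	store StatusStore
}

func (l *dbLister) List(ctx context.Context) ([]Resource, error) {
	return l.store.ListResources(ctx)
}

// Compile-time check that dbLister satisfies the hypothetical interface.
var _ Lister[Resource] = (*dbLister)(nil)
```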

Implementing a persistence layer to track all resources adds complexity on the integrator side. To work around the need for a persistence store (e.g. a DB), the lister could instead be implemented by fetching the resources from the Maestro server (via the RESTful API) and keeping them in memory. This alternative is what #476 aims to solve, however it presents the following challenges:

  1. Tracking a lot of resources in memory, since the in-memory store is unbounded.

  2. To repopulate the store, the client will need to fetch resources from the Maestro API, which currently does not support resources with the Bundle type (see openshift-online/maestro#96, "Create a restful API to retrieve resource bundle").

  3. When a .Resync() is performed, the status changes that happened while the client was offline won't be sent, since the status hash in the memory store is the same as what's on the Maestro server (the client has fetched fresh objects from Maestro and re-populated the cache). To work around this, clients will then have to watch for the ADDED event of the ManifestWork informer as well and process it. This means clients have to re-process events that have already been processed. It can be argued that this is the same as a normal k8s informer, so it is less of a challenge since the same pattern used in the k8s space could be used to solve it (see the idempotent-handling sketch after this list).

  4. Some integrators deploy N replicas of their component where only one pod is the leader that watches for status changes, while all N pods can create ManifestWorks. With store.List being crucial to how event processing works in the SDK, ManifestWorks created on a non-leader pod won't have their status changes processed by the leader pod, since a ManifestWork is only present in the in-memory store of the pod that created it.
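
As a rough illustration of the idempotent handling point 3 refers to, here is a minimal sketch assuming the integrator records the last processed status hash per resource. All names here (processedTracker, shouldProcess) are hypothetical and not part of sdk-go.

```go
package handler

import "sync"

// processedTracker remembers the last status hash handled per resource so
// that re-delivered ADDED/MODIFIED events (for example after a resync)
// become no-ops instead of duplicate work.
type processedTracker struct {
	mu   sync.Mutex
	seen map[string]string // resource ID -> last processed status hash
}

func newProcessedTracker() *processedTracker {
	return &processedTracker{seen: make(map[string]string)}
}

// shouldProcess reports whether this (id, statusHash) pair is new; it records
// the hash so calling the event handler repeatedly stays safe.
func (t *processedTracker) shouldProcess(id, statusHash string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.seen[id] == statusHash {
		return false
	}
	t.seen[id] = statusHash
	return true
}
```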

Discussion points

  • How do we envision solving the above challenges?

  • How do we avoid memory issues when processing status changes, most notably when there are many events to process? I also noticed that the source client seems to perform a List every time and load the resources into memory. Looking at the piece of code, the intent is to perform a GET(obj.ID); is this something that can be done instead?

  • How can a client avoid a storm of events when resyncing for missed events? .Resync currently sends the status hash; can we consider an alternative implementation based on a sequenceID, which might be less expensive to track than the status object? The same pattern used in k8s could be used to solve this (a rough sketch of a sequence-based resync follows this list).

  • Is the List needed, and if so, why?

  • @machi1990 to confirm whether re-processing the ADDED event could be an issue at the moment in their component
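
To make the sequenceID idea from the bullet above concrete, here is a minimal hedged sketch of what a sequence-based resync could look like. This is not the current Maestro or sdk-go API; ResyncRequest, StatusEvent and eventsSince are hypothetical names used only for illustration.

```go
package resync

// ResyncRequest is a hypothetical alternative to sending per-resource status
// hashes on .Resync(): the client only reports the highest sequence ID it has
// processed, and the server replays everything newer.
type ResyncRequest struct {
	SourceID       string
	LastSequenceID int64
}

// StatusEvent is a hypothetical status-change record the server keeps in
// sequence order.
type StatusEvent struct {
	SequenceID int64
	ResourceID string
	Payload    []byte
}

// eventsSince returns the events a client missed while it was offline.
// Tracking a single int64 per client replaces hashing every status object.
func eventsSince(log []StatusEvent, req ResyncRequest) []StatusEvent {
	var missed []StatusEvent
	for _, e := range log {
		if e.SequenceID > req.LastSequenceID {
			missed = append(missed, e)
		}
	}
	return missed
}
```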

@qiujian16 @skeeey This is a placeholder issue following our discussion this morning. Please feel free to edit it with anything I've missed, and I'll go through it again later to review and update it.

@clyang82
Contributor

clyang82 commented May 30, 2024

I drew a diagram to understand how cluster service will use Maestro.
[diagram: cluster service integration with Maestro]

@machi1990 Correct me if my understanding is wrong.

@machi1990
Author

Thanks @clyang82 for preparing this. @clyang82 and I synced to give more context for a better understanding of the integration path CS is looking to achieve.

@machi1990
Author

Child issues were created. See referenced issues above. I'll close this issue and use the referenced card for tracking.
