Thanks @clyang82 for preparing this. @clyang82 and I synced to give more context for a better understanding of the integration path CS is looking to achieve.
Context
Maestro provides a `.Subscribe` gRPC method to watch for changes happening on the system. The way this is implemented on the client side via https://github.com/open-cluster-management-io/sdk-go is that a `Lister[T]` function needs to be implemented. The integrator then has to:

- Implement a lister that returns status from their persistence layer, e.g. a DB; this way only status changes that have truly happened on the system are processed.

Implementing a persistence layer to track all resources adds complexity on the integrator side. To work around the need for a persistence store, e.g. a DB, the lister could instead be implemented by fetching the resources from the Maestro server (via the RESTful API) and keeping them in memory. This alternative is what #476 aims to solve; however, it presents the following challenges:
- Tracking a large number of resources in memory, since the in-memory store is unbounded.
- To repopulate the store, the client needs to fetch resources from the Maestro API, which currently does not support resources of the Bundle type (see openshift-online/maestro#96, "Create a restful API to retrieve resource bundle").
- When a `.Resync()` is performed, none of the status changes that happened while the client was offline will be sent, since the status hash in the memory store is the same as what is on the Maestro server: the client has fetched fresh objects from Maestro and re-populated the cache. To work around this, clients would also have to WATCH for the `.ADDED` event of the ManifestWork informer and process it. This means clients have to re-process events that have already been processed. It can be argued that this is the same as a normal k8s informer, so it is less of a challenge: the same pattern used in the k8s space could be applied here.
- Some integrators deploy N replicas of their component, where only 1 pod is the leader that watches for status changes, while all N pods can create ManifestWorks. With `store.List` being crucial to how event processing works in the SDK, ManifestWorks created on a non-leader pod won't have their status changes processed by the leader pod, since each ManifestWork is only present in the in-memory store of the pod that created it.

discussion points
- How do we envision solving the above challenges?
- How do we avoid memory issues when processing status changes, most notably when there are many events to process? I also noticed that the source client seems to perform a list every time and load the resources into memory. Looking at that piece of code, the intent is to perform a `GET(obj.ID)`; is this something that can be done instead?
- How can a client avoid a storm of events when resyncing for missed events? `.Resync` currently sends the status hash; can we consider an alternative implementation based on a sequence ID, which might be less expensive to track than the status object? The same pattern used in k8s could be used to solve this.
- Is the List needed, and why is it needed?
- @machi1990 to confirm whether re-processing the `.ADDED` event could be an issue at the moment in their component.

@qiujian16 @skeeey This is a placeholder issue following our discussion this morning. Please feel free to edit it with anything I've missed, and I'll go through it again myself later on to review it and update.