feat: Allow to restrict the CRs watched according to their labels #1832
base: master
Conversation
Thanks for the PR! The approach looks good to me. @Baarsgaard, since you took a look at the caching logic not too long ago, does this also seem good to you? The only thing I might change is to use comma-separated values for the label selector instead of JSON, as JSON complicates quoting quite a bit. Do you think something like …
Thank you for your review @theSuess. I've simplified the implementation of the code; it should be more generic as I'm now using … WDYT?
This is great! I'll test this locally and see if I can come up with an E2E test to make sure this keeps working when refactoring the caching logic.
```
@@ -120,6 +126,18 @@ func main() {
	PprofBindAddress: pprofAddr,
}

var labelSelectors labels.Selector
```
This block triggers an error from the linter:

main.go:89:1: cyclomatic complexity 26 of func main is high (> 25) (gocyclo)

I could move this block of code into a dedicated func outside of main. WDYT?
To be honest, I think it's more readable when contained in the main function. If the refactor is easy enough, then go for it; otherwise I'm good with ignoring the linter here.
Let's keep it like that; I would have moved the code outside only to please the linter. I find the code readable enough.
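For reference, a minimal sketch of what the extraction discussed above might look like. This is not part of the PR (which keeps the logic inline in main), and the helper name is hypothetical:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

// parseWatchLabelSelectors is a hypothetical helper, shown only to
// illustrate the refactor discussed above; the PR keeps this logic
// inline in main(). A nil selector means "do not filter by labels".
func parseWatchLabelSelectors(watchLabelSelectors string) (labels.Selector, error) {
	if watchLabelSelectors == "" {
		return nil, nil
	}
	sel, err := labels.Parse(watchLabelSelectors)
	if err != nil {
		return nil, fmt.Errorf("invalid watch label selector %q: %w", watchLabelSelectors, err)
	}
	return sel, nil
}

func main() {
	sel, err := parseWatchLabelSelectors("shard=1")
	fmt.Println(sel, err)
}
```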
```
var labelSelectors labels.Selector
var err error
if watchLabelSelectors != "" {
	labelSelectors, err = labels.Parse(watchLabelSelectors)
```
Note: I like this quite a lot, and would prefer to swap the existing namespace selector parsing to this or to ValidatedSelectorFromSet to maintain current behaviour. To properly support sharding at scale, changing to labels.Parse might be better for the namespace selector.
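To make the two parsing styles concrete, here is a small standalone sketch (not taken from the PR; the label keys and values are made up):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Comma-separated selector string; labels.Parse also accepts
	// set-based expressions such as "shard in (1,2)".
	parsed, err := labels.Parse("shard=1,environment=prod")
	if err != nil {
		panic(err)
	}

	// Equality-only selector built from a fixed label set; it validates
	// keys and values but cannot express operators like "in" or "!=".
	fromSet, err := labels.ValidatedSelectorFromSet(labels.Set{"shard": "1"})
	if err != nil {
		panic(err)
	}

	fmt.Println(parsed, fromSet)
}
```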
```
@@ -180,6 +198,7 @@ func main() {

case watchNamespace == "" && watchNamespaceSelector == "":
	// cluster scoped
	controllerOptions.Cache.DefaultLabelSelector = labelSelectors
```
This will unfortunately break the operator in several ways:

1. Setting a default LabelSelector would break valuesFrom referencing unlabelled ConfigMaps and Secrets (see the small illustration after this list); the workaround is the same cache ignore rule that is necessary in "Fix: Do not cache native resources created without CommonLabels" #1818
2. Any resource created by the Operator itself, specifically related to internal Grafana CRs, would be created and might sit in the cache for a while, but after any update or a restart it would be orphaned, as labels are not inherited; see "fix: add common labels to created resources" #1661
3. I'm not 100% sure on this, but even if you fix 2.: if you create an internal Grafana and migrate it to another shard, it would have to be a full delete and re-apply to ensure the backing Deployment, PVC, Route, ConfigMap and such are not orphaned, losing all alert history and more if no remote database is configured. This could potentially be fixed by cascading label updates, but would require passing down the old and new labels to all the supporting reconcilers.
4. I think Cache is nil at this point 😅
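To make point 1 concrete, a tiny standalone illustration (not from the PR): an object without labels never matches a non-empty default selector, so it would simply disappear from the cache:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Hypothetical default selector applied to the whole cache.
	sel, _ := labels.Parse("shard=1")

	// An unlabelled ConfigMap/Secret has an empty label set, so it does
	// not match and would never make it into the cache.
	fmt.Println(sel.Matches(labels.Set{}))             // false
	fmt.Println(sel.Matches(labels.Set{"shard": "1"})) // true
}
```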
With #1818 most of this should be fine if we either
- only apply the watch label selector to grafana CRDs
- or somehow merge it with "app.kubernetes.io/managed-by": "grafana-operator"
Only applying the watchLabels to Grafana CRDs will hide them due to the defaultLabelSelector unless it is overwritten with byObject, and you will end up with multiple reconciles of the same resources from different shards when it is omitted. Merging is an option, but it does not solve the shard migration/cascading label updates needed to ensure nothing is orphaned.
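For illustration only (not the PR's code; assuming controller-runtime v0.15+ cache.Options, which the byObject discussion above refers to), a sketch of how a default selector and per-object overrides interact, exempting ConfigMaps and Secrets so unlabelled objects referenced through valuesFrom stay visible:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// buildCacheOptions is an illustrative sketch: the shard selector is the
// cache-wide default, while ConfigMaps and Secrets are overridden via
// ByObject so they are cached regardless of labels.
func buildCacheOptions(watchSelector labels.Selector) cache.Options {
	return cache.Options{
		DefaultLabelSelector: watchSelector,
		ByObject: map[client.Object]cache.ByObject{
			&corev1.ConfigMap{}: {Label: labels.Everything()},
			&corev1.Secret{}:    {Label: labels.Everything()},
		},
	}
}

func main() {
	_ = buildCacheOptions(labels.SelectorFromSet(labels.Set{"shard": "1"}))
}
```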
With the above said, I have a ton of questions related to the Slack comment:

If startup/performance is a concern:

Sharding:

Potential blockers/worries:

I do not have a great understanding of compute at that scale, but you could spin up a lot of resources quickly with the below and stress the operator:

```shell
kubectl port-forward -n grafana-operator-system deploy/grafana-operator-controller-manager-v5 8888 &
go tool pprof -pdf http://localhost:8888/debug/pprof/heap > heap.pdf
go tool pprof -pdf http://localhost:8888/debug/pprof/goroutine > goroutine.pdf
```

If a lot of these resources are provided by a central team and not by consumers, are there features you'd like to see that could reduce the total number of CRs?

PS: It's not every day you get to work on stuff at this scale, so I am interested/slightly invested in this already.
Thank you @Baarsgaard for taking the time to thoroughly review my PR. Here is more context:

Indeed, I will need to adjust this setting as the number of CRs grows.

I have a proof-of-concept that allows CRUD operations on the Grafana CRs via API calls; the underlying API creates the CRs directly in k8s.

I will have to work more on my proof-of-concept to find the correct value. I'm also investigating the double sharding: my control plane will be responsible for deploying the CRs evenly where there is room (on the less crowded cluster && the less crowded operator shard).

I do not plan to migrate resources between shards.

It is simpler to have everything in the same namespace from a provisioning and operations point of view.

I have full control over the name of each CR; my API is responsible for crafting the correct name (e.g. a compact UUID).

This is OK on my end. Moreover, the double sharding should ease the amount of work done by a single operator.

See my comment about the double sharding. The scale I'm aiming for is 500k CRs, but I'm considering splitting the workload across tens of k8s clusters. For example, with 50 clusters, each hosting 10k CRs across 10 shards, each operator shard will only be responsible for 1k CRs. However, it is true that kubectl get grafana-operator --all or similar large requests will be heavy on the operators. I don't expect to do that often, and I can always go to 20 or 100 shards.
I'm progressing on my proof-of-concept with a quick and dirty Python script like so:

```python
from kubernetes import client, config, utils
import yaml
import logging
import argparse


def cmdline_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d",
        "--debug",
        help="Print lots of debugging statements",
        action="store_const",
        dest="loglevel",
        const=logging.DEBUG,
        default=logging.WARNING,
    )  # mind the default value
    parser.add_argument(
        "-v",
        "--verbose",
        help="Be verbose",
        action="store_const",
        dest="loglevel",
        const=logging.INFO,
    )
    parser.add_argument(
        "-q",
        "--quiet",
        help="Be quiet",
        action="store_const",
        dest="loglevel",
        const=logging.CRITICAL,
    )
    parser.add_argument(
        "-n",
        "--namespace",
        help="namespace where to deploy the CR",
        default="test-grafana-operator",
    )
    parser.add_argument(
        "-c", "--count", type=int, help="number of CR to deploy", default=1
    )
    args = parser.parse_args()
    logging.basicConfig(level=args.loglevel)
    return args


def main():
    args = cmdline_parser()
    config.load_kube_config()
    api = client.CustomObjectsApi()
    body = None
    with open("./grafana.yaml") as fd:
        body = yaml.safe_load(fd)
    for i in range(args.count):
        body["metadata"]["name"] = f"grafana-{i}"
        logging.debug(f"creating grafana {i}")
        try:
            api.create_namespaced_custom_object(
                group="grafana.integreatly.org",
                version="v1beta1",
                namespace=args.namespace,
                plural="grafanas",
                body=body,
            )
        except client.ApiException as e:
            if e.reason == "Conflict":
                api.patch_namespaced_custom_object(
                    name=body["metadata"]["name"],
                    group="grafana.integreatly.org",
                    version="v1beta1",
                    namespace=args.namespace,
                    plural="grafanas",
                    body=body,
                )
            else:
                raise


if __name__ == "__main__":
    main()
```

With the following CR:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana-1
  namespace: test-grafana-operator
spec:
  deployment:
    spec:
      template:
        spec:
          containers:
            - image: docker.io/grafana/grafana:11.0.0
              name: grafana
              resources:
                limits:
                  cpu: "1"
                  memory: 512Mi
                requests:
                  cpu: 20m
                  memory: 100Mi
```

I will report what I find out with pprof when I get to a comfortable spot. I could spawn a Pyroscope to ease the work 😇
The current state of the operator seems to suit my use case. Upon a request for a Grafana, I create a new CR in the right place (right cluster, right shard). It was more or less straightforward to implement my API and test it.
The context could be two operators and two Grafana CRs, the first one with the label shard: 1 and the second one with shard: 2. I would then configure operator 1 to watch only the CRs with the label shard: 1 and operator 2 to watch only shard: 2.

This example might seem simple, but I'm aiming for a bigger scale with tens of shards, where each shard would host tens of thousands of CRs. I'm anticipating the eventual heavy workload that a single operator would need to handle given the number of CRs I will create. Having that magnitude handled by a single operator might be hard, especially when the operator restarts.

Sharding the operators would help lower the number of CRs tracked by a single one.
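As a rough sketch of that sharding idea (the label key/values and selector construction are only assumptions, not the PR's exact interface), each operator shard would keep only the CRs whose labels match its own selector:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Selector that operator shard 1 would be started with.
	shard1 := labels.SelectorFromSet(labels.Set{"shard": "1"})

	// Labels of two hypothetical Grafana CRs.
	crA := labels.Set{"shard": "1"}
	crB := labels.Set{"shard": "2"}

	fmt.Println(shard1.Matches(crA)) // true  -> reconciled by operator 1
	fmt.Println(shard1.Matches(crB)) // false -> ignored by operator 1
}
```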
Edit: this is a follow-up PR after a discussion on Slack --> https://kubernetes.slack.com/archives/C019A1KTYKC/p1737370451358769