Kourier at large scale #941
Comments
This issue is stale because it has been open for 90 days with no |
/lifecycle frozen |
Hello 👋, We have noticed this kind of inconsistent deploy time on our Knative clusters too. As of now, this is the main reason we're running multiple clusters, with each cluster having only between 400 and 500 ksvc. Above that, we start seeing some slowness, mostly ksvc taking a while to become ready. I've started investigating based on @daraghlowe's example, on a simple kind cluster, measuring the time for the
For the first ingresses (up to 1200), 95% of the time, it takes less than 1s for an ingress to become ready. But when we have more ingresses, this time increases, up to 2 seconds. The more ingress objects we have, the more time it takes for an ingress to be ready, but it's always less than or equal to 2 seconds, so it's not that bad. See this plot showing the percentage of ingress creations taking between 1 and 2 seconds according to the total number of ingresses:

These results might be normal, I don't know the intricacies. But once I had 2000 ingresses, and after deleting an existing ingress (3rd step), the next ingress creation (4th step) took between 7 and 8 seconds. The next ones were consistent with the results I showed before, between 1 and 2 seconds.

I'm still not sure why we got that big "time-to-ready" increase (from 1-2s to 7-8s) just after deleting an ingress, but from an outside perspective, adding an ingress should always take the same time to be ready.

I could not reproduce @daraghlowe's numbers, because I only focused on ingresses here; there are obviously other things configured when we create a ksvc (revision, configuration, etc.). I'll continue to investigate, but I thought it was worth posting this first experiment as it could prompt more discussion.
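For anyone wanting to reproduce the plot from their own measurements, here is a minimal sketch of the aggregation, assuming the samples are collected as `(total_ingresses, seconds_to_ready)` pairs. The function name, the bucket size, and the sample format are my own assumptions, not something taken from the experiment above.

```python
from collections import defaultdict


def slow_fraction_by_bucket(samples, bucket_size=200, low=1.0, high=2.0):
    """Group samples by total ingress count and return, per bucket, the
    fraction of creations whose time-to-ready fell within [low, high] seconds.

    samples: iterable of (total_ingresses, seconds_to_ready) pairs
             (hypothetical format, chosen for this sketch).
    """
    totals = defaultdict(int)
    slow = defaultdict(int)
    for total_ingresses, seconds in samples:
        # Bucket is labelled by its lower bound, e.g. 1200 for 1200..1399.
        bucket = (total_ingresses // bucket_size) * bucket_size
        totals[bucket] += 1
        if low <= seconds <= high:
            slow[bucket] += 1
    return {bucket: slow[bucket] / totals[bucket] for bucket in sorted(totals)}
```

Each key of the returned dictionary is the lower bound of a bucket of total ingress counts, and each value is the share of creations in that bucket that took between 1 and 2 seconds.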
What's the issue?
We have started testing Kourier at large scale to see if deployment times (the time for a KSVC to become ready to serve traffic) are better than with Istio. Deploy times are good with Kourier: it consistently takes less than 10 seconds for a newly added KSVC to become ready, all the way up to 2000 KSVC.
However, if you delete a KSVC and then try to add a new KSVC, times are much slower, and even with only 500 KSVC on the cluster it takes several minutes before the new KSVC is ready.
Looking at the logs of the net-kourier-controller, you can see that it starts reconciling all of the Ingresses on the cluster when you delete a KSVC, and presumably this needs to finish before the new Ingress can be created for our new KSVC.
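To illustrate why a full resync could delay a new KSVC, here is a toy model, assuming (and this is only the presumption stated above, not confirmed controller behavior) that every existing Ingress is queued ahead of the new one and processed one at a time. The function name and the per-item cost are hypothetical.

```python
from collections import deque


def wait_behind_resync(existing_ingresses, per_item_seconds):
    """Return how long a new ingress waits if a resync of all existing
    ingresses is queued ahead of it on a single FIFO worker.

    This is a simplified model, not the actual net-kourier-controller logic.
    """
    queue = deque(f"ingress-{i}" for i in range(existing_ingresses))
    queue.append("new-ingress")  # the new ingress arrives after the resync
    elapsed = 0.0
    while queue:
        item = queue.popleft()
        elapsed += per_item_seconds  # hypothetical fixed cost per reconcile
        if item == "new-ingress":
            break
    return elapsed
```

In this model the wait grows linearly with the number of existing Ingresses, which would match the observation that the delay after a delete gets worse as the cluster holds more KSVC.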
Why is this a problem?
This leads to inconsistent deploy times for our workloads, which creates an inconsistent user experience: sometimes it's really quick and other times it can take minutes to become ready.
Results
Here are the times it took for a single KSVC to become ready right after I deleted a different single KSVC, alongside the number of KSVC that were on the cluster.
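A measurement like this can be sketched roughly as follows, assuming `kubectl` is on the PATH and the KSVC has just been applied. The helper names are made up for this sketch, and it is not necessarily how the numbers here were collected.

```python
import subprocess
import time


def kubectl_wait_ready(name, namespace="default", timeout="600s"):
    """Block until the KSVC reports condition Ready (raises on timeout)."""
    subprocess.run(
        ["kubectl", "wait", "--for=condition=Ready", f"ksvc/{name}",
         "-n", namespace, f"--timeout={timeout}"],
        check=True,
    )


def time_until_ready(wait_fn, clock=time.monotonic):
    """Return the seconds elapsed while wait_fn blocks.

    wait_fn is any zero-argument callable; clock is injectable so the
    helper can be exercised without a cluster.
    """
    start = clock()
    wait_fn()
    return clock() - start
```

For example, `time_until_ready(lambda: kubectl_wait_ready("my-ksvc"))` called right after `kubectl apply` gives the approximate time-to-ready, where `my-ksvc` is a placeholder name. The timer starts when the helper is called, not when the object is created, so the result is a slight underestimate.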
Why are we doing this?
We are running a cluster with Knative and Istio with 1500 KSVC and have started to run into a problem with the time it takes for new KSVC we add to become ready (the ingress).
We opened an issue for this here: knative/serving#13247