Error while ensuring backend services #216
This is interesting:
Looks like the CLI creates the backend service successfully but then receives a 404 when it tries to fetch it later. Can you check if the backend service shows up in the Cloud Console?
Nope. I tried again today. The backends should be showing at … Note: we are on legacy networks, not on VPC networks. Part of creating this MCI is to migrate to VPC networks without downtime.
Yes, that is correct, they should be showing up in the UI. Have you tried the same in a different project that is not using legacy networks, to see if creating a multicluster ingress works fine? That would confirm that the YAMLs are correct and that it is actually some project-specific setting causing the problem here.
I did not have time to try that; I want to try next week. However, I feel there is already something we can improve in kubemci here: kubemci reports that the backend is created, but it is not. Isn't that check wrong, then? Even if legacy networks are the culprit, the check is not working. If the check were working, it might be able to show the correct (helpful) error message.
Yes. I will wait for us to find a root cause to be sure. At a cursory glance, the code looks correct to me: k8s-multicluster-ingress/app/kubemci/pkg/gcp/backendservice/backendservicesyncer.go, line 222 at commit 415afa6.
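For reference, the check being discussed boils down to fetching the resource back after creation. A minimal sketch of that pattern using the google.golang.org/api/compute/v1 client — the helper name and structure are mine, not kubemci's actual code at the line referenced above:

```go
package gcputil

import (
	"fmt"
	"log"

	compute "google.golang.org/api/compute/v1"
)

// verifyBackendService fetches a backend service back after creation and
// fails loudly if it is not actually there. Hypothetical helper for
// illustration; a 404 here means an earlier "created successfully" was wrong.
func verifyBackendService(svc *compute.Service, project, name string) error {
	bs, err := svc.BackendServices.Get(project, name).Do()
	if err != nil {
		return fmt.Errorf("backend service %q not found after creation: %v", name, err)
	}
	log.Printf("backend service %q exists at %s", name, bs.SelfLink)
	return nil
}
```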
When I run kubemci in the same project with 2 VPC-network clusters, it works. When I include the original legacy cluster, it fails with the same sequence: "Backend service created successfully", then "Not found". It actually is not there when checking with … I wanted to dive into the bowels of kubemci to debug, but I was unable to install kubemci from source. Are there any quick guidelines for running this Go app from source? It seems that the Makefile is referencing a different directory than the one the files are actually in. And the package name of …
Ah, I found out. Apparently I had no GOPATH (using Go 1.11.5 with modules, mainly)...
Now the build fails:
Fixed the above build error by switching to Go modules (#219).
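For anyone else hitting this: a Go modules switch starts with a go.mod at the repo root. A minimal sketch of its shape (module path taken from the repository; the actual file in #219 may differ):

```
module github.com/GoogleCloudPlatform/k8s-multicluster-ingress

go 1.11
```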
The reason the error during backend service creation is not detected is a bug in op.go, which is resolved in the latest version; compare the function below with the link above: https://github.com/GoogleCloudPlatform/k8s-cloud-provider/blob/master/pkg/cloud/op.go#L83-L90 I think we should update to a more recent version of kubernetes/kubernetes, but that might hurt a bit...
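In essence, the fixed code inspects the operation's Error field once the operation reaches DONE, instead of treating DONE alone as success. A simplified reconstruction of that check — not the actual upstream code, which is at the link above:

```go
package gcputil

import (
	"fmt"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// operationError converts a finished GCE operation into a Go error. A DONE
// operation can still carry an Error block; ignoring it makes a failed
// creation look like a success — the bug described in this thread.
func operationError(op *compute.Operation) error {
	if op.Status != "DONE" {
		return fmt.Errorf("operation %s not done yet (status %s)", op.Name, op.Status)
	}
	if op.Error == nil || len(op.Error.Errors) == 0 {
		return nil // genuinely succeeded
	}
	var msgs []string
	for _, e := range op.Error.Errors {
		msgs = append(msgs, fmt.Sprintf("%s: %s", e.Code, e.Message))
	}
	return fmt.Errorf("operation %s failed: %s", op.Name, strings.Join(msgs, "; "))
}
```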
Finally, I was able to extract the underlying GlobalOperation error:
This is exactly the same as what I see when trying to create a load balancer from the console: there, the option for the Rate-based balancing mode is greyed out for this (old cluster's) instance group. In the console it is not a problem to pick a different balancing mode (Utilization) for this cluster and the 'normal' Rate-based balancing mode for the new clusters. However, …
Given a compute command like this:

gcloud compute backend-services list --flatten="backends[]" --format "csv(name,backends.group,backends.balancingMode)"

I can easily list the current backend services, which instance groups they use, and with which balancing mode. Given that list, it would be trivial to pick the 'correct' (i.e. already used) balancing mode.
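The same listing can be done programmatically. A sketch using the google.golang.org/api/compute/v1 client (the project ID is a placeholder, and only the first page of results is read):

```go
package main

import (
	"context"
	"fmt"
	"log"

	compute "google.golang.org/api/compute/v1"
)

// Mirrors the gcloud command above: for every backend service in the
// project, print each backend's instance group and balancing mode as CSV.
func main() {
	ctx := context.Background()
	svc, err := compute.NewService(ctx) // uses Application Default Credentials
	if err != nil {
		log.Fatal(err)
	}
	project := "my-project" // hypothetical project ID
	list, err := svc.BackendServices.List(project).Do()
	if err != nil {
		log.Fatal(err)
	}
	for _, bs := range list.Items {
		for _, b := range bs.Backends {
			fmt.Printf("%s,%s,%s\n", bs.Name, b.Group, b.BalancingMode)
		}
	}
}
```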
This also means that reproduction is much easier to accomplish than I thought: just create 2 instance groups, add 1 backend service with Utilization mode to one of them, then try to set up kubemci for them. That should fail too.
Thanks a lot Herman for this great debugging! I understand this issue now. Looks like you have 2 different backend services with different load balancing modes pointing to the same instance group. This is not allowed: all backend services pointing to the same instance group must have the same balancing mode. Most users do not run into this problem, since the in-cluster k8s ingress-gce controller and the kubemci CLI both create backend services with the same balancing mode (RATE). Since these two are mostly the only ones creating backend services, there is never a conflict. In your case, for some reason, the in-cluster ingress-gce controller is creating backend services with a different load balancing mode (UTILIZATION) than kubemci (RATE). Why this could be happening, and ways to fix it:
I understand that deleting all existing ingresses might not be a viable solution if this is a production cluster, but that's what we have right now. Hope this helps.
Yes, that is how I understand it as well. Removing the ingresses is not viable, as we need this in our production cluster too. We are utilizing kubemci as part of our cluster migration strategy 😉 so afterwards we can drop the whole cluster, but for now we need a workaround. I want to take it one step further by making the balancing mode a parameter in (a fork of) kubemci, and possibly even be smart about it: detect the reuse of an instance group and reuse its current balancing mode (see the sketch below). Like I said …
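The "detect and reuse" idea could look something like this: scan existing backend services for one that already points at the instance group and adopt its mode, otherwise fall back to kubemci's default of RATE. A hypothetical sketch, not actual kubemci code:

```go
package gcputil

import (
	compute "google.golang.org/api/compute/v1"
)

// balancingModeFor returns the balancing mode already used by any existing
// backend service pointing at groupLink, falling back to def (kubemci uses
// "RATE"). Hypothetical helper; only the first page of results is read.
func balancingModeFor(svc *compute.Service, project, groupLink, def string) (string, error) {
	list, err := svc.BackendServices.List(project).Do()
	if err != nil {
		return "", err
	}
	for _, bs := range list.Items {
		for _, b := range bs.Backends {
			if b.Group == groupLink && b.BalancingMode != "" {
				return b.BalancingMode, nil // reuse the mode already in place
			}
		}
	}
	return def, nil
}
```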
We are not accepting any patches in kubemci right now, but I am happy to point people to your fork if they run into this issue.
I see. Well, I was working with our support engineer and he recommended kubemci; that is why I'm checking it out. Also, you have quite the foundation in place already, so it would be a waste to re-engineer this. Meanwhile, I think I'm just going to continue the fork. Do you have any timeline for when this kind of functionality lands in GKE itself, as mentioned in the docs?
I also created some PRs, so people can find the functionality and have a place for input. You probably have a lot of insight into the choices made for this tool.
When running kubemci I consistently get issues with the backend services:
The output is:
It complains about urlMap issues too, while the ingress.yaml has no complex urlMap: