
Zero downtime upgrade without race conditions #6105

Closed
calvinbui opened this issue Aug 31, 2020 · 5 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@calvinbui
Contributor

I'm following the guide on migrating from the Helm chart stable/nginx-ingress to kubernetes/ingress-nginx with zero downtime.

The problem I'm having is a race condition between the old and new controllers, which keep repeatedly updating the existing Ingresses. This was bound to happen, but it is never mentioned in the steps or in the guides they link to.

I've tried setting the ingress-class to another value on the new controller (i.e. nginx-new) and this prevents the race condition. ✅

The problem now is that if I change the ingress-class annotation on an Ingress to nginx-new, its address changes, and traffic still going to the old controller gets a 404 because DNS has not been updated yet. This is also expected.

I'm assuming the doc is out of date or not working. What would be the best way to do a zero-downtime deployment when there are race conditions?

My guesses are:

  1. somehow allow an Ingress resource to be served by two different ingress controller classes (i.e. nginx and nginx-new). The problem here is that I don't know whether one Ingress can be served by two controllers.
  2. add the new controller's address to DNS
  3. remove the old DNS entry once the new address has propagated
  4. remove the old controller.

Or alternatively,

  1. create a duplicate Ingress resource for ALL Ingresses served by the old controller, but with the new controller's class set as its ingress-class
  2. do the DNS steps above
  3. remove the Ingresses that use the old controller's class
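Step 1 of the alternative approach can be scripted. Below is a minimal sketch in Python, assuming the Ingress manifests have already been exported and parsed into dicts (for example from `kubectl get ingress -A -o json`), that the old controller serves the `nginx` class and the new one `nginx-new` via the `kubernetes.io/ingress.class` annotation, and using a hypothetical `-new` name suffix for the copies:

```python
import copy
from typing import Any, Dict, List

CLASS_ANNOTATION = "kubernetes.io/ingress.class"
OLD_CLASS = "nginx"       # class served by the old stable/nginx-ingress controller
NEW_CLASS = "nginx-new"   # class set on the new kubernetes/ingress-nginx controller
NAME_SUFFIX = "-new"      # hypothetical naming convention for the copies


def duplicate_for_new_class(ingresses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy every Ingress annotated with OLD_CLASS and re-annotate the copy with NEW_CLASS.

    The originals are left untouched, so the old controller keeps serving them
    until DNS has been moved over to the new controller.
    """
    duplicates = []
    for ing in ingresses:
        annotations = (ing.get("metadata") or {}).get("annotations") or {}
        if annotations.get(CLASS_ANNOTATION) != OLD_CLASS:
            continue  # not served by the old controller; skip it
        dup = copy.deepcopy(ing)
        meta = dup["metadata"]
        meta["name"] = meta["name"] + NAME_SUFFIX
        # Drop server-populated fields so the copy can be created as a new object.
        for field in ("uid", "resourceVersion", "creationTimestamp"):
            meta.pop(field, None)
        dup.pop("status", None)  # the load balancer address is written by the controller
        meta["annotations"][CLASS_ANNOTATION] = NEW_CLASS
        duplicates.append(dup)
    return duplicates


if __name__ == "__main__":
    exported = [
        {"kind": "Ingress",
         "metadata": {"name": "web", "namespace": "default",
                      "annotations": {CLASS_ANNOTATION: OLD_CLASS}},
         "spec": {"rules": [{"host": "example.com"}]}},
        {"kind": "Ingress",
         "metadata": {"name": "other", "annotations": {CLASS_ANNOTATION: "gce"}}},
    ]
    for dup in duplicate_for_new_class(exported):
        print(dup["metadata"]["name"], dup["metadata"]["annotations"][CLASS_ANNOTATION])
```

Because each original keeps the `nginx` class and each copy gets `nginx-new`, the two controllers never write status to the same object, which should avoid the race condition described above while both are running.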
@calvinbui calvinbui added the kind/support Categorizes issue or PR as a support question. label Aug 31, 2020
@calvinbui
Contributor Author

I went ahead and did step 2.

@laszlocph

Thanks for sharing your steps. This stable/nginx-ingress to kubernetes/ingress-nginx move is really not well supported 👎

@azman0101

azman0101 commented Nov 2, 2020

Very interesting feedback @calvinbui

The docs state that

> Deploying multiple Ingress controllers, of different types (e.g., ingress-nginx & gce), and not specifying a class annotation will result in both or all controllers fighting to satisfy the Ingress, and all of them racing to update Ingress status field in confusing ways

https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/#multiple-ingress-nginx-controllers

There is also a reference to zero downtime in this doc: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#service-upstream

> This can be desirable for things like zero-downtime deployments as it reduces the need to reload NGINX configuration when Pods come up and down. See issue #257.

Is this really possible to migrate without downtime?

@calvinbui
Contributor Author

> Is this really possible to migrate without downtime?

In my experience on EKS, no, it's not possible. There's always a minute or so of downtime while it sorts itself out, does leader election, etc. This is even with two replicas.

I played around with the pre-stop command from the docs for the Helm chart (https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx#upgrading-with-zero-downtime-in-production) and still had about a minute of downtime.

The only method that works is to create two ingress classes and flip the DNS. This is a pain, of course, and I wish I knew how to do a rolling upgrade instead.

@calvinbui
Contributor Author

Based on PR #5855, I have a feeling this could be related to the AWS NLB.


3 participants