
Fix deadlock during shutdown which prevented leader election cleanup #1688

Merged
merged 2 commits into main from fix_shutdown_deadlock
Sep 25, 2023

Conversation

@cfryanr (Member) commented Sep 20, 2023

Before this fix, the deadlock would prevent the leader pod from giving up its lease, which would make it take several minutes for new pods to be allowed to elect a new leader. During that time, no Pinniped controllers could write to the Kube API, so important resources were not being updated during that window. It would also make pod shutdown take about 1 minute.

After this fix, the leader gives up its lease immediately, and pod shutdown takes about 1 second. This improves restart/upgrade time and also fixes the problem where there was no leader for several minutes after a restart/upgrade.

The deadlock was between the post-start hook and the pre-shutdown hook. The pre-shutdown hook blocked until a certain background goroutine in the post-start hook finished, but that goroutine could not finish until the pre-shutdown hook finished. Thus, they were both blocked, waiting for each other infinitely. Eventually the process would be externally killed.
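As an illustration only (not Pinniped's actual code), a minimal self-contained Go sketch of this kind of circular wait might look like the following; the channel and WaitGroup names are hypothetical stand-ins for the server's stop channel and the controllers' shutdown WaitGroup.

```go
// Minimal sketch of the circular wait described above. All names are hypothetical.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// The generic API server closes this only after all pre-shutdown hooks return.
	preShutdownHooksDone := make(chan struct{})

	// The post-start hook's background goroutine signals this when it exits.
	var controllersStopped sync.WaitGroup
	controllersStopped.Add(1)

	// Post-start hook goroutine: waits for the "stop" signal before exiting.
	go func() {
		defer controllersStopped.Done()
		<-preShutdownHooksDone // never closed, because the pre-shutdown hook below never returns
	}()

	// Pre-shutdown hook: waits for the goroutine above to exit before returning.
	preShutdownHook := func() {
		controllersStopped.Wait()
	}

	done := make(chan struct{})
	go func() {
		preShutdownHook()
		close(preShutdownHooksDone) // only reached after the pre-shutdown hook returns
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("shutdown completed")
	case <-time.After(2 * time.Second):
		fmt.Println("deadlocked: each side is waiting for the other")
	}
}
```

Each side can only proceed after the other finishes, which is the circular dependency described above.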

This deadlock was most likely introduced by some change in Kube's generic API server package related to how the many complex channels used during server shutdown interact with each other, and it was not noticed when we upgraded to the version which introduced the change.

The bug first appeared in Pinniped v0.18.0, in which the Kube libraries were updated from 0.23.6 to 0.24.1. The relevant code in Pinniped had no changes around the time of that release. However, the Kube library changed to wait for the pre-shutdown hooks to finish before continuing the remainder of the shutdown process (see the file genericapiserver.go in this diff), thus creating this deadlock.

Release note:

Fix a bug introduced in v0.18.0 which slowed down the shutdown of the Pinniped pods and prevented
the leader pod from releasing its lease, which caused it to take several minutes before replacement
Pinniped pods could regain the lease and become fully operational.

@cfryanr cfryanr force-pushed the fix_shutdown_deadlock branch from 368b1e8 to ca6c29e on September 20, 2023 23:58
go func() {
	defer cancel()

	<-postStartContext.StopCh
@cfryanr (Member, Author) Sep 21, 2023

This is where the deadlock was happening. This StopCh channel will not be closed until our pre-shutdown hook finishes running. But our pre-shutdown hook calls shutdown.Wait(), which is effectively waiting for this goroutine to end (because this goroutine cancels the context, which allows the runControllers() call to stop blocking, which in turn completes the WaitGroup that the pre-shutdown hook is waiting on).
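One generic way to break a cycle like this is for the pre-shutdown path itself to cancel the controllers' context before waiting, rather than relying on StopCh. The sketch below shows only that general pattern with stand-in names; it is not necessarily the exact change made in this PR.

```go
// Sketch of one way to break the cycle: cancel the controllers directly from the
// pre-shutdown path instead of waiting for the server's StopCh. Names are stand-ins.
package main

import (
	"context"
	"fmt"
	"sync"
)

// runControllers stands in for the real controller loop; it blocks until ctx is canceled.
func runControllers(ctx context.Context) {
	<-ctx.Done()
}

func main() {
	ctx, cancelControllers := context.WithCancel(context.Background())

	var shutdown sync.WaitGroup
	shutdown.Add(1)

	// Post-start hook: run the controllers in a background goroutine.
	go func() {
		defer shutdown.Done()
		runControllers(ctx)
	}()

	// Pre-shutdown hook: stop the controllers first, then wait for them to finish.
	// Because cancellation no longer depends on StopCh, shutdown.Wait() can return,
	// the hook can complete, and the leader election lease can be released promptly.
	cancelControllers()
	shutdown.Wait()
	fmt.Println("controllers stopped cleanly")
}
```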

Member:
🤯

@cfryanr cfryanr marked this pull request as draft September 21, 2023 00:07
codecov bot commented Sep 21, 2023

Codecov Report

Merging #1688 (5e06c6d) into main (1ac8691) will decrease coverage by 0.06%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #1688      +/-   ##
==========================================
- Coverage   79.21%   79.15%   -0.06%     
==========================================
  Files         163      163              
  Lines       15758    15769      +11     
==========================================
  Hits        12482    12482              
- Misses       2961     2972      +11     
  Partials      315      315              
Files Coverage Δ
internal/leaderelection/leaderelection.go 46.66% <0.00%> (-0.35%) ⬇️
internal/controllerlib/controller.go 12.60% <0.00%> (-0.33%) ⬇️
internal/concierge/server/server.go 17.17% <0.00%> (-0.63%) ⬇️

@joshuatcasey (Member):

Nice find!

@benjaminapetersen (Member) left a comment:

LGTM

@cfryanr cfryanr force-pushed the fix_shutdown_deadlock branch from 843189f to 389adfa on September 22, 2023 17:04
Comment on lines +50 to +60
// Skip tailing pod logs for test runs that are using alternate group suffixes. There seems to be a bug in our
// kubeclient package which causes an "unable to find resp serialier" (sic) error for pod log API responses when
// the middleware is active. Since we do not tail pod logs in production code (or anywhere else at this time),
// we don't need to fix that bug right now just for this test.
if env.APIGroupSuffix == "pinniped.dev" {
@cfryanr (Member, Author):

This new integration test failed for CI test jobs which use an alternate API group suffix. It looks like there is a bug in our kubeclient package for tailing logs when the middleware is active. For now, I just worked around it by only tailing logs in this test when the API group suffix is equal to the default.

@cfryanr cfryanr changed the title from "WIP: Fix deadlock during shutdown which prevented leader election cleanup" to "Fix deadlock during shutdown which prevented leader election cleanup" on Sep 22, 2023
@cfryanr cfryanr marked this pull request as ready for review September 22, 2023 17:12
@cfryanr cfryanr enabled auto-merge September 22, 2023 17:12
@cfryanr cfryanr force-pushed the fix_shutdown_deadlock branch from 389adfa to 5e06c6d on September 25, 2023 16:51
Comment on lines +27 to +31
// Only run this test in CI on Kind clusters, because something about restarting the pods
// in this test breaks the "kubectl port-forward" commands that we are using in CI for
// AKS, EKS, and GKE clusters. The Go code that we wrote for graceful pod shutdown should
// not be sensitive to which distribution it runs on, so running this test only on Kind
// should give us sufficient coverage for what we are trying to test here.
@cfryanr (Member, Author):

I also had to work around an unexpected problem where restarting the pods somehow breaks the kubectl port-forward commands that we use during AKS/EKS/GKE integration testing in CI.
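For reference, a guard like the one described could be sketched as below; the environment variable, package name, and test name here are hypothetical and do not reflect Pinniped's real test helpers.

```go
// Hypothetical sketch of a distribution-based skip in a Go integration test.
package integration

import (
	"os"
	"testing"
)

func TestGracefulPodShutdown(t *testing.T) {
	// Skip on AKS/EKS/GKE because restarting pods breaks the "kubectl port-forward"
	// commands used by CI on those clusters; running only on Kind gives sufficient
	// coverage for the graceful shutdown behavior under test.
	if dist := os.Getenv("KUBE_CLUSTER_DISTRIBUTION"); dist != "kind" {
		t.Skipf("skipping on %q: this test only runs on Kind clusters", dist)
	}

	// ... restart the pods and assert that shutdown is fast and the lease is released ...
}
```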

@cfryanr cfryanr merged commit 58c5146 into main Sep 25, 2023
8 checks passed
@cfryanr cfryanr deleted the fix_shutdown_deadlock branch September 25, 2023 17:41