Fix deadlock during shutdown which prevented leader election cleanup #1688
Conversation
Force-pushed from 368b1e8 to ca6c29e.
go func() {
    defer cancel()

    <-postStartContext.StopCh
This is where the deadlock was happening. This StopCh channel is not going to be closed until our pre-shutdown hook finishes running. But our pre-shutdown hook calls shutdown.Wait(), which is effectively waiting for this goroutine to end (because this goroutine cancels the context which allows the runControllers() call to stop blocking, which in turn ends the WaitGroup that the pre-shutdown hook is waiting for).
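To make the cycle concrete, here is a minimal, runnable Go sketch of the shape described above. The names (runControllers, stopCh, preShutdownHook) and the hook wiring are simplified stand-ins for illustration, not the actual Pinniped source. Running it trips Go's runtime deadlock detector, since every goroutine ends up blocked waiting on another:

package main

import (
    "context"
    "sync"
)

// runControllers stands in for the real controller loop: it blocks until its context is cancelled.
func runControllers(ctx context.Context) { <-ctx.Done() }

func main() {
    // stopCh is only closed after the pre-shutdown hook returns.
    stopCh := make(chan struct{})
    ctx, cancel := context.WithCancel(context.Background())

    var shutdown sync.WaitGroup

    // Post-start hook: run the controllers until ctx is cancelled.
    shutdown.Add(1)
    go func() {
        defer shutdown.Done()
        runControllers(ctx)
    }()

    // Background goroutine from the post-start hook: cancel ctx once stopCh closes.
    go func() {
        defer cancel()
        <-stopCh // blocks until the pre-shutdown hook finishes... which is waiting on us
    }()

    // Pre-shutdown hook: wait for the controllers to stop before letting shutdown continue.
    preShutdownHook := func() {
        shutdown.Wait() // never returns, because cancel() never runs, so runControllers never exits
    }

    preShutdownHook() // deadlock: blocks forever
    close(stopCh)     // never reached
}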
🤯
Codecov Report
@@            Coverage Diff             @@
##             main    #1688      +/-   ##
==========================================
- Coverage   79.21%   79.15%   -0.06%
==========================================
  Files         163      163
  Lines       15758    15769      +11
==========================================
  Hits        12482    12482
- Misses       2961     2972      +11
  Partials      315      315
Nice find!
LGTM
Force-pushed from 843189f to 389adfa.
// Skip tailing pod logs for test runs that are using alternate group suffixes. There seems to be a bug in our
// kubeclient package which causes an "unable to find resp serialier" (sic) error for pod log API responses when
// the middleware is active. Since we do not tail pod logs in production code (or anywhere else at this time),
// we don't need to fix that bug right now just for this test.
if env.APIGroupSuffix == "pinniped.dev" {
This new integration test failed for CI test jobs which use an alternate API group suffix. It looks like there is a bug in our kubeclient package for tailing logs when the middleware is active. For now, I just worked around it by only tailing logs in this test when the API group suffix is equal to the default.
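For context, tailing pod logs from Go generally looks something like the sketch below, written against plain client-go rather than our kubeclient package, with the same kind of guard as in the diff above. The env var, namespace, and pod name are illustrative placeholders, not the project's actual test helpers:

package main

import (
    "context"
    "fmt"
    "io"
    "os"
    "path/filepath"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

func main() {
    // Illustrative assumption: read the suffix from an env var; the real test reads it from its test env helpers.
    apiGroupSuffix := os.Getenv("API_GROUP_SUFFIX")
    if apiGroupSuffix == "" {
        apiGroupSuffix = "pinniped.dev"
    }

    // Mirror the guard from the diff: skip log tailing when an alternate suffix is in use,
    // to avoid the kubeclient middleware bug mentioned above.
    if apiGroupSuffix != "pinniped.dev" {
        fmt.Println("skipping pod log tailing for alternate API group suffix")
        return
    }

    config, err := clientcmd.BuildConfigFromFlags("", filepath.Join(homedir.HomeDir(), ".kube", "config"))
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(config)

    // Follow the logs of a hypothetical pod; namespace and pod name are placeholders.
    req := client.CoreV1().Pods("concierge-namespace").GetLogs("some-pod-name", &corev1.PodLogOptions{Follow: true})
    stream, err := req.Stream(context.Background())
    if err != nil {
        panic(err)
    }
    defer stream.Close()
    _, _ = io.Copy(os.Stdout, stream)
}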
Force-pushed from 389adfa to 5e06c6d.
// Only run this test in CI on Kind clusters, because something about restarting the pods
// in this test breaks the "kubectl port-forward" commands that we are using in CI for
// AKS, EKS, and GKE clusters. The Go code that we wrote for graceful pod shutdown should
// not be sensitive to which distribution it runs on, so running this test only on Kind
// should give us sufficient coverage for what we are trying to test here.
I also had to work around an unexpected problem in CI where restarting the pods somehow breaks the kubectl port-forward that we use in CI during AKS/EKS/GKE integration testing.
Before this fix, the deadlock would prevent the leader pod from giving up its lease, which would make it take several minutes for new pods to be allowed to elect a new leader. During that time, no Pinniped controllers could write to the Kube API, so important resources were not being updated during that window. It would also make pod shutdown take about 1 minute.
After this fix, the leader gives up its lease immediately, and pod shutdown takes about 1 second. This improves restart/upgrade time and also fixes the problem where there was no leader for several minutes after a restart/upgrade.
The deadlock was between the post-start hook and the pre-shutdown hook. The pre-shutdown hook blocked until a certain background goroutine in the post-start hook finished, but that goroutine could not finish until the pre-shutdown hook finished. Thus, they were both blocked, waiting for each other indefinitely. Eventually the process would be externally killed.
This deadlock was most likely introduced by some change in Kube's generic api server package related to how the many complex channels used during server shutdown interact with each other, and was not noticed when we upgraded to the version which introduced the change.
The bug first appeared in Pinniped v0.18.0, in which the Kube libraries were updated from 0.23.6 to 0.24.1. The relevant code in Pinniped had no changes around the time of that release. However, the Kube library changed to wait for the pre-shutdown hooks to finish before continuing the remainder of the shutdown process (see the file genericapiserver.go in this diff), thus creating this deadlock.
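As a hedged illustration of how this kind of cycle can be avoided (not necessarily the exact change made in this PR), the signal that cancels the controllers' context must not depend on a channel that is only closed after the pre-shutdown hook returns. For example, the pre-shutdown hook itself can cancel the context before waiting; the names here are simplified stand-ins, not the actual Pinniped source:

package main

import (
    "context"
    "fmt"
    "sync"
)

// runControllers stands in for the real controller loop: it blocks until its context is cancelled.
func runControllers(ctx context.Context) { <-ctx.Done() }

func main() {
    ctx, cancel := context.WithCancel(context.Background())

    var shutdown sync.WaitGroup
    shutdown.Add(1)
    go func() {
        defer shutdown.Done()
        runControllers(ctx)
    }()

    // Pre-shutdown hook: trigger the controllers to stop first, then wait for them.
    // Nothing here waits on a channel that is closed only after this hook returns, so there is no cycle.
    preShutdownHook := func() {
        cancel()
        shutdown.Wait()
    }

    preShutdownHook()
    fmt.Println("controllers stopped; the leader election lease can now be released promptly")
}

With an ordering like this, the pre-shutdown hook finishes quickly, the server can continue its shutdown, and the leader pod can give up its lease right away instead of being killed while still holding it.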
Release note: