Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: #3146 Support close open/fail for Ready Tracker & surface errors swallowed by grp.Wait() #3308

Merged

Conversation

David-Jaeyoon-Lee
Copy link
Contributor

@David-Jaeyoon-Lee David-Jaeyoon-Lee commented Mar 14, 2024

What this PR does / why we need it:
This change fixes Run() method in Ready Tracker to have a fail-close option so that if we fail to fetch expectations we surface those errors & fail. By default, this won't be enabled & we will fail-open.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #3146

Special notes for your reviewer: Refer to issue link for more information

@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from 0b9e1f4 to 96644d8 Compare March 14, 2024 20:39
pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from 25e891e to 2c5000d Compare March 28, 2024 19:39
@David-Jaeyoon-Lee David-Jaeyoon-Lee marked this pull request as ready for review March 28, 2024 19:40
@David-Jaeyoon-Lee David-Jaeyoon-Lee requested a review from a team as a code owner March 28, 2024 19:40
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Added a couple comments.

pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from e1fa4a6 to cbfcfc8 Compare April 2, 2024 00:00
@David-Jaeyoon-Lee David-Jaeyoon-Lee changed the title fix: #3146 Surface error swallowed by grp.Wait() & replace v1beta references for constraint templates in Ready Tracker fix: #3146 Support close open/fail for Ready Tracker & surface errors swallowed by grp.Wait() Apr 8, 2024
@David-Jaeyoon-Lee
Copy link
Contributor Author

Separating out constraint template v1beta1 -> v1 changes in separate PR.

@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 3 times, most recently from 77559ff to 7c5e5a0 Compare April 8, 2024 17:40
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after nit, thank you!

pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Apr 10, 2024

Codecov Report

Attention: Patch coverage is 87.83784% with 36 lines in your changes missing coverage. Please review.

Project coverage is 47.83%. Comparing base (3350319) to head (bf49b3c).
Report is 98 commits behind head on master.

Files Patch % Lines
pkg/syncutil/concurrent_slice.go 0.00% 19 Missing ⚠️
pkg/readiness/ready_tracker.go 95.01% 11 Missing and 2 partials ⚠️
pkg/syncutil/single_runner.go 71.42% 4 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (3350319) and HEAD (bf49b3c). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (3350319) HEAD (bf49b3c)
unittests 2 1
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3308      +/-   ##
==========================================
- Coverage   54.49%   47.83%   -6.67%     
==========================================
  Files         134      219      +85     
  Lines       12329    14884    +2555     
==========================================
+ Hits         6719     7120     +401     
- Misses       5116     6955    +1839     
- Partials      494      809     +315     
Flag Coverage Δ
unittests 47.83% <87.83%> (-6.67%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from 1705b33 to aa2b049 Compare April 10, 2024 20:14
@ritazh
Copy link
Member

ritazh commented Apr 13, 2024

Thanks for the PR @David-Jaeyoon-Lee
Few nits
I did not see unit test added for the new flag.

@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 3 times, most recently from ab0fdbb to bf1ac93 Compare April 23, 2024 21:18
@@ -45,7 +45,7 @@ import (
logf "sigs.k8s.io/controller-runtime/pkg/log"
)

var crashOnFailureFetchingExpectations = flag.Bool("crash-on-failure-fetching-expectations", false, "When set (defaults to false), gatekeeper will ignore errors that occur when gathering expectations. This prevents bootstrapping errors from crashing Gatekeeper at the cost of increasing the risk Gatekeeper will under-enforce policy by serving before it has loaded in all policies. Enabling this will help prevent under-enforcement at the risk of crashing during startup for issues like network errors.")
var crashOnFailureFetchingExpectations = flag.Bool("crash-on-failure-fetching-expectations", false, "When set (defaults to false), gatekeeper will ignore errors that occur when gathering expectations. This prevents bootstrapping errors from crashing Gatekeeper at the cost of increasing the risk Gatekeeper will under-enforce policy by serving before it has loaded in all policies. Enabling this will help prevent under-enforcement at the risk of crashing during startup for issues like network errors. Note that enabling this flag currently does not achieve the aforementioned effect since fetching expectations are set to retry until success so failures during fetching expectations currently do not occurr.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxsmythe Just wanted to double check this wording

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"When set" -> "unless set"

@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from fc45d15 to 1b69eac Compare April 30, 2024 19:08
@David-Jaeyoon-Lee
Copy link
Contributor Author

David-Jaeyoon-Lee commented Apr 30, 2024

I still need to fix lint errors for the Error return value is not checked for the goroutine deleted object polling. Any ideas on how to fix this? @maxsmythe

@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from ef710df to f6ee82e Compare May 2, 2024 20:32
@David-Jaeyoon-Lee David-Jaeyoon-Lee force-pushed the davjlee/refactor/ready-tracker branch 2 times, most recently from bbf995a to 6df8c51 Compare July 1, 2024 21:53
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits

Thank you for figuring this out!

pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
@@ -46,6 +46,8 @@ import (

var log = logf.Log.WithName("readiness-tracker")

var crashOnFailureFetchingExpectations = flag.Bool("crash-on-failure-fetching-expectations", false, "Unless set (defaults to false), gatekeeper will ignore errors that occur when gathering expectations. This prevents bootstrapping errors from crashing Gatekeeper at the cost of increasing the risk Gatekeeper will under-enforce policy by serving before it has loaded in all policies. Enabling this will help prevent under-enforcement at the risk of crashing during startup for issues like network errors. Note that enabling this flag currently does not achieve the aforementioned effect since fetching expectations are set to retry until success so failures during fetching expectations currently do not occur.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this can be shortened

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the naming or the description or both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description

pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
pkg/readiness/ready_tracker.go Outdated Show resolved Hide resolved
Add fail-close support for fetching expectations by adding error handling & surfacing the errors being swallowed by grp.Wait().

Signed-off-by: David-Jaeyoon-Lee <[email protected]>
Remove the starting hyphen from the
crash-on-failure-fetching-expectations flag

Signed-off-by: David-Jaeyoon-Lee <[email protected]>
and add unit tests
Fix comments and return error in trackConfigAndSyncSets. Add unit tests
for each track method & each wait in Run method.

Signed-off-by: David-Jaeyoon-Lee <[email protected]>
Change errgroup logic such that we instead use waitgroups and errchannels.
This is that we remove the idea of shared fate between each
tracking/subtracking threads. Removes complexity of figuring out what the
errgroups are doing as well as complicated logic with shared fate between
tracking threads and singlerunners

Signed-off-by: David-Jaeyoon-Lee <[email protected]>
@ritazh ritazh added this to the v3.17.0 milestone Jul 3, 2024
@maxsmythe
Copy link
Contributor

@David-Jaeyoon-Lee Unit tests are failing

Copy link
Contributor

@JaydipGabani JaydipGabani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM. @David-Jaeyoon-Lee unit tests are failing.

@JaydipGabani
Copy link
Contributor

Unit test passed, re-running to check for flakiness.

@JaydipGabani
Copy link
Contributor

JaydipGabani commented Jul 18, 2024

Running once more to check.

Copy link
Member

@ritazh ritazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thank you for the PR!

@ritazh ritazh merged commit 2244cee into open-policy-agent:master Jul 19, 2024
19 checks passed
Ankurk99 pushed a commit to Ankurk99/gatekeeper that referenced this pull request Aug 1, 2024
… & surface errors swallowed by grp.Wait() (open-policy-agent#3308)

Signed-off-by: David-Jaeyoon-Lee <[email protected]>
Co-authored-by: Sertaç Özercan <[email protected]>
Co-authored-by: Jaydipkumar Arvindbhai Gabani <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

some ready tracker refactor
6 participants