OCPBUGS-29729: Refactor security context configuration in pod reconciler #3185
Conversation
```go
} else {
	// If GrpcPodConfig is nil, apply 'restricted' security settings by default
	addSecurityContext(pod, runAsUser) // Default to restricted settings
```
I mentioned this elsewhere, but the implication here is that we will break existing catalog sources that reference catalog images that are not compatible with the securityContext required by the PSA restricted mode.
In general, are these implications well understood and gamed out? Any chance we have a design doc about this change of the default behavior?
I think IBM had suggested a change where if the securityContextConfig is unset, we look at the PSA configuration of the namespace. If the namespace has enforce: restricted, then we default to restricted. Otherwise, we keep the default as legacy.
That seems like it would resolve the bug without breaking existing CatalogSources that are running correctly in non-restricted namespaces.
OK, that approach makes sense. I will rework this fix to check the namespace PSA config first.
So if we want to check the namespace for its PSA config, we'd make a change something like this, with new calling code in the `Pod()` function and the new PSA-checking function below. But we need the namespace info, and we don't currently have the wherewithal in `Pod()`, specifically an operatorClient. `Pod()` is called in many places in the codebase, so I am trying not to modify its signature. We could add the needed building blocks globally, or is there some better way with controller-runtime that I am missing?
```go
// Define default security context config
securityContextConfig := operatorsv1alpha1.Legacy

// Determine the security context configuration based on GrpcPodConfig
if source.Spec.GrpcPodConfig != nil {
	if source.Spec.GrpcPodConfig.SecurityContextConfig != "" {
		securityContextConfig = source.Spec.GrpcPodConfig.SecurityContextConfig
	} else {
		// If SecurityContextConfig is unset, check the PSA configuration of the namespace
		var err error
		securityContextConfig, err = getNamespacePSAConfig(source.Namespace, opClient)
		if err != nil {
			return nil, fmt.Errorf("failed to get namespace PSA configuration: %v", err)
		}
	}
} else {
	// If GrpcPodConfig is nil, apply legacy security settings by default
	securityContextConfig = operatorsv1alpha1.Legacy
}

// Apply the determined security settings to the pod
if securityContextConfig == operatorsv1alpha1.Restricted {
	// Apply 'restricted' security settings
	addSecurityContext(pod, runAsUser) // Predefined function that sets a restricted security context on the pod
} else {
	// For 'Legacy' or any other case, clear all security contexts
	clearAllSecurityContexts(pod) // Predefined function that removes all security contexts from the pod
}
```
```go
// getNamespacePSAConfig checks the namespace for the PSA configuration and returns the applicable security config.
func getNamespacePSAConfig(namespace string, client operatorclient.ClientInterface) (operatorsv1alpha1.SecurityConfig, error) {
	ns, err := client.KubernetesInterface().CoreV1().Namespaces().Get(context.TODO(), namespace, metav1.GetOptions{})
	if err != nil {
		return "", fmt.Errorf("error fetching namespace: %v", err)
	}
	// 'pod-security.kubernetes.io/enforce' is the label used for enforcing namespace-level security,
	// and 'restricted' is the value indicating a restricted security policy.
	if val, exists := ns.Labels["pod-security.kubernetes.io/enforce"]; exists && val == "restricted" {
		return operatorsv1alpha1.Restricted, nil
	}
	return operatorsv1alpha1.Legacy, nil
}
```
Are we still waiting for an update on this?
Yes @tmshort, do you have any ideas for the best way to get a client available at that scope so we can fetch the namespace to check?
I would definitely avoid making a client call inside the `Pod()` function, especially since we call it so many times.
Steve ran into issues where the resolver would use clients to fetch data from the cache and/or apiserver. The result was really strange race conditions because the cache and apiserver can change during the course of a single resolution.
This sounds like it could run into similar problems. I think we should find a way to call `getNamespacePSAConfig` just once during a reconciliation and then use its result anywhere it is needed.
The last commit (f19d68a) adds checking of the namespace PSA restrictions. Currently, I am at about 8 files changed with many changes in each file (git diff is 871 lines). I am close, but still have some failing unit tests.
My basic approach was looking at the call graph and trying to minimize changes. Given that, I am trying to change the two wrapper `Pod()` functions in `configmap.go` and `grpc.go`. These locations already have access to the client and seem like a good place to fetch the namespace object. We check whether it's already cached, to meet the call-only-once-per-reconcile-loop goal.
I guess my question is: does this seem worth continuing to pursue for this bug fix, hammering out the remaining unit test failures? Or should we try some other approach?
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Closing in favor of: #3206
Description of the change:
This change updates the logic for setting security contexts within the OLM pod reconciler. Now, it differentiates between 'Restricted' and 'Legacy' security contexts more explicitly. The 'Restricted' security context applies default security settings unless overridden, while the 'Legacy' context clears all security settings. When no security context is configured, it defaults to restricted. Additionally, the related tests have been updated to reflect these changes and ensure correct behavior.
Motivation for the change:
OCPBUGS-29729
Architectural changes:
None
Testing remarks: