Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-29729: Refactor security context configuration in pod reconciler #3185

Closed
wants to merge 4 commits into from

Conversation

bentito
Copy link
Contributor

@bentito bentito commented Mar 4, 2024

Description of the change:
This change updates the logic for setting security contexts within the OLM pod reconciler. Now, it differentiates between 'Restricted' and 'Legacy' security contexts more explicitly. The 'Restricted' security context applies default security settings unless overridden, while the 'Legacy' context clears all security settings. When no security context is configured, it defaults to restricted. Additionally, the related tests have been updated to reflect these changes and ensure correct behavior.

Motivation for the change:
OCPBUGS-29729

Architectural changes:
None

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2024
@openshift-ci openshift-ci bot requested review from kevinrizza and tmshort March 4, 2024 23:03
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 4, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 4, 2024
@bentito
Copy link
Contributor Author

bentito commented Mar 5, 2024

/unit unit

Comment on lines 221 to 223
} else {
// If GrpcPodConfig is nil, apply 'restricted' security settings by default
addSecurityContext(pod, runAsUser) // Default to restricted settings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this elsewhere, but the implication here is that we will break existing catalog sources that reference catalog images that are not compatible with the securityContext required by the PSA restricted mode.

In general, are these implications well understood and gamed out? Any chance we have a design doc about this change of the default behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think IBM had suggested a change where if the securityContextConfig is unset, we look at the PSA configuration of the namespace. If the namespace has enforce: restricted, then we default to restricted. Otherwise, we keep the default as legacy.

That seems like it would resolve the bug without breaking existing CatalogSources that are running correctly in non-restricted namespaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that approach makes sense. I will rework this fix to check the namespace PSA config first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we want to check the namespace for its PSA config, we'd make a change something like this with new calling code in the Pod() function and the new PSA checking function below. But we need to get the namespace info and we don't have the wherewithal in Pod() currently, specifically an operatorClient.

Pod() is called in many places in the codebase, so I am trying not to modify its signature. We could add the needed building blocks globally or is there some better way with controller-runtime I am missing?

	// Define default security context config
	securityContextConfig := operatorsv1alpha1.Legacy

	// Determine the security context configuration based on GrpcPodConfig
	if source.Spec.GrpcPodConfig != nil {
		if source.Spec.GrpcPodConfig.SecurityContextConfig != "" {
			securityContextConfig = source.Spec.GrpcPodConfig.SecurityContextConfig
		} else {
			// If SecurityContextConfig is unset, check the PSA configuration of the namespace
			var err error
			securityContextConfig, err = getNamespacePSAConfig(source.Namespace, opClient)
			if err != nil {
				return nil, fmt.Errorf("failed to get namespace PSA configuration: %v", err)
			}
		}
	} else {
		// If GrpcPodConfig is nil, apply legacy security settings by default
		securityContextConfig = operatorsv1alpha1.Legacy
	}

	// Apply the determined security settings to the pod
	if securityContextConfig == operatorsv1alpha1.Restricted {
		// Apply 'restricted' security settings
		addSecurityContext(pod, runAsUser) // This should be a predefined function that sets restricted security context to the pod
	} else {
		// For 'Legacy' or any other case, clear all security contexts
		clearAllSecurityContexts(pod) // This should be a predefined function that removes all security contexts from the pod
	}

// getNamespacePSAConfig checks the namespace for the PSA configuration and returns the applicable security config.
func getNamespacePSAConfig(namespace string, client operatorclient.ClientInterface) (operatorsv1alpha1.SecurityConfig, error) {
	ns, err := client.KubernetesInterface().CoreV1().Namespaces().Get(context.TODO(), namespace, metav1.GetOptions{})
	if err != nil {
		return "", fmt.Errorf("error fetching namespace: %v", err)
	}
	// 'pod-security.kubernetes.io/enforce' is the label used for enforcing namespace level security,
	// and 'restricted' is the value indicating a restricted security policy.
	if val, exists := ns.Labels["pod-security.kubernetes.io/enforce"]; exists && val == "restricted" {
		return operatorsv1alpha1.Restricted, nil
	}

	return operatorsv1alpha1.Legacy, nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still waiting for an update on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @tmshort if you have any idea for best way to get a client avail at that scope to get a namespace to check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely avoid making a client call inside the Pod function, especially since we call it so many times.

Steve ran into issues where the resolver would use clients to fetch data from the cache and/or apiserver. The result was really strange race conditions because the cache and apiserver can change during the course of a single resolution.

This sounds like it could run into similar problems. I think we should find a way to call getNamespacePSAConfig just once during a reconciliation and then use its result anywhere it is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit (f19d68a) adds checking of the namespace PSA restrictions. Currently, I am at about 8 files changed with many changes in each file (git diff is 871 lines). I am close, but still have some failing unit tests.

My basic approach was looking at the call graph and trying to minimize changes. Given that, I am trying to change the two wrapper Pod() functions in configmap.go and grpc.go. These locations already have access to the client and seem like a good place to put a fetch of the namespace object. We're checking if it's cached already to meet the only call once per reconcile loop goal.

I guess my question is, does this seem worth continuing to pursue for this bug fix, hammer out the rest of the unit test problems? Or, try some other approach...

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2024
@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bentito added 3 commits April 4, 2024 17:24
This change updates the logic for setting security contexts within the OLM pod reconciler. Now, it differentiates between 'Restricted' and 'Legacy' security contexts more explicitly. The 'Restricted' security context applies default security settings unless overridden, while the 'Legacy' context clears all security settings. When no security context is configured, it defaults to restricted. Additionally, the related tests have been updated to reflect these changes and ensure correct behavior.

Signed-off-by: btofel <[email protected]>
Signed-off-by: btofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
@perdasilva
Copy link
Collaborator

Closing in favor of: #3206

@perdasilva perdasilva closed this Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants