Flyte-agent configure pod securityContext #4785

ddl-ebrown · 2024-01-27T00:05:45Z

Follow security best practices and disable privilege escalation by default in the agent. Also make both the pod and container security contexts user configurable. I wanted to drop caps on the container, but left that out given I'm not familiar with what the default agent needs. If running through CI is enough to detect whether anything breaks, then I'm happy to add those back and we can see what happens. Thoughts @davidmirror-ops ?

I will put up a separate PR for the flyte-core charts
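
For readers skimming the thread, here is a minimal sketch of a user override that uses the two keys this PR touches in charts/flyteagent/values.yaml (`podSecurityContext` and `securityContext`). The file name and the `fsGroup` value are hypothetical illustrations and not chart defaults; only `allowPrivilegeEscalation: false` comes from the diff.

```yaml
# my-values.yaml (hypothetical override file, passed to Helm with -f)

# Pod-level security context, applied to every container in the agent pod.
podSecurityContext:
  fsGroup: 1000  # assumption: illustrative value only, not a chart default

# Container-level security context for the agent container.
securityContext:
  allowPrivilegeEscalation: false  # the default proposed in this PR
```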

Tracking issue

https://github.com/flyteorg/flyte/issues/

Why are the changes needed?

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Check all the applicable boxes

I updated the documentation accordingly.
All new and existing tests passed.
All commits are signed-off.

Related PRs

Docs link

codecov · 2024-01-27T16:14:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (abef3f5) 58.54% compared to head (b1e72c6) 59.18%.
Report is 17 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4785      +/-   ##
==========================================
+ Coverage   58.54%   59.18%   +0.63%     
==========================================
  Files         625      644      +19     
  Lines       53669    52491    -1178     
==========================================
- Hits        31423    31068     -355     
+ Misses      19731    18849     -882     
- Partials     2515     2574      +59

Flag	Coverage Δ
unittests	`58.32% <ø> (-0.23%)`	⬇️


ddl-ebrown · 2024-01-27T17:24:09Z

charts/flyteagent/values.yaml

+    type: spc_t
+# -- Security context for container
+securityContext:
+  allowPrivilegeEscalation: false


Ideally the container needs no Linux capabilities at all, and this would drop them like so:

capabilities:
  drop:
    - ALL

But given I'm not super familiar with what's inside your agent code (and considering there are more than just "default" agents), I left it off as a default. The most common caps I've seen containers need added back are CAP_CHOWN and CAP_MKNOD. Needing CAP_CHOWN generally results from the container image being set up incorrectly.

Is there a way to test this in your branch? I don't think the CI will catch a potential problem, but in general, I don't see the Task containers needing root caps so dropping all should be fine.
@pingsutw do you see the Agent needing elevated permissions?

Using a Python-based agent (based on the Flytekit SDK examples), we haven't seen it require any of the default container capabilities. Note that the defaults differ depending on whether you're on containerd or cri-o. OpenShift maintains a smaller list, so if you have customers on OpenShift, that should be a good signal that you don't need things like NET_ADMIN.
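
A minimal sketch of the stricter container `securityContext` discussed above, assuming a given agent image turns out to need a capability back. The `add` entries are purely illustrative; nothing in this thread confirms that any agent requires them. Kubernetes expects capability names without the `CAP_` prefix.

```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    # Drop every default capability granted by the container runtime.
    drop:
      - ALL
    # Assumption: add back only what a specific agent image is shown to need.
    add:
      - CHOWN
      - MKNOD
```

Starting from `drop: [ALL]` and adding back individual capabilities keeps the result the same across runtimes whose default capability lists differ.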

ddl-ebrown · 2024-01-27T17:25:25Z

charts/flyteagent/values.yaml

@@ -54,6 +54,13 @@ serviceAccount:
  annotations: {}
  # -- ImagePullSecrets to automatically assign to the service account
  imagePullSecrets: []
+# -- Security context for pod
+podSecurityContext:
+  seLinuxOptions:


Ideally we could also set something like:

runAsNonRoot: true
runAsUser: 1000

When I ran the default container with those settings it died on startup, though, so there's some additional work to do to make it run non-root, which is out of scope for making these settings configurable in Helm.

I think there were previous attempts to enable at least the flytesnacks containers to run rootless, but nothing concrete yet. That probably explains the issue you hit running it.
Agreed that it's probably out of scope here, but it should definitely be configurable.
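
Since the setting is configurable, a user whose agent image does support running rootless could opt in with an override along these lines. This is a sketch only: per the comment above, the default agent container died on startup with these values, and the UID `1000` is just the example used in this thread.

```yaml
podSecurityContext:
  # Refuse to start any container in the pod that would run as UID 0.
  runAsNonRoot: true
  # Assumption: the image must be built so that this UID can run the agent.
  runAsUser: 1000
```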

davidmirror-ops · 2024-02-01T18:23:53Z

charts/flyteagent/values.yaml

@@ -54,6 +54,13 @@ serviceAccount:
  annotations: {}
  # -- ImagePullSecrets to automatically assign to the service account
  imagePullSecrets: []
+# -- Security context for pod
+podSecurityContext:
+  seLinuxOptions:


I think there were previous attempts to enable at least the flytesnacks containers to run rootless, but nothing concrete yet. That probably explains the issue you hit running it.
Agreed that it's probably out of scope here, but it should definitely be configurable.

davidmirror-ops · 2024-02-01T18:29:08Z

charts/flyteagent/values.yaml

+    type: spc_t
+# -- Security context for container
+securityContext:
+  allowPrivilegeEscalation: false


Is there a way to test this in your branch? I don't think the CI will catch a potential problem, but in general, I don't see the Task containers needing root caps so dropping all should be fine.
@pingsutw do you see the Agent needing elevated permissions?

davidmirror-ops · 2024-02-01T18:37:36Z

deployment/agent/flyte_agent_helm_generated.yaml

@@ -71,6 +71,9 @@ spec:
        helm.sh/chart: flyteagent-v0.1.10
        app.kubernetes.io/managed-by: Helm
    spec:
+      securityContext:
+        seLinuxOptions:
+          type: spc_t


I'm wondering why we need this?

This basically says "don't use SELinux", but I'm happy to remove it if you don't find it useful. This is more likely to come up in Red Hat / OpenShift land.

https://jaosorior.dev/2019/selinux-and-kubernetes/

I think it's useful. I'd just make it configurable and disabled by default. We haven't seen many on-prem users running RHEL/OpenShift, but if this is useful for them, at least making it available would help.

It was already configurable, so I just removed the seLinuxOptions altogether. Users can decide if they want it for their agents or not.
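
With `seLinuxOptions` removed from the defaults, a user on an SELinux-enforcing platform who wants the earlier behavior could opt back in through their own values override. A minimal sketch, reusing the `spc_t` type from the original diff:

```yaml
podSecurityContext:
  seLinuxOptions:
    # Opt-in only: runs the pod's containers under the spc_t SELinux type.
    type: spc_t
```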

- Follow security best practices and allow for more granular configuration.

  Right now, the default flyteagent container cannot run with the following desirable podSecurityContext:

  runAsNonRoot: true
  runAsUser: XXX

  The container crashes on startup.

  It would also be desirable to set a container securityContext with:

  capabilities:
    drop:
      - ALL

  The container launched with caps dropped, but I wasn't familiar enough with the code in that container to know if that will actually work, so didn't set a default.

  Signed-off-by: ddl-ebrown <[email protected]>

davidmirror-ops

Thank you.
I guess this is just the beginning and we'll still need to revisit the security hardening for the Agent.

ddl-ebrown · 2024-02-05T19:29:33Z

Yeah I would generally advocate for secure by default, with explicit changes required to decrease security posture.

That said, you have a lot of different agents right now, with more on the way -- so I didn't want to guess wrong and break everything on you. :)

Longer term, it might be good to take a swing at checking the various agents to see if they behave properly as non-root.

ddl-ebrown force-pushed the add-helm-security-context branch 2 times, most recently from 0b849c3 to 6376a60 Compare January 27, 2024 07:51

ddl-ebrown marked this pull request as ready for review January 27, 2024 17:25

dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. documentation Improvements or additions to documentation enhancement New feature or request security Issues related to Security improvements labels Jan 27, 2024

ddl-ebrown mentioned this pull request Jan 31, 2024

Flyte-core define pod and container securityContext #4809

Merged

3 tasks

ddl-ebrown changed the title ~~Configure Flyte agent pod securityContext~~ Flyte-agent configure pod securityContext Feb 1, 2024

davidmirror-ops reviewed Feb 1, 2024

ddl-ebrown force-pushed the add-helm-security-context branch from 6376a60 to b1e72c6 Compare February 2, 2024 17:51

davidmirror-ops approved these changes Feb 5, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 5, 2024

davidmirror-ops merged commit 1bbe867 into flyteorg:master Feb 5, 2024
51 checks passed
