Flyte-agent configure pod securityContext #4785
0b849c3 to 6376a60
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4785      +/-   ##
==========================================
+ Coverage   58.54%   59.18%   +0.63%
==========================================
  Files         625      644      +19
  Lines       53669    52491    -1178
==========================================
- Hits        31423    31068     -355
+ Misses      19731    18849     -882
- Partials     2515     2574      +59
Flags with carried forward coverage won't be shown.
    type: spc_t
# -- Security context for container
securityContext:
  allowPrivilegeEscalation: false
Ideally the container also needs no Linux caps, and this would drop them all, like:
capabilities:
  drop:
  - ALL
But given I'm not super familiar with what's inside your agent code (and considering there are more than just "default" agents), I left it off as a default. The most common caps I've seen containers need added back are CAP_CHOWN and CAP_MKNOD. Needing CAP_CHOWN generally results from the container image being set up incorrectly.
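For illustration, a minimal sketch of what that could look like in a values override, assuming the chart passes `securityContext` straight through to the agent container. Kubernetes takes capability names without the `CAP_` prefix. The `add` entries are commented out and only show the syntax, since nothing in this thread confirms the agent needs them.

```yaml
# Hypothetical override for charts/flyteagent/values.yaml (not the chart default)
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
    # Add back only what a specific agent image is shown to need, e.g.:
    # add:
    #   - CHOWN   # CAP_CHOWN
    #   - MKNOD   # CAP_MKNOD
```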
Is there a way to test this in your branch? I don't think the CI will catch a potential problem, but in general, I don't see the Task containers needing root caps so dropping all should be fine.
@pingsutw do you see the Agent needing elevated permissions?
Using a Python-based agent (based on the Flytekit SDK examples), we haven't seen it require any of the default container capabilities. Note that the defaults differ depending on whether you're on containerd or cri-o. OpenShift maintains a smaller list -- so if you have customers on OpenShift, that should be a good signal that you don't need things like NET_ADMIN.
charts/flyteagent/values.yaml
Outdated
@@ -54,6 +54,13 @@ serviceAccount:
  annotations: {}
  # -- ImagePullSecrets to automatically assign to the service account
  imagePullSecrets: []
# -- Security context for pod
podSecurityContext:
  seLinuxOptions:
Ideally we could also set something like:
runAsNonRoot: true
runAsUser: 1000
When I ran the default container with those settings it died on startup, though -- so there's some additional work needed to make it run as non-root, which is out of scope for making these settings configurable in Helm.
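As a sketch, the override being described would look roughly like this, assuming `podSecurityContext` is rendered into the pod spec unchanged. The UID 1000 is the illustrative value from the comment above, not something the image is known to support; per that comment, the default image did not start with these settings.

```yaml
# Hypothetical values override; the default agent image reportedly fails to start with this
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000  # illustrative UID taken from the comment above
```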
I think there were previous attempts to enable at least the flytesnacks containers to run rootless, but nothing concrete yet. That probably explains the issue you hit running it.
Agreed that it's probably out of scope here, but it should definitely be configurable.
@@ -71,6 +71,9 @@ spec:
        helm.sh/chart: flyteagent-v0.1.10
        app.kubernetes.io/managed-by: Helm
    spec:
      securityContext:
        seLinuxOptions:
          type: spc_t
I'm wondering why we need this?
This basically says "don't use SELinux" -- but happy to remove it if you don't find it useful. This is more likely to come up in Red Hat / OpenShift land.
I think it's useful. I'd just make it configurable and disabled by default. We haven't seen many on-prem users running RHEL/OpenShift, but if this is useful for them, at least making it available would help.
It was already configurable, so I just removed the seLinuxOptions altogether. Users can decide if they want it for their agents or not.
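For anyone who does want it back, a sketch of opting in from a values override, assuming `podSecurityContext` is passed through to the pod spec as in the earlier revision of this change:

```yaml
# Hypothetical override, e.g. for RHEL/OpenShift clusters; not a chart default
podSecurityContext:
  seLinuxOptions:
    type: spc_t  # "super privileged container" type; effectively opts the pod out of SELinux confinement
```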
- Follow security best practices and allow for more granular configuration.
- Right now, the default flyteagent container cannot run with the following desirable podSecurityContext:
    runAsNonRoot: true
    runAsUser: XXX
  The container crashes on startup.
- It would also be desirable to set a container securityContext with:
    capabilities:
      drop:
      - ALL
  The container launched with caps dropped, but I wasn't familiar enough with the code in that container to know if that will actually work, so didn't set a default.
Signed-off-by: ddl-ebrown <[email protected]>
6376a60 to b1e72c6
Thank you.
I guess this is just the beginning and we'll still need to revisit the security hardening for the Agent.
Yeah, I would generally advocate for secure by default, with explicit changes required to decrease security posture. That said, you have a lot of different agents right now, with more on the way -- so I didn't want to guess wrong and break everything on you. :) Longer term it might be good to take a swing at checking the various agents out to see if they behave properly as non-root.
I will put up a separate PR for the flyte-core charts