Resource control based on cgroup #7
Conversation
reps/2022-04-01-resource-control.md (outdated)
- worker_resource_control_method: Set to "cgroup" by default.
- cgroup_manager: Set to "cgroupfs" by default. We should also support `systemd`.
- cgroup_mount_path: Set to `/sys/fs/cgroup/` by default.
- cgroup_unified_hierarchy: Whether to use cgroup v2 or not. Set to `False` by default.
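As a rough illustration of the defaults quoted above, the cluster-level options could be collected as follows; the dict form and the mechanism for passing these values to Ray are assumptions, not part of the quoted hunk:

```python
# Illustrative only: the names and defaults come from the quoted hunk, but the
# dict form and how these options are actually passed to Ray are assumptions.
resource_control_defaults = {
    "worker_resource_control_method": "cgroup",   # use cgroups for workers
    "cgroup_manager": "cgroupfs",                 # "systemd" should also be supported
    "cgroup_mount_path": "/sys/fs/cgroup/",       # default cgroup mount point
    "cgroup_unified_hierarchy": False,            # True => use cgroup v2
}
```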
can we only support cgroup v2 to reduce the scope?
I think we should also support cgroup v1 because it has been widely used and is written into the standard. I have described this here: https://github.com/ray-project/enhancements/pull/7/files#diff-98ccfc9582e95581aae234797bc273b2fb68cb9e4dcc3030c8e94ba447daef7dR112-R113.
And at Ant, we only support v1 in production. Can you check your production environments and CI environments?
reps/2022-04-01-resource-control.md (outdated)
We also could use [libcgroup](https://github.com/libcgroup/libcgroup/blob/main/README) to simplify the implementation. This library supports both cgroup v1 and cgroup v2.

##### systemd
Using systemd to manage cgroups looks like:
what's the advantage of using systemd here?
If we use cgroupfs directly on a systemd-based system, there will be more than one component managing the cgroup tree. And from https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/, we can see that:

> In the short-term future writing directly to the control group tree from applications should still be OK, as long as the Pax Control Groups document is followed. In the medium-term future it will still be supported to alter/read individual attributes of cgroups directly, but no longer to create/delete cgroups without using the systemd API. In the longer-term future altering/reading attributes will also be unavailable to userspace applications, unless done via systemd's APIs (either D-Bus based IPC APIs or shared library APIs for passive operations).

In addition, systemd is highly recommended by runc for cgroup v2: https://github.com/opencontainers/runc/blob/main/docs/cgroup-v2.md.
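For a concrete feel of the delegation model (a minimal sketch, not the REP's proposed implementation), instead of writing to the cgroup tree directly, a worker command can be launched inside a transient systemd unit that carries the resource limits, e.g. via `systemd-run`; the unit name and limits below are illustrative:

```python
import subprocess

# Minimal sketch: let systemd create and own the cgroup by running the worker
# command inside a transient scope unit with resource-control properties.
subprocess.run(
    [
        "systemd-run", "--scope",
        "--unit", "ray-worker-demo",
        "-p", "MemoryMax=512M",   # memory limit applied to the unit's cgroup
        "-p", "CPUQuota=50%",     # CPU bandwidth limit
        "python", "-c", "print('worker started under systemd resource control')",
    ],
    check=True,
)
```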
@scv119 want to shepherd this one?
I'm not sure I see this as weird. This was one of the original use cases we envisioned for runtime_env plugins when we designed it (arbitrary modifications to the environment, such as attaching debuggers, resource limits, etc.).
This doesn't sound like a fundamental issue, but something we can add, right? Here's my concern: we're adding a ton of stuff into core, which will not be used by most users. If we can do this with the existing plugin infra, with some minor improvements, then in the longer term that leads to a better architecture than handling it directly in core. I understand it makes this project a little more involved, but from the overall Ray project perspective this is cleaner long term, as we might have other types of plugins of this sort.
@ericl I agree that the pluggable architecture could make the core cleaner in the long term, and I can try to make this a plugin. By the way, I don't think resource control is a minor improvement. It may become a basic capability of Ray Core and be widely used in large-scale clusters in the future. We already have a lot of issues at Ant, and we plan to put this into production in a few months.
I'd like to see the discussion outcome then. If @edoakes and @architkulkarni think it's not a good fit for a runtime env plugin, then I'm fine with that decision.
The issue with deletion makes sense; I agree it seems tricky to support it with the current abstraction. What would the ideal abstraction look like? One naive idea is to have each runtime env plugin provide a deletion method.
Yep, we should add a method like what you said, and we should do more to make the abstraction ideal. For example, we need to support configuration for each plugin, because I will add a lot of cluster-level config here: https://github.com/ray-project/enhancements/pull/7/files#diff-98ccfc9582e95581aae234797bc273b2fb68cb9e4dcc3030c8e94ba447daef7dR24. @architkulkarni By the way, do we plan to turn the current runtime env modules, like pip and conda, into plugins? Or will we keep them as they are in the long term? Another thing is what @raulchen said here: #7 (comment). What do you think about this? In addition, do we need to support applying multiple plugins? If we do, how can we guarantee the ordering and avoid conflicts between plugins? We should consider these issues in the plugin abstraction, right?
I made some edits to https://docs.google.com/document/d/1x1JAHg7c0ewcOYwhhclbuW0B0UC7l92WFkF4Su0T-dk/edit# that I think will enable cgroups to be supported (namely, adding priority, create_for_worker, delete_for_worker). Cgroups might be implementable out of the box as a plugin after these hooks are added, wdyt?
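To make the proposed hooks more concrete, here is a rough sketch of what a cgroup plugin exposing `priority`, `create_for_worker`, and `delete_for_worker` might look like; the class shape, signatures, and config keys are assumptions for illustration, not Ray's actual plugin API, and it assumes cgroup v2 mounted at /sys/fs/cgroup:

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumes the cgroup v2 unified hierarchy


class CgroupRuntimeEnvPlugin:
    """Illustrative sketch only; not Ray's real plugin base class or API."""

    # Ordering hint relative to other plugins (e.g. run after conda/pip setup).
    priority = 10

    def create_for_worker(self, worker_id: str, config: dict) -> str:
        """Create a per-worker cgroup and apply the configured limits."""
        path = os.path.join(CGROUP_ROOT, f"ray_worker_{worker_id}")
        os.makedirs(path, exist_ok=True)
        if "memory_limit_bytes" in config:
            with open(os.path.join(path, "memory.max"), "w") as f:
                f.write(str(config["memory_limit_bytes"]))
        return path

    def delete_for_worker(self, worker_id: str) -> None:
        """Remove the per-worker cgroup once the worker has exited."""
        path = os.path.join(CGROUP_ROOT, f"ray_worker_{worker_id}")
        if os.path.isdir(path):
            os.rmdir(path)  # valid once the cgroup has no member processes
```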
I believe this is the plan, yes.
I get your point, but I think this doc is still not clear.
By the way, @ericl @architkulkarni, I think we should also turn the Google doc into a REP, wdyt? Who can take this work? I'm also available for it if you don't have the bandwidth.
I think we are on the same page that resource control is better implemented as a plugin. Also, it seems that the plugin system still needs an overhaul to meet different kinds of runtime env requirements.

So my suggestion is that we do two things simultaneously:
1. Summarize the minimal changes to the plugin system needed to support cgroup, and make those changes part of this REP.
2. Discuss what other changes are needed to support other plugins, and potentially file another REP for the plugin system refinements.

What do you think? If we agree, @SongGuyang can add the first part to this REP first.
+1
@raulchen +1. Please keep @architkulkarni in the loop on the plugin changes; he can also help scope out the design offline first.
reps/2022-04-01-resource-control.md (outdated)
```python
runtime_env = {
    "enable_resource_control": True,
    "resource_control_config": {
```
Is cgroup version automatically detected?
I found a util function in runc: https://github.com/opencontainers/runc/blob/main/libcontainer/cgroups/utils.go#L34-L50. But cgroup v1 and v2 can both be enabled on some systems, and it also depends on which subsystems (e.g. cpu, memory) have been enabled in the cgroup. So I'd like to add a config option as a first step. We can change this in the future.
I will continue to work on the current REP this week and get back to you on the remaining comments.
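For reference, here is a minimal sketch of one common detection heuristic (not necessarily the mechanism the REP will adopt): a pure cgroup v2 system exposes a cgroup.controllers file at the root of the unified hierarchy, while a v1 or hybrid setup exposes per-controller directories instead:

```python
import os

def detect_cgroup_version(mount_path: str = "/sys/fs/cgroup") -> int:
    """Best-effort guess of the cgroup version in use (sketch, not exhaustive)."""
    # cgroup v2's unified hierarchy exposes cgroup.controllers at its root.
    if os.path.exists(os.path.join(mount_path, "cgroup.controllers")):
        return 2
    # Otherwise assume v1 (or a hybrid layout with per-controller mounts
    # such as /sys/fs/cgroup/cpu and /sys/fs/cgroup/memory).
    return 1

print(detect_cgroup_version())
```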
@ericl @architkulkarni @simon-mo @rkooo567 Could you take a look again?
Another change is that we should support loading runtime env plugins when we start the Ray nodes. For example:
```
ray start --head --runtime-env-plugins="ray.experimental.cgroup_env_plugin:ray.experimental.test_env_plugin"
```
Is this different from `eager_install`, which already exists in Ray?
They're absolutely two different things. This option is used to register plugins with the dashboard agent. For example, you can register a plugin named "conda" when starting the Ray node. Then, when a job uses the "conda" field in `runtime_env` after the cluster starts, the agent can call the "conda" plugin directly to set up the environment.
Makes sense! In the pluggability design there's a related idea, which is to use environment variables, e.g. `RAY_RUNTIME_ENV_PLUGIN_CGROUP=ray.experimental.test_cgroup_plugin ray start --head`. Both seem straightforward to implement, but I'm not sure which approach is best. Do you have any thoughts on this?
Hey @SongGuyang, we did a small group review of this and here is the conclusion/feedback:
- The requisite changes to runtime_env plugin architecture make sense and would be an overall improvement.
- We would prefer that the functionality does not live in the main Ray repo to begin with, but we can reevaluate this later as it matures and we get more user feedback.
- We aren't quite sure what the gap between this and the container field in runtime_env is, or how we should be thinking about them together. Should we support both? Can we support the cgroup functionality by extending container support? If we are providing cgroup, should we remove container support? Additionally, the existing container support has not been publicly documented/adopted nor maintained. Given these concerns, we think it would make sense to evaluate both the container and cgroup options holistically as part of this REP -- a strawman proposal would be to move the existing container support out of the codebase and into a plugin as well if we think both are needed.
- Could you please add more details about how this will be tested to the REP? Specifically, where will the tests live, what types of unit & integration tests we will need, etc.? Especially if this is not living in the main Ray repo we will need to make sure that the testing infrastructure is set up to avoid breakages.
Also, @ericl asked me to shepherd this change given that I have more context on runtime_env, so could you please update the text accordingly?
Please let me know if you have any questions!
Why couldn't it be an experimental module in the main Ray repo? I didn't see a strong reason for this in your description.
I think we need both. Cgroup is only used to control physical resources, e.g. CPUs and memory, but containers are used for many purposes, e.g. a custom Linux kernel, custom system packages (yum install ...), a custom file system, and of course resource control. We cannot use containers instead of cgroups in some scenarios because cgroups are more lightweight and don't rely on an image or an image registry: when you create a directory in the cgroup file system, a cgroup is created. Can we just add some description along these lines? Containers are another big issue. I'm sure they could also be a plugin, but please don't let that block the pluggability enhancement and cgroup. We can discuss it in a separate issue or REP.
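To illustrate the "creating a directory creates a cgroup" point, here is a minimal sketch assuming cgroup v2 mounted at /sys/fs/cgroup, sufficient privileges, and the cpu/memory controllers enabled in the parent's cgroup.subtree_control:

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumes the cgroup v2 unified hierarchy
group = os.path.join(CGROUP_ROOT, "ray_demo_group")

# Creating the directory is enough for the kernel to materialize a new cgroup.
os.makedirs(group, exist_ok=True)

# Apply limits by writing to the cgroup's control files (these files exist only
# if the corresponding controllers are enabled in the parent's subtree_control).
with open(os.path.join(group, "memory.max"), "w") as f:
    f.write(str(512 * 1024 * 1024))   # 512 MiB memory cap
with open(os.path.join(group, "cpu.max"), "w") as f:
    f.write("50000 100000")           # 50% of one CPU (50ms per 100ms period)

# Move the current process into the cgroup so the limits apply to it.
with open(os.path.join(group, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))
```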
@edoakes here are my thoughts on your questions.
As @SongGuyang mentioned above, sometimes users only want resource control plus pip requirements. In this case, containers are too heavyweight, as users would have to manage their Dockerfiles and images.
I'd prefer putting it in the main repo (at least to begin with). For one, I think the resource control requirement is as common as other existing ones, so putting it in the main repo would benefit more users. For another, we need to keep refining the plugin system along with this feature, and putting everything in the same repo would facilitate testing. I think we can put it in a new "runtime_env_plugins" directory, and later migrate the existing runtime envs to this directory as well. The core shouldn't depend on anything in this directory. Also, runtime envs can be shipped separately, e.g. "pip install ray[runtime_env_pip]" or "pip install ray[runtime_env_container]".
Re: cgroups + containers, I understand that you cannot use cgroup as a replacement for containers, but containers are essentially a superset of cgroup, so could we leverage the existing container support for this purpose as well without requiring users to specify an image or worry about other container details? For example, make the image and cgroup options both configurable in the container field.

Re: putting it in the repo vs. outside, the reasoning is that we do not yet have evidence of user demand for this feature, and adding it to the core repo increases the maintenance burden. A relevant example is the container support in runtime_env: it is not used by the community because it's not polished or well-documented, but it has increased the maintenance burden for runtime_env and caused test failures without anyone taking clear ownership (the relevant test is now disabled). We want to ensure the same thing doesn't happen with cgroup support. Once the feature is fully fledged and we see user demand, we can reconsider promoting it into the repo.
See the doc for more details.