-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Override RayJob configs #1763
base: master
Are you sure you want to change the base?
Override RayJob configs #1763
Conversation
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@@ -34,6 +34,8 @@ class RayJobConfig: | |||
head_node_config: typing.Optional[HeadNodeConfig] = None | |||
runtime_env: typing.Optional[dict] = None | |||
address: typing.Optional[str] = None | |||
# TODO: This config should be added to flyteidl. https://github.com/flyteorg/flyteidl/blob/95e11cca2dac18b727f122bbc4456ea6ab499289/protos/flyteidl/plugins/ray.proto#L8 | |||
config_override: typing.Dict[str, str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the type needs to be Dict[str, Any]
or there're other more elegant ways to handle this.
Configs for head/worker spec in RayJob
are buried under a few layers, e.g.
rayClusterSpec:
headGroupSpec:
template:
spec:
resources: ...
You also have different specs for head and worker groups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if having this generic config is beneficial vs extending the RayJobConfig
with specific options? For example, we could add a namespace
option to RayJobConfig
, a resources (or limits) to the HeadNodeConfig
and WorkerNodeConfig
, etc.
The reason why I think it would be better to go that direction is because the ray backend plugin will have to know this configurations options explicitly anyways, because it needs to override specific parts of the RayJob
and RayCluster
CRDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that get_config
will be serialized to proto map<string, string>, so we can only use Dict[str, str] here.
We definitely can extend RayJobConfig, but we need to update IDL first here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RayJobConfig will be serialized to RayJob proto we defined in the IDL, then flytepropeller will unmarshal it.
TL;DR
WIP
Type
Are all requirements met?
Complete description
How did you fix the bug, make the feature etc. Link to any design docs etc
Tracking Issue
https://github.com/flyteorg/flyte/issues/
Follow-up issue
NA
OR
https://github.com/flyteorg/flyte/issues/