Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override RayJob configs #1763

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Override RayJob configs #1763

wants to merge 2 commits into from

Conversation

pingsutw
Copy link
Member

TL;DR

WIP

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

https://github.com/flyteorg/flyte/issues/

Follow-up issue

NA
OR
https://github.com/flyteorg/flyte/issues/

Signed-off-by: Kevin Su <[email protected]>
@pingsutw pingsutw marked this pull request as draft July 26, 2023 18:36
Signed-off-by: Kevin Su <[email protected]>
@@ -34,6 +34,8 @@ class RayJobConfig:
head_node_config: typing.Optional[HeadNodeConfig] = None
runtime_env: typing.Optional[dict] = None
address: typing.Optional[str] = None
# TODO: This config should be added to flyteidl. https://github.com/flyteorg/flyteidl/blob/95e11cca2dac18b727f122bbc4456ea6ab499289/protos/flyteidl/plugins/ray.proto#L8
config_override: typing.Dict[str, str] = None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the type needs to be Dict[str, Any] or there're other more elegant ways to handle this.

Configs for head/worker spec in RayJob are buried under a few layers, e.g.

rayClusterSpec:
    headGroupSpec:
        template:
            spec:
                resources: ...

You also have different specs for head and worker groups

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if having this generic config is beneficial vs extending the RayJobConfig with specific options? For example, we could add a namespace option to RayJobConfig, a resources (or limits) to the HeadNodeConfig and WorkerNodeConfig, etc.

The reason why I think it would be better to go that direction is because the ray backend plugin will have to know this configurations options explicitly anyways, because it needs to override specific parts of the RayJob and RayCluster CRDs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that get_config will be serialized to proto map<string, string>, so we can only use Dict[str, str] here.
We definitely can extend RayJobConfig, but we need to update IDL first here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RayJobConfig will be serialized to RayJob proto we defined in the IDL, then flytepropeller will unmarshal it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants