[Bug] Different tasks should create different ray clusters #82
Currently, as of 0.2.1, when running a DAG with multiple @task.ray tasks, all of the tasks reuse the same Ray cluster. They should create separate, independent clusters.

Comments
I think this might be driven by running Ray only in a single namespace, called "ray". This means that the second Ray instance tries to access everything exactly like the first, including heartbeat, failover, etc., so it can't be a separate thing. Putting the two Ray instances into their own namespaces would solve this, but it presents a second set of issues: we need resources in addition to Ray to be running in each namespace, such as a PersistentVolumeClaim like this one:
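(The manifest referenced above was not preserved in this thread. The sketch below is a representative PVC; the name, namespace, storage class, and size are placeholder assumptions, not the original file.)

```yaml
# Hypothetical example only: a PVC that the Ray cluster spec could mount.
# All names and sizes are illustrative, not the manifest from the comment.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ray-shared-data        # placeholder name
  namespace: ray               # the single namespace discussed above
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: standard   # placeholder storage class
  resources:
    requests:
      storage: 10Gi
```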
With a single namespace, I can apply this YAML separately and then have the Ray cluster config pick up the PVC and use it. If each Ray instance is in its own namespace, we will need a way to inject a series of configurations like this one into each namespace so that Ray has full access to the necessary resources. I'll also note that the details of how we do this are going to be AZ-specific, with each AZ needing to mount its own local resources (and never mounting them in the wrong one); see the sketch below.
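One standard Kubernetes way to express that AZ pinning (a hedged sketch; the provisioner, zone, and names are assumptions, not anything from this thread) is a StorageClass whose allowedTopologies restrict volume placement to a single zone:

```yaml
# Hypothetical sketch: pin volumes provisioned through this class to one AZ,
# so a Ray namespace in us-east-1a can never mount storage from another zone.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ray-local-us-east-1a     # illustrative; one class per AZ
provisioner: ebs.csi.aws.com     # assumed CSI driver; substitute your own
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - us-east-1a
```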
Hey @DavidSacerdote, the operator creates a cluster based on the given cluster specification, which includes the cluster name. An example of the cluster specification can be found here: https://github.com/astronomer/astro-provider-ray/blob/main/dev/dags/scripts/ray.yaml. As you mentioned in your comment above, "I think this might be driven by running Ray only in a single namespace, called "ray"" (#82 (comment)). The cluster name is hardcoded in the specification, and two clusters cannot share a name because cluster names must be unique within a namespace. Did you consider testing with a unique specification per task, so that each cluster name stays unique (see the sketch below)? Recently, there was an attempt in PR #67 to improve this with dynamic cluster configuration, and we have issue #81 to track the topic. Do you think that will help in this regard, and is it what you are looking for?
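For illustration only (this guesses at the shape of a KubeRay-style RayCluster resource; the names, image, and fields are placeholders, not the provider's actual ray.yaml), a per-task spec would differ only in its metadata.name:

```yaml
# Hypothetical sketch of a per-task cluster spec. Only metadata.name needs to
# vary between tasks, since names must be unique within a namespace.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-task-a   # give each task its own name, e.g. raycluster-task-b
  namespace: ray
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0   # illustrative image tag
```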
This is being fixed as part of #81 and was tested using the branch issue-81-mess.