Why create_colocated_worker_cls and spawn #29

eelxpeng · 2024-11-28T17:41:34Z

It seems there are two ways to make two worker groups use the same resource pool:

Separate Workers on Same Resource Pool:

# Creates two separate worker groups sharing resources
actor_wg = RayWorkerGroup(resource_pool=resource_pool, ray_cls_with_init=actor_cls)
critic_wg = RayWorkerGroup(resource_pool=resource_pool, ray_cls_with_init=critic_cls)

Colocated Workers:

# Creates a single worker group that implements both actor and critic
cls_dict = {'actor': actor_cls, 'critic': critic_cls}
ray_cls_with_init = create_colocated_worker_cls(cls_dict)
wg_dict = RayWorkerGroup(resource_pool=resource_pool, ray_cls_with_init=ray_cls_with_init)

# Spawns interfaces to access different functionalities
spawn_wg = wg_dict.spawn(prefix_set=cls_dict.keys())

With the latest main branch of Ray, where the fix ray-project/ray#48088 is merged, the first method should work without any problem and allow two different worker groups to reuse the same GPUs. The second method appears less straightforward, yet the PPO implementation uses it to put all worker groups in the global pool. Is there a specific reason for preferring the second method? Thanks.
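For reference, here is a minimal sketch of the idea behind the second method, in plain Python without Ray. It assumes that colocation means fusing several worker classes into one class whose methods carry a role prefix, and that spawn hands back per-role handles that hide the prefix again. The names `make_fused_cls`, `RoleView`, and this standalone `spawn` helper are hypothetical and are not the library's actual implementation.

```python
# Hypothetical illustration only: plain Python, no Ray, not the library's code.

class Actor:
    def __init__(self, lr):
        self.lr = lr

    def update(self, batch):
        return f"actor update on {len(batch)} samples (lr={self.lr})"


class Critic:
    def __init__(self, lr):
        self.lr = lr

    def update(self, batch):
        return f"critic update on {len(batch)} samples (lr={self.lr})"


def make_fused_cls(cls_dict, init_kwargs):
    """Build one class that owns an instance of every role and exposes
    each role's public methods under a '<role>_' prefix."""

    class Fused:
        def __init__(self):
            # One object (think: one process) holds all roles.
            self._roles = {
                name: cls(**init_kwargs[name]) for name, cls in cls_dict.items()
            }

    for role, cls in cls_dict.items():
        for meth_name in dir(cls):
            if meth_name.startswith("_") or not callable(getattr(cls, meth_name)):
                continue

            # Default arguments pin the current role and method name per iteration.
            def forward(self, *args, _role=role, _meth=meth_name, **kwargs):
                return getattr(self._roles[_role], _meth)(*args, **kwargs)

            setattr(Fused, f"{role}_{meth_name}", forward)
    return Fused


class RoleView:
    """Per-role handle that adds the prefix back when a method is looked up."""

    def __init__(self, fused, prefix):
        self._fused = fused
        self._prefix = prefix

    def __getattr__(self, name):
        return getattr(self._fused, f"{self._prefix}_{name}")


def spawn(fused, prefix_set):
    """Return one view per role, all backed by the same fused object."""
    return {prefix: RoleView(fused, prefix) for prefix in prefix_set}


cls_dict = {"actor": Actor, "critic": Critic}
Fused = make_fused_cls(cls_dict, {"actor": {"lr": 0.1}, "critic": {"lr": 0.5}})
fused = Fused()
views = spawn(fused, cls_dict.keys())
print(views["actor"].update([1, 2, 3]))
print(views["critic"].update([1, 2]))
```

In this sketch both views point at the same underlying object, whereas the first method would correspond to constructing `Actor` and `Critic` as two independent objects that merely share hardware.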
