-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Increase the instance type when scaling up llumlet #87
base: main
Are you sure you want to change the base?
Conversation
|
|
3a172f3
to
018bb3b
Compare
|
|
5c4fbaa
to
af205ca
Compare
|
af205ca
to
2be196e
Compare
|
|
6c40c99
to
d23862a
Compare
|
|
b30b86f
to
88c3102
Compare
|
|
|
|
@zhypku |
|
|
instance_type = InstanceType.DECODE | ||
else: | ||
# compute distance if launch prefill or decode | ||
normal_distance = pd_ratio[0] - pd_ratio[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the pd ratio should be the abosolute value of number of prefill/decode instance? or it's just a ratio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a ratio, like "1:2"
await server.run.remote(manager, instance_id, instance) | ||
self.scale_up(instance_id, instance) | ||
pending_pg_states = list_placement_groups(filters=[("state", "=", "PENDING")]) | ||
rescheduling_pg_states = list_placement_groups(filters=[("state", "=", "RESCHEDULING")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should not be rescheduling pg in normal states
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if rescheduling_pg_states is not 0, just wait next round and do nothing here
tests/e2e_test/utils.py
Outdated
f"--max-num-batched-tokens {max_num_batched_tokens} " | ||
f"{'> instance_'+result_filename if len(result_filename)> 0 else ''} 2>&1 &" | ||
) | ||
return command | ||
|
||
def generate_serve_command(result_filename: str = "", | ||
launch_ray_cluster: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no need of launch ray cluster here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just maintain consistency to generate_launch_command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global launch always don't launch ray cluster. adding it for consistency is not resonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think always launching ray cluster is more common case in pratical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accept
169e4cd
to
9ee73aa
Compare
llumnix/launcher.py
Outdated
self.port_offset += 1 | ||
if self.enable_port_offset_store: | ||
put_actor_data_to_ray_internal_kv("manager", "port_offset", self.port_offset) | ||
return config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why naming config here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using 'new_instance_args' is also acceptable, but it can be quite similar to the 'instance_args' being passed, so name it 'config'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think next_instance_args is better, config is strange.
llumnix/launcher.py
Outdated
cur_num_decode = len(self.global_scheduler.instance_id_set - | ||
self.global_scheduler.dispatch_scheduler.available_dispatch_instance_set) | ||
config.instance_type = self._get_next_instance_type(cur_num_prefill, cur_num_decode, self.pd_ratio) | ||
return config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why naming config here
llumnix/launcher.py
Outdated
|
||
def _get_next_instance_args(self, instance_args) -> InstanceArgs: | ||
assert not self.enablde_engine_pd_disagg, \ | ||
"currently not support engine based pd-disaggregation in Global Launch Model." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global launch mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accept
|
|
|
|
|
|
|
|
6c1ef67
to
bcce670
Compare
@@ -42,9 +42,6 @@ def add_argument(self, *args, **kwargs): | |||
kwargs['default'] = None | |||
super().add_argument(*args, **kwargs) | |||
|
|||
|
|||
# All the default values of llumnix arguments are set in default.py. So all the arguments here are set to None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add back
llumnix/launcher.py
Outdated
self.port_offset += 1 | ||
if self.enable_port_offset_store: | ||
put_actor_data_to_ray_internal_kv("manager", "port_offset", self.port_offset) | ||
return config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think next_instance_args is better, config is strange.
total_num_prefill = total_num_prefill - base_num_ratio * pd_ratio[0] | ||
total_num_decode = total_num_decode - base_num_ratio * pd_ratio[1] | ||
|
||
if total_num_prefill + total_num_decode == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these pd ratio codes should be placed in a seperate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and i think these logic is quite indirect, why not just calculate the pd_ratio_if_add_prefill and pd_ratio_if_add_decode, distance is hard to understand.
|
|
|
--pd-ratio
to determine the instance type when launching instance under global launch context.