[Core] Increase the instance type when scaling up llumlet #87

Open: wants to merge 3 commits into base: main

@KuilongCui (Contributor) commented Dec 17, 2024

  1. Move the instance-level args from ManagerArgs to InstanceArgs.
  2. Move the launch functions from the manager to a new file, launcher.py.
  3. Add an instance type when scaling up a llumlet: one of ['prefill', 'decode', 'no_constraints'].
  4. Remove the --num-dispatched-instance parameter; use --pd-ratio to determine the instance type when launching an instance under the global launch context.
  5. Compute dispatch_load_metric and migration_load_metric in the llumlet.
  6. Add pd-disagg tests for bench_test and correctness_test.
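
As a rough illustration of items 1 and 3, the instance type could be carried as a field on the instance-level args. This is a minimal sketch, not the code in this PR: the `InstanceArgs` fields shown here and the helper `make_instance_args` are hypothetical, and only the three type names come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class InstanceType(str, Enum):
    """The three instance types named in the PR description."""
    NO_CONSTRAINTS = "no_constraints"
    PREFILL = "prefill"
    DECODE = "decode"


@dataclass
class InstanceArgs:
    # Hypothetical subset of the instance-level fields; the real class
    # in the PR is assumed to carry more than this.
    instance_type: InstanceType = InstanceType.NO_CONSTRAINTS


def make_instance_args(instance_type: str) -> InstanceArgs:
    # Hypothetical helper. Looking an enum member up by value raises
    # ValueError for an unknown type string.
    return InstanceArgs(instance_type=InstanceType(instance_type))
```

Keeping the type on the per-instance args (rather than on the manager args) lets each newly launched instance get its own value.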

@AlibabaPAI AlibabaPAI deleted a comment from github-actions bot Dec 17, 2024
@AlibabaPAI AlibabaPAI deleted a comment from github-actions bot Dec 17, 2024

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 17710.80 | 67384.10 | 138713.98 | 166019.16 | 190160.06 | 76094.19 |
| decode | 50.63 | 54.32 | 64.30 | 126.98 | 484.67 | 79.10 |


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 232.00 MB 240.00 MB 280.00 MB 400.00 MB 544.00 MB
rpc_speed(GB/s) 1.03 1.51 1.78 1.90 1.97 2.08 2.06 2.16 2.21 2.16 2.33 2.30 2.36 2.40 2.38 2.37 2.38 2.46 2.43 2.45 2.59 2.55 2.45 2.47 2.50 2.51 2.61 2.71 2.81 3.10 3.23
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 224.00 MB 256.00 MB 336.00 MB 384.00 MB 400.00 MB 424.00 MB
gloo_speed(GB/s) 1.00 1.65 2.04 2.33 2.49 2.60 2.82 2.80 3.07 3.00 3.37 3.08 3.30 2.97 3.56 2.94 2.77 2.88 2.49 2.34 2.33 3.07 2.50 3.12 1.44 2.57 3.22 1.13 1.13 0.44 0.87

@KuilongCui changed the title from "[Core] Increase the instance type argument when scaling up" to "[Core] Increase the instance type when scaling up llumlet" Dec 18, 2024
@KuilongCui KuilongCui force-pushed the engine_type branch 3 times, most recently from 3a172f3 to 018bb3b Compare December 18, 2024 11:11

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 13675.20 | 74578.78 | 141961.76 | 175850.15 | 176266.63 | 80459.01 |
| decode | 50.07 | 55.45 | 66.92 | 98.11 | 143.64 | 62.62 |

@KuilongCui KuilongCui requested review from s5u13b and zhypku December 18, 2024 11:29

migration_size 1.41 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 280.00 MB 296.00 MB 464.00 MB
rayrpc_speed(GB/s) 3.59 1.04 1.57 1.79 1.95 2.00 2.12 2.11 2.23 2.23 2.23 2.33 2.32 2.40 2.41 2.30 2.53 2.50 2.48 2.40 2.47 2.56 2.59 2.53 2.54 2.53 2.75 2.65 2.71 2.62 2.72 3.28
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 232.00 MB 264.00 MB 464.00 MB
gloo_speed(GB/s) 1.01 1.72 2.24 2.42 2.49 2.60 2.84 3.00 2.91 3.32 2.88 2.93 3.55 3.28 3.20 3.15 3.43 2.89 2.18 2.80 2.43 1.95 1.64 2.88 2.21 3.47 1.79 3.28 1.08


migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 248.00 MB 256.00 MB 264.00 MB 272.00 MB 304.00 MB 368.00 MB 376.00 MB 544.00 MB 592.00 MB
rayrpc_speed(GB/s) 3.87 1.03 1.50 1.75 1.89 2.01 2.03 2.10 2.19 2.15 2.25 2.26 2.34 2.32 2.40 2.40 2.37 2.49 2.51 2.40 2.58 2.59 2.26 2.41 2.61 2.65 2.71 2.75 2.64 2.77 2.57 2.59 2.94 2.76 2.98 3.07 3.20 3.31
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 200.00 MB 216.00 MB 224.00 MB 232.00 MB 280.00 MB 432.00 MB
gloo_speed(GB/s) 1.00 1.69 2.13 2.39 2.49 2.77 2.89 2.85 3.24 2.84 3.31 3.14 3.10 3.16 3.53 2.86 3.01 2.80 2.75 1.68 2.99 2.90 2.68 2.96 3.20 2.18 1.87 2.07 3.18


migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 232.00 MB 264.00 MB 272.00 MB 280.00 MB 336.00 MB 368.00 MB 472.00 MB 528.00 MB
rayrpc_speed(GB/s) 3.92 1.04 1.54 1.79 1.95 2.06 2.13 2.17 2.18 2.34 2.31 2.33 2.34 2.37 2.45 2.41 2.57 2.75 2.39 2.59 2.58 2.29 2.81 2.49 2.37 2.27 2.81 2.50 2.58 2.47 2.57 3.28 3.05 3.47 3.52
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 272.00 MB
gloo_speed(GB/s) 1.01 1.63 2.10 2.33 2.56 2.86 3.07 2.84 3.11 3.08 3.38 3.31 3.32 3.20 3.33 2.71 2.78 2.91 3.01 2.14 3.22 2.26 2.16 2.71 2.97 2.77 2.54 1.72 1.10


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 11542.56 | 72093.38 | 130429.15 | 171295.53 | 174903.31 | 77338.29 |
| decode | 51.47 | 56.53 | 67.24 | 115.99 | 202.05 | 65.44 |


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 16704.73 | 82229.98 | 133101.58 | 174211.54 | 174502.70 | 79464.72 |
| decode | 51.01 | 55.00 | 64.99 | 100.52 | 152.96 | 63.68 |


migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 248.00 MB 256.00 MB 264.00 MB 352.00 MB 392.00 MB 464.00 MB 520.00 MB 536.00 MB
rayrpc_speed(GB/s) 3.60 1.05 1.53 1.74 1.94 2.04 2.13 2.17 2.20 2.28 2.33 2.36 2.40 2.34 2.45 2.38 2.47 2.54 2.60 2.50 2.52 2.66 2.47 2.55 2.60 2.49 2.52 2.61 2.81 2.65 2.75 2.86 2.91 3.09 3.17 3.32 3.38 3.52
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 248.00 MB 264.00 MB 400.00 MB
gloo_speed(GB/s) 1.04 1.81 2.21 2.51 2.67 2.90 2.96 3.08 3.25 3.17 3.30 3.48 3.56 4.03 3.30 3.01 3.25 2.29 2.09 2.99 2.62 3.61 3.00 3.27 3.56 1.97 2.51 3.37 2.99 1.03


migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 272.00 MB 280.00 MB 296.00 MB 304.00 MB 352.00 MB 472.00 MB 504.00 MB 544.00 MB 680.00 MB
rayrpc_speed(GB/s) 3.54 1.01 1.52 1.73 1.87 1.98 2.02 2.10 2.13 2.21 2.18 2.25 2.27 2.36 2.36 2.37 2.32 2.37 2.43 2.44 2.47 2.52 2.41 2.58 2.61 2.34 2.61 2.56 2.69 2.60 2.49 2.60 2.66 2.87 3.01 3.20 3.37 3.26 3.31
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 264.00 MB 304.00 MB 312.00 MB 376.00 MB 416.00 MB 544.00 MB
gloo_speed(GB/s) 1.04 1.74 2.28 2.54 2.65 2.79 2.91 3.06 3.09 3.19 3.19 2.82 3.13 3.19 3.11 3.27 3.18 2.86 3.60 3.41 2.27 3.47 0.85 3.26 3.32 4.73 3.20 2.53 1.36 0.83 1.71 1.49 3.20 3.70


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 1387.12 | 58432.06 | 59514.49 | 103447.36 | 106400.45 | 39682.36 |
| decode | 61.62 | 80.28 | 110.46 | 657.17 | 1320.23 | 154.27 |


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 1339.67 | 82653.28 | 83803.09 | 108308.84 | 119539.22 | 50196.22 |
| decode | 59.74 | 79.13 | 119.59 | 906.66 | 2841.47 | 220.09 |


migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 256.00 MB 280.00 MB 472.00 MB
rayrpc_speed(GB/s) 3.46 1.06 1.60 1.81 1.98 2.03 2.13 2.21 2.26 2.26 2.30 2.27 2.46 2.30 2.46 2.45 2.48 2.53 2.67 2.46 2.58 2.63 2.54 2.61 2.76 2.73 2.71 2.77 2.79 2.62 2.58 3.32
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 232.00 MB 248.00 MB 320.00 MB 440.00 MB
gloo_speed(GB/s) 1.05 1.77 2.22 2.49 2.64 2.86 3.14 3.19 3.24 3.16 3.38 3.15 3.49 3.77 3.43 3.14 2.94 2.99 3.40 2.89 2.14 2.28 1.50 2.44 2.89 2.40 0.71 4.42 1.35

@KuilongCui (Contributor Author)

@zhypku launcher.py was created to extract the launch logic from manager.py. The Launcher class in launcher.py initializes the server, initializes instances, and manages placement groups. I'm not sure whether this naming follows conventional practice; do you have any suggestions?


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 256.00 MB 280.00 MB 304.00 MB 344.00 MB 384.00 MB 480.00 MB 544.00 MB
rayrpc_speed(GB/s) 1.02 1.52 1.77 1.92 2.10 2.09 2.17 2.22 2.26 2.38 2.37 2.37 2.46 2.39 2.40 2.48 2.51 2.48 2.53 2.63 2.57 2.68 2.83 2.72 2.63 2.61 2.60 2.73 2.56 2.89 2.74 3.07 3.19 3.20 3.32
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 248.00 MB 304.00 MB 360.00 MB 480.00 MB
gloo_speed(GB/s) 1.02 1.66 2.09 2.32 2.59 2.57 2.87 2.97 2.94 2.92 3.31 3.31 3.32 3.23 3.07 2.70 3.06 3.17 2.49 2.91 2.72 3.20 2.97 3.24 1.91 2.97 3.24 3.70 3.26 2.75 1.37 2.59


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 1230.77 | 55910.33 | 64405.19 | 93182.27 | 117636.98 | 40536.96 |
| decode | 61.25 | 75.36 | 103.41 | 298.65 | 1164.00 | 118.85 |

docs/Arguments.md (resolved)
llumnix/arg_utils.py (resolved)
llumnix/arg_utils.py (resolved)
llumnix/entrypoints/setup.py (resolved)
instance_type = InstanceType.DECODE
else:
# compute distance if launch prefill or decode
normal_distance = pd_ratio[0] - pd_ratio[1]
Contributor

Is the pd ratio the absolute number of prefill and decode instances, or just a ratio?

Contributor Author

Just a ratio, like "1:2".
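
For illustration, a ratio string such as "1:2" could be parsed into a (prefill, decode) pair along these lines. This is only a sketch under assumptions: the function name `parse_pd_ratio` and its validation rules are hypothetical and are not taken from the PR.

```python
from typing import Tuple


def parse_pd_ratio(text: str) -> Tuple[int, int]:
    """Parse a 'prefill:decode' ratio string such as '1:2'."""
    parts = text.split(":")
    if len(parts) != 2:
        raise ValueError(f"expected 'P:D', got {text!r}")
    # int() raises ValueError on non-numeric parts.
    prefill, decode = int(parts[0]), int(parts[1])
    if prefill <= 0 or decode <= 0:
        raise ValueError("both sides of the ratio must be positive")
    return prefill, decode
```

With this reading, "1:2" and "2:4" describe the same target mix; neither is an absolute instance count.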

await server.run.remote(manager, instance_id, instance)
self.scale_up(instance_id, instance)
pending_pg_states = list_placement_groups(filters=[("state", "=", "PENDING")])
rescheduling_pg_states = list_placement_groups(filters=[("state", "=", "RESCHEDULING")])
Contributor

There should not be any rescheduling placement groups in the normal state.

Contributor Author

If rescheduling_pg_states is not empty, we just wait for the next round and do nothing here.
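
The "wait for the next round" behaviour described in this reply might look roughly like the following. It is a sketch only: `should_skip_round` is a hypothetical helper, and the listing function is passed in as a parameter so the logic can be exercised without a running Ray cluster. In a real deployment one would presumably pass `ray.util.state.list_placement_groups`, which accepts a `filters` keyword of this shape.

```python
from typing import Callable, List, Sequence, Tuple

# A filter is a (key, predicate, value) triple, e.g. ("state", "=", "PENDING").
Filter = Tuple[str, str, str]


def should_skip_round(list_pgs: Callable[..., Sequence[object]]) -> bool:
    """Return True if any placement group is currently RESCHEDULING.

    `list_pgs` is assumed to behave like ray.util.state.list_placement_groups:
    it takes a `filters` keyword and returns a sequence of matching states.
    """
    filters: List[Filter] = [("state", "=", "RESCHEDULING")]
    rescheduling_pg_states = list_pgs(filters=filters)
    # Do nothing this round and let the next round re-check.
    return len(rescheduling_pg_states) > 0
```

Injecting the lister keeps the decision itself a pure function of what the cluster reports.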

llumnix/manager.py (resolved)
tests/e2e_test/test_correctness.py (resolved)
f"--max-num-batched-tokens {max_num_batched_tokens} "
f"{'> instance_'+result_filename if len(result_filename)> 0 else ''} 2>&1 &"
)
return command

def generate_serve_command(result_filename: str = "",
launch_ray_cluster: bool = True,
Contributor

There is no need to launch a Ray cluster here.

Contributor Author

This just maintains consistency with generate_launch_command.

Contributor

Global launch never launches a Ray cluster, so adding it for consistency is not reasonable.

Contributor Author (@KuilongCui, Jan 21, 2025)

I think always launching a Ray cluster is the more common case in practice.

Contributor Author

Accepted.

self.port_offset += 1
if self.enable_port_offset_store:
put_actor_data_to_ray_internal_kv("manager", "port_offset", self.port_offset)
return config
Contributor

Why is this named config?

Contributor Author

'new_instance_args' would also be acceptable, but it is quite similar to the 'instance_args' being passed in, so I named it 'config'.

Contributor

I think next_instance_args is better; config is strange.

cur_num_decode = len(self.global_scheduler.instance_id_set -
self.global_scheduler.dispatch_scheduler.available_dispatch_instance_set)
config.instance_type = self._get_next_instance_type(cur_num_prefill, cur_num_decode, self.pd_ratio)
return config
Contributor

Why is this named config?


def _get_next_instance_args(self, instance_args) -> InstanceArgs:
assert not self.enablde_engine_pd_disagg, \
"currently not support engine based pd-disaggregation in Global Launch Model."
Contributor

This should read "global launch mode".

Contributor Author

Accepted.


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 240.00 MB 248.00 MB 256.00 MB 272.00 MB 288.00 MB 296.00 MB 384.00 MB
rayrpc_speed(GB/s) 1.05 1.52 1.77 1.89 2.02 2.08 2.15 2.14 2.25 2.24 2.32 2.35 2.35 2.39 2.37 2.38 2.48 2.46 2.43 2.59 2.63 2.56 2.62 2.58 2.52 2.62 2.73 2.52 2.57 2.86 2.95 2.99 3.06
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 232.00 MB 256.00 MB 296.00 MB 384.00 MB
gloo_speed(GB/s) 1.03 1.74 2.13 2.48 2.64 2.71 2.95 2.76 3.23 3.02 3.21 3.07 3.07 3.19 3.58 3.11 2.70 2.75 1.22 2.77 1.80 3.15 2.79 0.96 1.96 2.88 2.92 3.63 2.95 0.83 1.76


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 1254.55 | 73738.50 | 74606.06 | 111412.86 | 111713.27 | 47051.53 |
| decode | 61.14 | 77.04 | 108.73 | 386.80 | 972.31 | 120.90 |


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 786.54 | 1576.64 | 25147.86 | 59108.67 | 131573.59 | 15427.10 |
| decode | 70.35 | 111.49 | 208.65 | 1710.54 | 37194.78 | 1173.73 |


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 240.00 MB 256.00 MB 280.00 MB 336.00 MB 344.00 MB 376.00 MB 528.00 MB
rayrpc_speed(GB/s) 1.02 1.51 1.75 1.89 2.04 2.10 2.14 2.18 2.25 2.31 2.36 2.29 2.41 2.42 2.38 2.50 2.41 2.47 2.41 2.56 2.43 2.52 2.52 2.58 2.47 2.56 2.89 2.61 2.70 3.09 2.65 2.45 3.08 3.08 3.26 3.29
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 232.00 MB 256.00 MB 280.00 MB 312.00 MB
gloo_speed(GB/s) 1.00 1.69 2.17 2.35 2.54 2.91 2.94 3.08 3.25 2.96 2.71 3.18 3.35 3.36 3.49 3.10 2.65 1.89 2.78 1.60 2.93 1.49 1.81 0.92 3.11 2.48 1.61 2.05 3.24 0.74 2.10


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 256.00 MB 280.00 MB 352.00 MB 424.00 MB 456.00 MB 472.00 MB
rayrpc_speed(GB/s) 1.03 1.56 1.81 1.97 2.05 2.06 2.13 2.10 2.22 2.17 2.29 2.30 2.36 2.41 2.36 2.41 2.39 2.44 2.62 2.57 2.51 2.38 2.43 2.37 2.65 2.51 2.76 2.70 2.57 2.90 3.23 3.03 3.06 3.04
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 384.00 MB
gloo_speed(GB/s) 1.01 1.66 2.13 2.34 2.68 2.66 2.89 2.93 2.92 3.32 3.23 3.16 3.21 3.25 3.23 2.70 2.70 2.46 2.83 2.98 2.25 2.75 2.68 1.51 3.28 2.38 3.15 2.37 1.04


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 1360.41 | 73062.84 | 83483.45 | 108908.85 | 112366.63 | 48448.47 |
| decode | 69.99 | 85.89 | 137.07 | 910.86 | 2536.53 | 226.74 |


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 883.24 | 37880.01 | 44144.61 | 77063.72 | 122837.83 | 30844.74 |
| decode | 57.70 | 66.03 | 91.19 | 273.51 | 316.21 | 95.89 |


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 256.00 MB 264.00 MB 272.00 MB 312.00 MB 328.00 MB 464.00 MB 568.00 MB
rayrpc_speed(GB/s) 1.03 1.54 1.77 1.93 2.04 2.07 2.12 2.17 2.21 2.22 2.32 2.32 2.38 2.41 2.42 2.51 2.46 2.54 2.49 2.45 2.36 2.59 2.66 2.51 2.48 2.16 2.59 2.68 2.57 2.67 2.85 2.83 2.89 3.25 3.26
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 320.00 MB 336.00 MB 400.00 MB
gloo_speed(GB/s) 1.03 1.71 2.15 2.34 2.62 2.79 2.93 3.03 2.95 3.19 2.91 3.43 3.59 3.42 3.32 2.60 2.71 2.55 2.25 2.78 2.80 2.36 1.44 4.78 3.86 0.95 3.21 3.07 3.03 3.27 2.27 0.60

@@ -42,9 +42,6 @@ def add_argument(self, *args, **kwargs):
kwargs['default'] = None
super().add_argument(*args, **kwargs)


# All the default values of llumnix arguments are set in default.py. So all the arguments here are set to None.
Contributor

Why was this removed?

Contributor Author

Added back.


total_num_prefill = total_num_prefill - base_num_ratio * pd_ratio[0]
total_num_decode = total_num_decode - base_num_ratio * pd_ratio[1]

if total_num_prefill + total_num_decode == 0:
Contributor

I think this pd ratio code should be placed in a separate function.

Contributor

Also, I think this logic is quite indirect. Why not just calculate pd_ratio_if_add_prefill and pd_ratio_if_add_decode? The distance is hard to understand.
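
One way to read the reviewer's suggestion is to compare the mix that would result from adding a prefill instance with the mix that would result from adding a decode instance, and pick whichever lands closer to the target. The sketch below is an interpretation, not the PR's implementation: the function name `next_instance_type` is hypothetical, it compares prefill shares rather than raw ratios to avoid dividing by zero, and the rule "make sure each type exists first" is an assumption.

```python
from fractions import Fraction
from typing import Tuple


def next_instance_type(num_prefill: int, num_decode: int,
                       pd_ratio: Tuple[int, int]) -> str:
    """Pick 'prefill' or 'decode' for the next instance to launch."""
    # Assumption: ensure at least one instance of each type exists first.
    if num_prefill == 0:
        return "prefill"
    if num_decode == 0:
        return "decode"

    # Target share of prefill instances, e.g. 1/3 for a "1:2" ratio.
    target = Fraction(pd_ratio[0], pd_ratio[0] + pd_ratio[1])
    total_after = num_prefill + num_decode + 1

    # Prefill share after each of the two possible additions.
    share_if_add_prefill = Fraction(num_prefill + 1, total_after)
    share_if_add_decode = Fraction(num_prefill, total_after)

    # Choose the addition whose resulting share is closer to the target.
    if abs(share_if_add_prefill - target) <= abs(share_if_add_decode - target):
        return "prefill"
    return "decode"
```

For example, with a "1:2" ratio and one prefill plus three decode instances, adding a prefill gives a share of 2/5 and adding a decode gives 1/5; 2/5 is closer to 1/3, so the sketch returns "prefill". Exact fractions are used so ties are not decided by floating-point rounding.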


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 224.00 MB 232.00 MB 248.00 MB 288.00 MB 296.00 MB 312.00 MB 352.00 MB 472.00 MB 544.00 MB
rayrpc_speed(GB/s) 0.90 1.40 1.70 1.89 2.01 2.05 2.22 2.16 2.26 2.34 2.35 2.39 2.30 2.19 2.46 2.47 2.52 2.51 2.54 2.72 2.65 2.65 2.47 2.56 2.68 2.63 2.86 2.58 2.86 2.79 3.13 3.18 3.25 3.14
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 224.00 MB 232.00 MB 280.00 MB 360.00 MB
gloo_speed(GB/s) 0.88 1.47 1.84 2.06 2.23 2.32 2.42 2.47 2.67 2.61 2.62 2.74 2.57 2.85 2.78 2.31 2.40 2.60 2.66 2.19 2.91 2.64 1.83 1.30 1.05 1.55 2.25 2.66
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 200.00 MB 216.00 MB 232.00 MB 288.00 MB 320.00 MB 432.00 MB 472.00 MB
nccl_speed(GB/s) 0.21 0.43 0.65 0.85 0.88 1.23 1.43 1.65 1.58 1.79 1.69 2.07 2.07 2.08 2.17 2.24 2.49 3.08 3.00 2.67 2.68 3.41 3.39 4.28 3.12 4.16 0.75 1.21 0.88 1.64


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 1945.22 | 2642.31 | 4189.38 | 79652.30 | 116513.86 | 16771.80 |
| decode | 81.61 | 123.04 | 274.07 | 1822.79 | 26643.61 | 1333.40 |


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 248.00 MB 280.00 MB 320.00 MB 360.00 MB 472.00 MB 480.00 MB
rayrpc_speed(GB/s) 0.90 1.38 1.66 1.83 1.92 2.02 2.14 2.18 2.28 2.26 2.33 2.35 2.21 2.30 2.38 2.35 2.40 2.52 2.51 2.61 2.57 2.59 2.59 2.47 2.70 2.69 2.61 2.62 2.52 2.91 2.90 2.96 3.13 3.17
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 240.00 MB 280.00 MB 312.00 MB 496.00 MB
gloo_speed(GB/s) 0.90 1.47 1.84 2.07 2.18 2.42 2.54 2.53 2.56 2.37 2.60 2.61 2.85 2.79 2.74 2.54 2.31 2.54 2.22 2.65 2.48 0.77 1.79 2.29 1.13 1.04 2.59 2.58 2.23 0.70 1.68
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 240.00 MB 248.00 MB 280.00 MB 336.00 MB
nccl_speed(GB/s) 0.20 0.42 0.71 0.84 0.98 1.19 1.37 1.42 1.63 1.72 2.01 1.97 2.51 2.00 2.01 2.41 1.36 2.97 2.76 3.09 2.57 2.96 2.93 4.00 3.33 2.46 2.73 3.34 3.86 2.92 0.38 2.85

4 participants