[Core] Increase the instance type when scaling up llumlet #87

KuilongCui · 2024-12-17T10:57:34Z

Move the instance-level args from ManagerArgs to InstanceArgs.
Move the launch functions from the manager to a new file, launcher.py.
Increase the instance type when scaling up llumlet: one of ['prefill', 'decode', 'no_constraints'].
Remove the --num-dispatched-instance parameter; use --pd-ratio to determine the instance type when launching an instance under the global launch context.
Compute dispatch_load_metric and migration_load_metric in llumlet.
Add pd-disagg tests for bench_test and correctness_test.

github-actions · 2024-12-17T12:00:46Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	17710.80	67384.10	138713.98	166019.16	190160.06	76094.19

decode	p25	p50	p75	p95	p99	mean
latency(ms)	50.63	54.32	64.30	126.98	484.67	79.10

github-actions · 2024-12-17T12:24:14Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	232.00 MB	240.00 MB	280.00 MB	400.00 MB	544.00 MB
rpc_speed(GB/s)	1.03	1.51	1.78	1.90	1.97	2.08	2.06	2.16	2.21	2.16	2.33	2.30	2.36	2.40	2.38	2.37	2.38	2.46	2.43	2.45	2.59	2.55	2.45	2.47	2.50	2.51	2.61	2.71	2.81	3.10	3.23

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	208.00 MB	224.00 MB	256.00 MB	336.00 MB	384.00 MB	400.00 MB	424.00 MB
gloo_speed(GB/s)	1.00	1.65	2.04	2.33	2.49	2.60	2.82	2.80	3.07	3.00	3.37	3.08	3.30	2.97	3.56	2.94	2.77	2.88	2.49	2.34	2.33	3.07	2.50	3.12	1.44	2.57	3.22	1.13	1.13	0.44	0.87

github-actions · 2024-12-18T11:22:57Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	13675.20	74578.78	141961.76	175850.15	176266.63	80459.01

decode	p25	p50	p75	p95	p99	mean
latency(ms)	50.07	55.45	66.92	98.11	143.64	62.62

github-actions · 2024-12-18T11:44:51Z

migration_size	1.41 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	280.00 MB	296.00 MB	464.00 MB
rayrpc_speed(GB/s)	3.59	1.04	1.57	1.79	1.95	2.00	2.12	2.11	2.23	2.23	2.23	2.33	2.32	2.40	2.41	2.30	2.53	2.50	2.48	2.40	2.47	2.56	2.59	2.53	2.54	2.53	2.75	2.65	2.71	2.62	2.72	3.28

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	216.00 MB	232.00 MB	264.00 MB	464.00 MB
gloo_speed(GB/s)	1.01	1.72	2.24	2.42	2.49	2.60	2.84	3.00	2.91	3.32	2.88	2.93	3.55	3.28	3.20	3.15	3.43	2.89	2.18	2.80	2.43	1.95	1.64	2.88	2.21	3.47	1.79	3.28	1.08

github-actions · 2025-01-16T07:57:17Z

migration_size	1.59 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	248.00 MB	256.00 MB	264.00 MB	272.00 MB	304.00 MB	368.00 MB	376.00 MB	544.00 MB	592.00 MB
rayrpc_speed(GB/s)	3.87	1.03	1.50	1.75	1.89	2.01	2.03	2.10	2.19	2.15	2.25	2.26	2.34	2.32	2.40	2.40	2.37	2.49	2.51	2.40	2.58	2.59	2.26	2.41	2.61	2.65	2.71	2.75	2.64	2.77	2.57	2.59	2.94	2.76	2.98	3.07	3.20	3.31

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	200.00 MB	216.00 MB	224.00 MB	232.00 MB	280.00 MB	432.00 MB
gloo_speed(GB/s)	1.00	1.69	2.13	2.39	2.49	2.77	2.89	2.85	3.24	2.84	3.31	3.14	3.10	3.16	3.53	2.86	3.01	2.80	2.75	1.68	2.99	2.90	2.68	2.96	3.20	2.18	1.87	2.07	3.18

github-actions · 2025-01-16T09:32:10Z

migration_size	1.59 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	232.00 MB	264.00 MB	272.00 MB	280.00 MB	336.00 MB	368.00 MB	472.00 MB	528.00 MB
rayrpc_speed(GB/s)	3.92	1.04	1.54	1.79	1.95	2.06	2.13	2.17	2.18	2.34	2.31	2.33	2.34	2.37	2.45	2.41	2.57	2.75	2.39	2.59	2.58	2.29	2.81	2.49	2.37	2.27	2.81	2.50	2.58	2.47	2.57	3.28	3.05	3.47	3.52

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	224.00 MB	232.00 MB	272.00 MB
gloo_speed(GB/s)	1.01	1.63	2.10	2.33	2.56	2.86	3.07	2.84	3.11	3.08	3.38	3.31	3.32	3.20	3.33	2.71	2.78	2.91	3.01	2.14	3.22	2.26	2.16	2.71	2.97	2.77	2.54	1.72	1.10

github-actions · 2025-01-16T10:13:51Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	11542.56	72093.38	130429.15	171295.53	174903.31	77338.29

decode	p25	p50	p75	p95	p99	mean
latency(ms)	51.47	56.53	67.24	115.99	202.05	65.44

github-actions · 2025-01-16T12:06:06Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	16704.73	82229.98	133101.58	174211.54	174502.70	79464.72

decode	p25	p50	p75	p95	p99	mean
latency(ms)	51.01	55.00	64.99	100.52	152.96	63.68

github-actions · 2025-01-16T12:20:30Z

migration_size	1.59 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	248.00 MB	256.00 MB	264.00 MB	352.00 MB	392.00 MB	464.00 MB	520.00 MB	536.00 MB
rayrpc_speed(GB/s)	3.60	1.05	1.53	1.74	1.94	2.04	2.13	2.17	2.20	2.28	2.33	2.36	2.40	2.34	2.45	2.38	2.47	2.54	2.60	2.50	2.52	2.66	2.47	2.55	2.60	2.49	2.52	2.61	2.81	2.65	2.75	2.86	2.91	3.09	3.17	3.32	3.38	3.52

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	248.00 MB	264.00 MB	400.00 MB
gloo_speed(GB/s)	1.04	1.81	2.21	2.51	2.67	2.90	2.96	3.08	3.25	3.17	3.30	3.48	3.56	4.03	3.30	3.01	3.25	2.29	2.09	2.99	2.62	3.61	3.00	3.27	3.56	1.97	2.51	3.37	2.99	1.03

github-actions · 2025-01-16T13:01:01Z

migration_size	1.59 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	272.00 MB	280.00 MB	296.00 MB	304.00 MB	352.00 MB	472.00 MB	504.00 MB	544.00 MB	680.00 MB
rayrpc_speed(GB/s)	3.54	1.01	1.52	1.73	1.87	1.98	2.02	2.10	2.13	2.21	2.18	2.25	2.27	2.36	2.36	2.37	2.32	2.37	2.43	2.44	2.47	2.52	2.41	2.58	2.61	2.34	2.61	2.56	2.69	2.60	2.49	2.60	2.66	2.87	3.01	3.20	3.37	3.26	3.31

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	232.00 MB	264.00 MB	304.00 MB	312.00 MB	376.00 MB	416.00 MB	544.00 MB
gloo_speed(GB/s)	1.04	1.74	2.28	2.54	2.65	2.79	2.91	3.06	3.09	3.19	3.19	2.82	3.13	3.19	3.11	3.27	3.18	2.86	3.60	3.41	2.27	3.47	0.85	3.26	3.32	4.73	3.20	2.53	1.36	0.83	1.71	1.49	3.20	3.70

github-actions · 2025-01-16T13:19:51Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	1387.12	58432.06	59514.49	103447.36	106400.45	39682.36

decode	p25	p50	p75	p95	p99	mean
latency(ms)	61.62	80.28	110.46	657.17	1320.23	154.27

github-actions · 2025-01-16T16:29:05Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	1339.67	82653.28	83803.09	108308.84	119539.22	50196.22

decode	p25	p50	p75	p95	p99	mean
latency(ms)	59.74	79.13	119.59	906.66	2841.47	220.09

github-actions · 2025-01-16T16:39:36Z

migration_size	1.59 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	256.00 MB	280.00 MB	472.00 MB
rayrpc_speed(GB/s)	3.46	1.06	1.60	1.81	1.98	2.03	2.13	2.21	2.26	2.26	2.30	2.27	2.46	2.30	2.46	2.45	2.48	2.53	2.67	2.46	2.58	2.63	2.54	2.61	2.76	2.73	2.71	2.77	2.79	2.62	2.58	3.32

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	192.00 MB	200.00 MB	208.00 MB	232.00 MB	248.00 MB	320.00 MB	440.00 MB
gloo_speed(GB/s)	1.05	1.77	2.22	2.49	2.64	2.86	3.14	3.19	3.24	3.16	3.38	3.15	3.49	3.77	3.43	3.14	2.94	2.99	3.40	2.89	2.14	2.28	1.50	2.44	2.89	2.40	0.71	4.42	1.35

KuilongCui · 2025-01-17T10:12:15Z

@zhypku launcher.py was created to extract the launch logic from manager.py. The Launcher class in launcher.py initializes the server, initializes instances, and manages placement groups. I'm not sure whether this naming follows convention; do you have any suggestions?

github-actions · 2025-01-17T10:30:57Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	256.00 MB	280.00 MB	304.00 MB	344.00 MB	384.00 MB	480.00 MB	544.00 MB
rayrpc_speed(GB/s)	1.02	1.52	1.77	1.92	2.10	2.09	2.17	2.22	2.26	2.38	2.37	2.37	2.46	2.39	2.40	2.48	2.51	2.48	2.53	2.63	2.57	2.68	2.83	2.72	2.63	2.61	2.60	2.73	2.56	2.89	2.74	3.07	3.19	3.20	3.32

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	224.00 MB	232.00 MB	248.00 MB	304.00 MB	360.00 MB	480.00 MB
gloo_speed(GB/s)	1.02	1.66	2.09	2.32	2.59	2.57	2.87	2.97	2.94	2.92	3.31	3.31	3.32	3.23	3.07	2.70	3.06	3.17	2.49	2.91	2.72	3.20	2.97	3.24	1.91	2.97	3.24	3.70	3.26	2.75	1.37	2.59

github-actions · 2025-01-17T10:53:57Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	1230.77	55910.33	64405.19	93182.27	117636.98	40536.96

decode	p25	p50	p75	p95	p99	mean
latency(ms)	61.25	75.36	103.41	298.65	1164.00	118.85

docs/Arguments.md

llumnix/arg_utils.py

llumnix/entrypoints/setup.py

llumnix/global_scheduler/dispatch_scheduler.py

s5u13b · 2025-01-20T02:40:14Z

llumnix/launcher.py

+                instance_type = InstanceType.DECODE
+            else:
+                # compute distance if launch prefill or decode
+                normal_distance = pd_ratio[0] - pd_ratio[1]


Should the pd ratio be the absolute number of prefill/decode instances, or is it just a ratio?

It's just a ratio, like "1:2".
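To make the "just a ratio" answer concrete, here is a minimal sketch of turning a string such as "1:2" into a pair of integers. The function name `parse_pd_ratio` and its validation rules are assumptions made for illustration; this is not the parsing code from the PR.

```python
from typing import Tuple


def parse_pd_ratio(text: str) -> Tuple[int, int]:
    """Parse a 'P:D' string such as '1:2' into (prefill_part, decode_part).

    Hypothetical helper: the name and the error handling are illustrative only.
    """
    parts = text.split(":")
    if len(parts) != 2:
        raise ValueError(f"expected 'P:D', got {text!r}")
    prefill_part, decode_part = int(parts[0]), int(parts[1])
    if prefill_part <= 0 or decode_part <= 0:
        raise ValueError("both sides of the ratio must be positive")
    return prefill_part, decode_part
```

Under this reading, "1:2" and "2:4" describe the same target mix; neither says how many instances exist in total.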

s5u13b · 2025-01-20T02:45:32Z

llumnix/manager.py

-                await server.run.remote(manager, instance_id, instance)
-                self.scale_up(instance_id, instance)
+                pending_pg_states = list_placement_groups(filters=[("state", "=", "PENDING")])
+                rescheduling_pg_states = list_placement_groups(filters=[("state", "=", "RESCHEDULING")])


There should not be any rescheduling pg in the normal state.

If there are any rescheduling_pg_states, just wait for the next round and do nothing here.

llumnix/manager.py

tests/e2e_test/test_correctness.py

s5u13b · 2025-01-20T02:50:17Z

tests/e2e_test/utils.py

        f"--max-num-batched-tokens {max_num_batched_tokens} "
        f"{'> instance_'+result_filename if len(result_filename)> 0 else ''} 2>&1 &"
    )
    return command

 def generate_serve_command(result_filename: str = "",
+                           launch_ray_cluster: bool = True,


There is no need to launch the ray cluster here.

It's just to stay consistent with generate_launch_command.

Global launch never launches the ray cluster, so adding it for consistency is not reasonable.

I think always launching the ray cluster is the more common case in practice.

s5u13b · 2025-01-21T08:17:17Z

llumnix/launcher.py

+            self.port_offset += 1
+            if self.enable_port_offset_store:
+                put_actor_data_to_ray_internal_kv("manager", "port_offset", self.port_offset)
+        return config


Why is it named config here?

Using 'new_instance_args' is also acceptable, but it would be quite similar to the 'instance_args' being passed in, so I named it 'config'.

I think next_instance_args is better; config is strange.

s5u13b · 2025-01-21T08:17:30Z

llumnix/launcher.py

+        cur_num_decode = len(self.global_scheduler.instance_id_set -
+                                self.global_scheduler.dispatch_scheduler.available_dispatch_instance_set)
+        config.instance_type = self._get_next_instance_type(cur_num_prefill, cur_num_decode, self.pd_ratio)
+        return config


Why is it named config here?

s5u13b · 2025-01-21T08:17:55Z

llumnix/launcher.py

+
+    def _get_next_instance_args(self, instance_args) -> InstanceArgs:
+        assert not self.enablde_engine_pd_disagg, \
+            "currently not support engine based pd-disaggregation in Global Launch Model."


It should be "global launch mode".

github-actions · 2025-01-21T08:31:25Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	240.00 MB	248.00 MB	256.00 MB	272.00 MB	288.00 MB	296.00 MB	384.00 MB
rayrpc_speed(GB/s)	1.05	1.52	1.77	1.89	2.02	2.08	2.15	2.14	2.25	2.24	2.32	2.35	2.35	2.39	2.37	2.38	2.48	2.46	2.43	2.59	2.63	2.56	2.62	2.58	2.52	2.62	2.73	2.52	2.57	2.86	2.95	2.99	3.06

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	216.00 MB	224.00 MB	232.00 MB	256.00 MB	296.00 MB	384.00 MB
gloo_speed(GB/s)	1.03	1.74	2.13	2.48	2.64	2.71	2.95	2.76	3.23	3.02	3.21	3.07	3.07	3.19	3.58	3.11	2.70	2.75	1.22	2.77	1.80	3.15	2.79	0.96	1.96	2.88	2.92	3.63	2.95	0.83	1.76

github-actions · 2025-01-21T08:55:38Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	1254.55	73738.50	74606.06	111412.86	111713.27	47051.53

decode	p25	p50	p75	p95	p99	mean
latency(ms)	61.14	77.04	108.73	386.80	972.31	120.90

github-actions · 2025-01-21T09:20:31Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	786.54	1576.64	25147.86	59108.67	131573.59	15427.10

decode	p25	p50	p75	p95	p99	mean
latency(ms)	70.35	111.49	208.65	1710.54	37194.78	1173.73

github-actions · 2025-01-21T09:20:44Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	240.00 MB	256.00 MB	280.00 MB	336.00 MB	344.00 MB	376.00 MB	528.00 MB
rayrpc_speed(GB/s)	1.02	1.51	1.75	1.89	2.04	2.10	2.14	2.18	2.25	2.31	2.36	2.29	2.41	2.42	2.38	2.50	2.41	2.47	2.41	2.56	2.43	2.52	2.52	2.58	2.47	2.56	2.89	2.61	2.70	3.09	2.65	2.45	3.08	3.08	3.26	3.29

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	216.00 MB	224.00 MB	232.00 MB	256.00 MB	280.00 MB	312.00 MB
gloo_speed(GB/s)	1.00	1.69	2.17	2.35	2.54	2.91	2.94	3.08	3.25	2.96	2.71	3.18	3.35	3.36	3.49	3.10	2.65	1.89	2.78	1.60	2.93	1.49	1.81	0.92	3.11	2.48	1.61	2.05	3.24	0.74	2.10

github-actions · 2025-01-21T11:59:59Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	224.00 MB	232.00 MB	256.00 MB	280.00 MB	352.00 MB	424.00 MB	456.00 MB	472.00 MB
rayrpc_speed(GB/s)	1.03	1.56	1.81	1.97	2.05	2.06	2.13	2.10	2.22	2.17	2.29	2.30	2.36	2.41	2.36	2.41	2.39	2.44	2.62	2.57	2.51	2.38	2.43	2.37	2.65	2.51	2.76	2.70	2.57	2.90	3.23	3.03	3.06	3.04

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	232.00 MB	384.00 MB
gloo_speed(GB/s)	1.01	1.66	2.13	2.34	2.68	2.66	2.89	2.93	2.92	3.32	3.23	3.16	3.21	3.25	3.23	2.70	2.70	2.46	2.83	2.98	2.25	2.75	2.68	1.51	3.28	2.38	3.15	2.37	1.04

github-actions · 2025-01-21T12:23:55Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	1360.41	73062.84	83483.45	108908.85	112366.63	48448.47

decode	p25	p50	p75	p95	p99	mean
latency(ms)	69.99	85.89	137.07	910.86	2536.53	226.74

github-actions · 2025-01-21T12:50:40Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	883.24	37880.01	44144.61	77063.72	122837.83	30844.74

decode	p25	p50	p75	p95	p99	mean
latency(ms)	57.70	66.03	91.19	273.51	316.21	95.89

github-actions · 2025-01-21T13:11:39Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	232.00 MB	256.00 MB	264.00 MB	272.00 MB	312.00 MB	328.00 MB	464.00 MB	568.00 MB
rayrpc_speed(GB/s)	1.03	1.54	1.77	1.93	2.04	2.07	2.12	2.17	2.21	2.22	2.32	2.32	2.38	2.41	2.42	2.51	2.46	2.54	2.49	2.45	2.36	2.59	2.66	2.51	2.48	2.16	2.59	2.68	2.57	2.67	2.85	2.83	2.89	3.25	3.26

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	320.00 MB	336.00 MB	400.00 MB
gloo_speed(GB/s)	1.03	1.71	2.15	2.34	2.62	2.79	2.93	3.03	2.95	3.19	2.91	3.43	3.59	3.42	3.32	2.60	2.71	2.55	2.25	2.78	2.80	2.36	1.44	4.78	3.86	0.95	3.21	3.07	3.03	3.27	2.27	0.60

s5u13b · 2025-01-23T09:25:46Z

llumnix/arg_utils.py

@@ -42,9 +42,6 @@ def add_argument(self, *args, **kwargs):
                kwargs['default'] = None
        super().add_argument(*args, **kwargs)

-
-# All the default values of llumnix arguments are set in default.py. So all the arguments here are set to None.


Why was this removed?

s5u13b · 2025-01-23T09:32:33Z

llumnix/launcher.py

+            self.port_offset += 1
+            if self.enable_port_offset_store:
+                put_actor_data_to_ray_internal_kv("manager", "port_offset", self.port_offset)
+        return config


I think next_instance_args is better; config is strange.

s5u13b · 2025-01-23T09:34:30Z

llumnix/launcher.py

+                total_num_prefill = total_num_prefill - base_num_ratio * pd_ratio[0]
+                total_num_decode = total_num_decode - base_num_ratio * pd_ratio[1]
+
+                if total_num_prefill + total_num_decode == 0:


I think this pd ratio code should be placed in a separate function.

I also think this logic is quite indirect. Why not just calculate pd_ratio_if_add_prefill and pd_ratio_if_add_decode? The distance is hard to understand.
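One way to read the suggestion above is sketched below: compute what the prefill share would be after adding a prefill instance and after adding a decode instance, then pick whichever lands closer to the target. This is an illustrative sketch only, not the code in launcher.py; the function name, the string return values, and the tie-breaking rule (prefer prefill) are all assumptions.

```python
from fractions import Fraction
from typing import Tuple


def next_instance_type(num_prefill: int, num_decode: int,
                       pd_ratio: Tuple[int, int]) -> str:
    """Pick the instance type that keeps the prefill share closest to the target.

    Hypothetical helper: compares the two candidate outcomes directly instead
    of using a distance between counts.
    """
    # Target share of prefill instances, e.g. 1:2 -> 1/3.
    target = Fraction(pd_ratio[0], pd_ratio[0] + pd_ratio[1])
    total_after = num_prefill + num_decode + 1
    share_if_add_prefill = Fraction(num_prefill + 1, total_after)
    share_if_add_decode = Fraction(num_prefill, total_after)
    # Assumption: on a tie, launch a prefill instance.
    if abs(share_if_add_prefill - target) <= abs(share_if_add_decode - target):
        return "prefill"
    return "decode"
```

Using `Fraction` keeps the comparison exact and avoids dividing by a zero decode count, which a literal prefill/decode quotient would run into when no decode instance exists yet.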

github-actions · 2025-01-23T09:46:41Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	224.00 MB	232.00 MB	248.00 MB	288.00 MB	296.00 MB	312.00 MB	352.00 MB	472.00 MB	544.00 MB
rayrpc_speed(GB/s)	0.90	1.40	1.70	1.89	2.01	2.05	2.22	2.16	2.26	2.34	2.35	2.39	2.30	2.19	2.46	2.47	2.52	2.51	2.54	2.72	2.65	2.65	2.47	2.56	2.68	2.63	2.86	2.58	2.86	2.79	3.13	3.18	3.25	3.14

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	224.00 MB	232.00 MB	280.00 MB	360.00 MB
gloo_speed(GB/s)	0.88	1.47	1.84	2.06	2.23	2.32	2.42	2.47	2.67	2.61	2.62	2.74	2.57	2.85	2.78	2.31	2.40	2.60	2.66	2.19	2.91	2.64	1.83	1.30	1.05	1.55	2.25	2.66

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	200.00 MB	216.00 MB	232.00 MB	288.00 MB	320.00 MB	432.00 MB	472.00 MB
nccl_speed(GB/s)	0.21	0.43	0.65	0.85	0.88	1.23	1.43	1.65	1.58	1.79	1.69	2.07	2.07	2.08	2.17	2.24	2.49	3.08	3.00	2.67	2.68	3.41	3.39	4.28	3.12	4.16	0.75	1.21	0.88	1.64

github-actions · 2025-01-23T10:29:07Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	1945.22	2642.31	4189.38	79652.30	116513.86	16771.80

decode	p25	p50	p75	p95	p99	mean
latency(ms)	81.61	123.04	274.07	1822.79	26643.61	1333.40

github-actions · 2025-01-23T11:06:01Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	248.00 MB	280.00 MB	320.00 MB	360.00 MB	472.00 MB	480.00 MB
rayrpc_speed(GB/s)	0.90	1.38	1.66	1.83	1.92	2.02	2.14	2.18	2.28	2.26	2.33	2.35	2.21	2.30	2.38	2.35	2.40	2.52	2.51	2.61	2.57	2.59	2.59	2.47	2.70	2.69	2.61	2.62	2.52	2.91	2.90	2.96	3.13	3.17

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	192.00 MB	200.00 MB	208.00 MB	224.00 MB	232.00 MB	240.00 MB	280.00 MB	312.00 MB	496.00 MB
gloo_speed(GB/s)	0.90	1.47	1.84	2.07	2.18	2.42	2.54	2.53	2.56	2.37	2.60	2.61	2.85	2.79	2.74	2.54	2.31	2.54	2.22	2.65	2.48	0.77	1.79	2.29	1.13	1.04	2.59	2.58	2.23	0.70	1.68

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	232.00 MB	240.00 MB	248.00 MB	280.00 MB	336.00 MB
nccl_speed(GB/s)	0.20	0.42	0.71	0.84	0.98	1.19	1.37	1.42	1.63	1.72	2.01	1.97	2.51	2.00	2.01	2.41	1.36	2.97	2.76	3.09	2.57	2.96	2.93	4.00	3.33	2.46	2.73	3.34	3.86	2.92	0.38	2.85

AlibabaPAI deleted a comment from github-actions bot Dec 17, 2024

KuilongCui changed the title ~~[Core] Increase the instance type argument when scaling up~~ [Core] Increase the instance type when scaling up llumlet Dec 18, 2024

KuilongCui force-pushed the engine_type branch 3 times, most recently from 3a172f3 to 018bb3b Compare December 18, 2024 11:11

KuilongCui requested review from s5u13b and zhypku December 18, 2024 11:29

s5u13b mentioned this pull request Jan 7, 2025

[Deployment] Support global launch in addition to local launch #88

Merged

KuilongCui force-pushed the engine_type branch 2 times, most recently from 5c4fbaa to af205ca Compare January 16, 2025 07:29

KuilongCui force-pushed the engine_type branch from af205ca to 2be196e Compare January 16, 2025 09:07

KuilongCui force-pushed the engine_type branch from 6c40c99 to d23862a Compare January 16, 2025 11:26

KuilongCui force-pushed the engine_type branch from b30b86f to 88c3102 Compare January 16, 2025 12:37

zhypku approved these changes Jan 20, 2025

s5u13b reviewed Jan 20, 2025

KuilongCui force-pushed the engine_type branch from 169e4cd to 9ee73aa Compare January 21, 2025 07:30

s5u13b reviewed Jan 21, 2025

Xinyi-ECNU and others added 2 commits January 23, 2025 08:58

[Core] Increase the instance type when scaling up llumlet

408f2f5

fix readme

bcce670

KuilongCui force-pushed the engine_type branch from 6c1ef67 to bcce670 Compare January 23, 2025 09:07

s5u13b reviewed Jan 23, 2025

fix comment

2432b3b
