[Deployment] Support global launch in addition to local launch #88

Merged 92 commits into main on Jan 10, 2025


s5u13b (Contributor) commented Dec 18, 2024

  1. Support global deployment in addition to local deployment.
  2. Refactor the setup code to unify the global and local deployment paths.
  3. Simplify actor construction arguments and workflow.
  4. Rename LLMEngineManager to Manager.
  5. Refine Arguments.md.
    ...

s5u13b requested review from zhypku and KuilongCui on December 18, 2024 03:43

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 10297.07 | 71613.21 | 123340.44 | 184057.92 | 194201.81 | 73009.19 |
| decode | 51.53 | 56.30 | 68.59 | 123.65 | 202.43 | 69.42 |
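
For readers unfamiliar with the columns, the sketch below shows one common way a p25/p50/p75/p95/p99/mean summary like the one above can be derived from per-request latency samples. The benchmark script that produced these numbers is not shown in this conversation, so the function name `summarize_latency`, the inclusive interpolation method, and the synthetic input are all assumptions for illustration only.

```python
from statistics import mean, quantiles


def summarize_latency(samples_ms):
    """Return p25/p50/p75/p95/p99 and mean for latency samples in milliseconds.

    Hypothetical helper: the real benchmark may use a different
    percentile definition or library.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99,
    # so index k - 1 holds the k-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    summary = {f"p{k}": cuts[k - 1] for k in (25, 50, 75, 95, 99)}
    summary["mean"] = mean(samples_ms)
    return summary


# Synthetic samples (1.0 .. 101.0 ms), not data from this PR.
example = summarize_latency([float(x) for x in range(1, 102)])
print(example)
```

A large gap between the mean and p50, as in the prefill rows here, simply reflects a skewed sample distribution; the sketch makes no claim about why the distribution is skewed.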


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 256.00 MB 304.00 MB 344.00 MB 352.00 MB 472.00 MB
rpc_speed(GB/s) 1.03 1.53 1.76 1.92 2.00 2.10 2.14 2.17 2.31 2.24 2.33 2.36 2.39 2.41 2.41 2.47 2.43 2.45 2.45 2.45 2.39 2.39 2.47 2.50 2.50 2.57 2.41 2.65 2.67 2.64 2.89 3.12 3.20 3.42
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 248.00 MB 560.00 MB
gloo_speed(GB/s) 1.02 1.66 2.11 2.36 2.54 2.76 2.95 2.81 3.05 3.25 2.91 3.02 3.05 3.56 2.74 2.75 2.97 3.13 3.12 3.13 1.27 2.88 3.14 2.08 2.96 1.61 3.72 1.85 0.33 2.78
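
As a reading aid, each (migration_size, speed) pair above implies a transfer time of size divided by speed. The sketch below works that out for the first four pairs of the rpc and gloo rows. It assumes 1 GB = 1024 MB, which the benchmark output does not state, and the helper name `transfer_time_ms` is made up for this example.

```python
def transfer_time_ms(size_mb, speed_gb_per_s, mb_per_gb=1024.0):
    """Implied time in milliseconds to move size_mb at speed_gb_per_s.

    mb_per_gb=1024 is an assumption; pass 1000.0 if the benchmark
    uses decimal units.
    """
    if speed_gb_per_s <= 0:
        raise ValueError("speed must be positive")
    return size_mb / mb_per_gb / speed_gb_per_s * 1000.0


# First four columns copied from the rpc_speed and gloo_speed rows above.
sizes_mb = [8.0, 16.0, 24.0, 32.0]
rpc_gb_s = [1.03, 1.53, 1.76, 1.92]
gloo_gb_s = [1.02, 1.66, 2.11, 2.36]

rows = [
    (size, transfer_time_ms(size, rpc), transfer_time_ms(size, gloo))
    for size, rpc, gloo in zip(sizes_mb, rpc_gb_s, gloo_gb_s)
]

for size, rpc_ms, gloo_ms in rows:
    print(f"{size:6.2f} MB  rpc {rpc_ms:6.2f} ms  gloo {gloo_ms:6.2f} ms")
```

Under that unit assumption, an 8 MB migration at 1.03 GB/s takes roughly 7.6 ms, and the same arithmetic applies to every other column in these tables.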


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 18058.36 | 67213.95 | 133608.30 | 185439.59 | 185816.04 | 77123.44 |
| decode | 50.28 | 53.86 | 61.66 | 98.63 | 152.59 | 61.23 |


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 240.00 MB 256.00 MB 280.00 MB 328.00 MB 336.00 MB 344.00 MB 560.00 MB 656.00 MB
rpc_speed(GB/s) 1.04 1.55 1.79 1.98 2.10 2.21 2.13 2.26 2.28 2.15 2.37 2.38 2.42 2.43 2.45 2.48 2.50 2.51 2.49 2.42 2.57 2.60 2.49 2.59 2.63 2.75 2.73 2.47 2.79 2.72 2.88 2.97 2.83 3.49 3.38
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 200.00 MB 208.00 MB 224.00 MB 256.00 MB
gloo_speed(GB/s) 1.02 1.74 2.14 2.43 2.60 2.70 3.00 2.77 2.96 3.00 3.27 3.38 3.24 3.50 3.12 3.06 3.02 2.46 2.51 2.04 2.55 2.40 1.02 3.27 2.28 3.58 3.01

s5u13b changed the title from "[Entrypoints] Support centralized deployment addition to distributed deployment" to "[WIP][Entrypoints] Support centralized deployment addition to distributed deployment" on Dec 24, 2024
s5u13b changed the title from "[WIP][Entrypoints] Support centralized deployment addition to distributed deployment" to "[WIP][Deployment] Support centralized deployment addition to distributed deployment" on Dec 24, 2024

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 3998.49 | 70425.43 | 133718.00 | 169955.68 | 171480.70 | 75593.56 |
| decode | 50.97 | 55.10 | 66.81 | 100.68 | 191.12 | 65.51 |


migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 248.00 MB 256.00 MB 264.00 MB 272.00 MB 280.00 MB 304.00 MB 352.00 MB 360.00 MB 512.00 MB
rayrpc_speed(GB/s) 3.39 1.05 1.56 1.80 1.93 2.08 2.15 2.03 2.25 2.30 2.28 2.28 2.34 2.35 2.45 2.42 2.43 2.55 2.39 2.51 2.50 2.56 2.63 2.47 2.23 2.44 2.52 2.56 2.79 2.78 2.76 2.79 2.68 2.69 3.07 2.99 3.01
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 264.00 MB 312.00 MB
gloo_speed(GB/s) 1.02 1.68 2.14 2.37 2.50 2.62 2.83 2.84 2.78 2.93 3.11 3.01 3.06 3.32 3.61 3.02 2.44 2.31 2.31 2.52 2.18 2.16 1.98 3.17 3.05 1.73 2.88 2.75

s5u13b changed the title from "[WIP][Deployment] Support centralized deployment addition to distributed deployment" to "[WIP][Deployment] Support global deployment in addition to local deployment" on Jan 2, 2025

github-actions bot commented Jan 2, 2025

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 240.00 MB 336.00 MB 368.00 MB 560.00 MB
rayrpc_speed(GB/s) 1.01 1.48 1.73 1.87 1.98 2.02 2.10 2.15 2.18 2.24 2.26 2.30 2.37 2.32 2.45 2.42 2.48 2.42 2.39 2.43 2.44 2.56 2.60 2.48 2.61 2.62 2.38 2.63 2.64 2.95 2.82 3.17
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 288.00 MB 312.00 MB 320.00 MB
gloo_speed(GB/s) 1.03 1.71 2.11 2.36 2.49 2.62 2.92 2.91 2.76 3.23 2.96 3.45 3.28 3.27 3.44 2.81 1.95 2.77 2.69 2.07 2.93 2.68 2.89 3.12 3.05 2.49 2.89 0.96 3.15


github-actions bot commented Jan 2, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 15167.99 | 66800.67 | 130599.77 | 165961.74 | 179659.92 | 72956.17 |
| decode | 51.32 | 55.90 | 67.64 | 109.20 | 206.03 | 67.37 |


github-actions bot commented Jan 6, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 12551.54 | 82148.12 | 131949.20 | 170474.89 | 184565.22 | 76305.46 |
| decode | 50.72 | 54.12 | 60.71 | 130.99 | 620.07 | 73.72 |


github-actions bot commented Jan 6, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 240.00 MB 256.00 MB 288.00 MB 352.00 MB
rayrpc_speed(GB/s) 3.80 1.06 1.55 1.82 1.95 2.03 2.10 2.15 2.25 2.29 2.29 2.33 2.33 2.49 2.49 2.49 2.47 2.56 2.43 2.48 2.51 2.52 2.66 2.59 2.51 2.47 2.50 2.59 2.66 2.55 2.99 3.20
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 272.00 MB 352.00 MB 384.00 MB 432.00 MB
gloo_speed(GB/s) 1.02 1.64 2.06 2.36 2.41 2.57 2.82 2.87 3.03 3.13 3.34 3.18 3.05 3.46 3.23 3.01 2.59 2.97 2.16 2.81 2.55 2.44 2.59 2.52 2.57 2.89 3.40 3.71 2.93 1.65 1.75 1.28


github-actions bot commented Jan 6, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 16875.52 | 80862.81 | 131262.62 | 169392.94 | 182223.13 | 79257.08 |
| decode | 50.74 | 54.55 | 61.29 | 94.46 | 171.10 | 62.02 |


github-actions bot commented Jan 6, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 240.00 MB 264.00 MB 280.00 MB 296.00 MB 304.00 MB 344.00 MB
rayrpc_speed(GB/s) 3.78 1.04 1.54 1.80 1.93 2.00 2.11 2.18 2.23 2.29 2.28 2.36 2.33 2.33 2.42 2.51 2.43 2.52 2.43 2.29 2.53 2.58 2.66 2.66 2.58 2.60 2.57 2.57 2.66 2.52 2.76 2.78 2.75 2.84 2.88
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 440.00 MB
gloo_speed(GB/s) 1.04 1.76 2.16 2.42 2.59 2.80 3.06 3.11 2.84 3.22 3.46 3.56 3.60 3.30 3.49 2.66 2.60 2.81 2.79 3.19 1.88 1.87 3.62 2.45 2.83 2.07 3.03 3.54 3.56

s5u13b changed the title from "[WIP][Deployment] Support global deployment in addition to local deployment" to "[Deployment] Support global deployment in addition to local deployment" on Jan 7, 2025

github-actions bot commented Jan 7, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 232.00 MB 256.00 MB 264.00 MB 280.00 MB 320.00 MB 376.00 MB 544.00 MB
rayrpc_speed(GB/s) 3.81 1.06 1.56 1.81 1.95 2.05 2.11 2.13 2.22 2.31 2.28 2.37 2.40 2.35 2.51 2.45 2.47 2.61 2.53 2.45 2.57 2.58 2.67 2.69 2.64 2.55 2.59 2.64 2.72 2.99 2.93 2.97 3.18 3.62
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 232.00 MB 360.00 MB 400.00 MB
gloo_speed(GB/s) 1.01 1.68 2.09 2.35 2.46 2.62 2.92 2.67 3.02 3.21 3.29 3.17 2.81 3.32 3.28 2.94 2.34 2.13 2.94 2.42 2.08 2.90 1.95 3.49 2.24 3.11 2.66 0.92 2.76 2.06


github-actions bot commented Jan 7, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 10337.22 | 65395.54 | 117695.16 | 160921.44 | 190857.05 | 70814.29 |
| decode | 53.01 | 59.21 | 72.54 | 128.60 | 601.50 | 80.51 |


github-actions bot commented Jan 7, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 240.00 MB 256.00 MB 280.00 MB 360.00 MB 456.00 MB 504.00 MB
rayrpc_speed(GB/s) 3.75 1.04 1.50 1.78 1.92 2.03 2.08 2.12 2.21 2.27 2.30 2.41 2.39 2.38 2.46 2.50 2.44 2.56 2.62 2.37 2.51 2.51 2.60 2.33 2.56 2.48 2.59 2.64 2.66 2.52 2.85 2.69 2.79 3.01 3.15 3.32
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 232.00 MB 264.00 MB 312.00 MB 336.00 MB
gloo_speed(GB/s) 1.00 1.63 2.09 2.26 2.49 2.77 2.82 2.94 2.90 2.95 3.21 2.90 3.58 3.38 3.55 3.13 2.53 2.58 2.66 2.68 2.60 3.22 3.29 1.88 2.95 1.75 2.83 1.90 3.10 1.30

s5u13b force-pushed the centralized-deployment branch from 4d49b4f to d38315a on January 7, 2025 09:23

github-actions bot commented Jan 7, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 15396.63 | 68470.57 | 126320.79 | 217287.80 | 218368.82 | 80017.84 |
| decode | 49.43 | 54.53 | 63.30 | 103.64 | 209.98 | 63.55 |


github-actions bot commented Jan 7, 2025

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 264.00 MB 280.00 MB 312.00 MB 384.00 MB 400.00 MB
rayrpc_speed(GB/s) 1.04 1.50 1.75 1.90 2.01 2.10 2.12 2.17 2.18 2.24 2.33 2.32 2.37 2.36 2.41 2.43 2.44 2.50 2.44 2.58 2.53 2.49 2.56 2.61 2.68 2.53 2.55 2.58 2.76 2.83 2.84 3.15 2.92
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 304.00 MB 312.00 MB
gloo_speed(GB/s) 1.02 1.68 2.21 2.49 2.64 2.83 3.03 2.86 3.16 3.01 3.40 3.31 3.08 3.28 3.38 2.89 3.14 2.86 2.81 2.72 2.67 3.21 2.71 2.71 2.75 2.34 3.24 3.13 3.34

zhypku (Collaborator) left a comment

How do you handle PD assignment? Perhaps I missed it in the code

llumnix/arg_utils.py (resolved)
llumnix/backends/utils.py (outdated, resolved)
llumnix/entrypoints/utils.py (resolved)
llumnix/entrypoints/vllm/serve.py (resolved)
llumnix/manager.py (resolved)

github-actions bot commented Jan 7, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 232.00 MB 256.00 MB 280.00 MB 480.00 MB 560.00 MB
rayrpc_speed(GB/s) 3.12 1.04 1.53 1.78 1.90 2.05 2.05 2.14 2.26 2.22 2.33 2.34 2.32 2.41 2.33 2.41 2.42 2.51 2.46 2.58 2.57 2.50 2.64 2.43 2.68 2.62 2.80 2.59 2.73 2.77 2.82 3.07 3.29
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 256.00 MB 296.00 MB 352.00 MB
gloo_speed(GB/s) 0.99 1.64 2.08 2.41 2.67 2.59 2.91 3.02 2.93 3.06 3.07 3.37 2.98 3.49 3.37 2.99 2.99 2.81 3.08 2.90 1.76 2.96 2.20 2.79 3.15 3.61 3.02 3.10 2.68 2.89


github-actions bot commented Jan 7, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 13251.14 | 87803.44 | 140237.92 | 170950.63 | 171736.41 | 79404.83 |
| decode | 50.79 | 54.44 | 62.43 | 100.00 | 149.00 | 63.72 |

s5u13b (Contributor, Author) commented Jan 7, 2025

> How do you handle PD assignment? Perhaps I missed it in the code

I have not considered it yet. The PD assignment PR has not been merged, so it is not yet clear how PD assignment should be handled.


github-actions bot commented Jan 9, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 14869.40 | 80626.70 | 128083.39 | 166412.64 | 170044.11 | 79027.52 |
| decode | 50.12 | 53.59 | 61.57 | 95.06 | 181.13 | 61.91 |


github-actions bot commented Jan 9, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 240.00 MB 256.00 MB 288.00 MB 312.00 MB 384.00 MB 680.00 MB
rayrpc_speed(GB/s) 3.58 1.02 1.49 1.74 1.91 2.02 2.10 2.10 2.14 2.23 2.28 2.25 2.34 2.34 2.36 2.31 2.45 2.52 2.58 2.38 2.50 2.52 2.49 2.52 2.65 2.61 2.54 2.76 2.73 2.80 2.92 3.01 3.10 3.45
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 232.00 MB 248.00 MB 320.00 MB 432.00 MB 488.00 MB
gloo_speed(GB/s) 1.02 1.72 2.17 2.49 2.62 2.92 3.06 2.97 3.16 3.34 2.94 3.40 2.81 3.22 2.94 2.98 2.98 3.11 2.29 2.64 1.50 3.41 3.20 2.84 3.10 1.70 1.98 0.36 1.75 3.40 2.90

docs/Quickstart.md (outdated, resolved)
llumnix/entrypoints/bladellm/client.py (resolved)
llumnix/utils.py (resolved)
llumnix/utils.py (outdated, resolved)
KuilongCui (Contributor) commented:

After init_llumlet in the manager, scale up the llumlets in a timely manner.

llumnix/entrypoints/vllm/serve.py (outdated, resolved)
llumnix/entrypoints/vllm/api_server_actor.py (outdated, resolved)
llumnix/entrypoints/vllm/api_server_actor.py (outdated, resolved)
llumnix/manager.py (outdated, resolved)
llumnix/manager.py (outdated, resolved)
s5u13b (Contributor, Author) commented Jan 9, 2025

> After init_llumlet in the manager, scale up the llumlets in a timely manner.

Done


github-actions bot commented Jan 9, 2025

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 224.00 MB 272.00 MB 280.00 MB 304.00 MB 384.00 MB 472.00 MB
rayrpc_speed(GB/s) 3.48 1.06 1.53 1.79 1.95 2.02 2.07 2.11 2.19 2.19 2.33 2.32 2.36 2.41 2.39 2.43 2.49 2.43 2.30 2.50 2.50 2.66 2.63 2.52 2.67 2.61 2.72 2.91 2.82 2.83 2.90 3.19 3.27
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 232.00 MB 256.00 MB 304.00 MB 400.00 MB 456.00 MB 464.00 MB 560.00 MB
gloo_speed(GB/s) 1.06 1.75 2.23 2.54 2.65 2.93 3.16 3.20 3.09 3.22 3.25 3.52 3.30 3.64 4.04 3.26 3.02 3.40 2.80 2.77 1.11 2.12 3.07 0.95 3.13 1.24 2.90 2.40 2.19 0.40 0.63 0.94 1.54 1.48


github-actions bot commented Jan 9, 2025

| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 16019.47 | 89303.68 | 147522.97 | 189752.83 | 189927.35 | 90084.74 |
| decode | 48.62 | 51.81 | 59.01 | 84.82 | 125.86 | 56.72 |

s5u13b (Contributor, Author) commented Jan 10, 2025

> How do you handle PD assignment? Perhaps I missed it in the code

It is compatible; global launch has no effect on pd-disagg.

s5u13b changed the title from "[Deployment] Support global deployment in addition to local deployment" to "[Deployment] Support global launch in addition to local launch" on Jan 10, 2025
zhypku (Collaborator) left a comment

nice work


| latency (ms) | p25 | p50 | p75 | p95 | p99 | mean |
| --- | --- | --- | --- | --- | --- | --- |
| prefill | 14939.53 | 76460.35 | 126947.85 | 170413.80 | 183847.74 | 76538.98 |
| decode | 51.51 | 55.58 | 65.82 | 98.90 | 187.27 | 64.23 |


migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 240.00 MB 264.00 MB 280.00 MB 320.00 MB 336.00 MB 368.00 MB 472.00 MB 480.00 MB 544.00 MB 560.00 MB 656.00 MB
rayrpc_speed(GB/s) 1.03 1.51 1.78 1.90 2.05 2.05 2.21 2.13 2.20 2.30 2.29 2.36 2.44 2.45 2.43 2.45 2.48 2.61 2.63 2.58 2.64 2.52 2.48 2.70 2.38 2.59 2.53 2.58 2.59 2.72 2.69 2.53 3.14 3.05 3.22 3.32 3.31 3.46 3.28 3.48
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 248.00 MB 464.00 MB
gloo_speed(GB/s) 1.00 1.65 2.10 2.29 2.57 2.79 2.98 2.80 2.97 3.13 3.15 3.28 3.24 3.25 3.51 2.97 2.89 1.54 3.20 2.73 2.29 0.94 2.30 3.57 2.05 0.93 2.63 4.39 1.23 1.39

s5u13b merged commit 3e319f0 into main on Jan 10, 2025
14 checks passed

3 participants