v0.1.0

@zhypku released this 05 Dec 11:01 · 312edd8

The first official release of Llumnix

What's Changed

  • Add pylint to CI by @s5u13b in #3
  • [Misc] Improve simulator&&api_server performance by @ZeldaHuang in #6
  • [Refactor] Refactor llumlet to better adapt to different backend engines by @s5u13b in #4
  • [Misc] Change TODO id by @s5u13b in #8
  • [Misc] Rename manager arguments by @s5u13b in #9
  • [BugFix] Support initialize llumlet by manager by @s5u13b in #11
  • [CI] Add unittest for global_scheduler and entrypoints by @s5u13b in #12
  • [Core] Add Gloo and NCCL migration backends with elastic support by @KuilongCui in #7
  • [CI] Add unittest for llumlet and backends by @ZeldaHuang in #14
  • [Misc] Update instance info of engine in a timely manner by @s5u13b in #17
  • [BugFix] Add stream synchronize for nccl migration backend by @KuilongCui in #20
  • [CI] Add test for migration backend and worker by @KuilongCui in #22
  • [Failover][BugFix] Fix duplicate requests caused by failover by @s5u13b in #18
  • [Misc] Check manager and engine arguments in entrypoints by @s5u13b in #19
  • [BugFix][CI] Fix unittest errors caused by incorrect ray actors management by @s5u13b in #23
  • [Misc] Add "LlumnixRequest" class and refactor api by @ZeldaHuang in #21
  • [Fix] Fix vllm backend add_request bug by @ZeldaHuang in #26
  • [Misc] Add an example demonstrating how to run llumnix offline by @KuilongCui in #24
  • [BugFix] Fix request output loss during putting back to the api server by @s5u13b in #27
  • [Misc] Use logger instead of print in api_server by @KuilongCui in #29
  • [Misc] Enable YAML File for Llumnix Configuration by @KuilongCui in #31
  • [Misc] Improve args check in EngineManager by @KuilongCui in #32
  • [Core] Use zeromq to put request output tokens back to the api server by @s5u13b in #28
  • [BugFix] Fix incorrect config usage in api server by @s5u13b in #33
  • [Misc] Update instance info to global scheduler immediately once get one by @s5u13b in #34
  • [Bugfix] Reset migration-related parameters for the request on migration failure by @KuilongCui in #35
  • [Fix] Fix offline inference for zeromq queue by @s5u13b in #39
  • [Misc] Implement post process of migration through asynchronous task done callback by @s5u13b in #40
  • [Misc] Ensure default values originate solely from config/default.py by @KuilongCui in #37
  • [CI] Add comprehensive testing: migration, e2e, and bench by @KuilongCui in #30
  • [Fix] Migration correctness test by @ZeldaHuang in #43
  • [Misc][Simulator] Update vllm simulator backend by @ZeldaHuang in #42
  • [Core] Add back ray queue to put request output tokens back to the api server by @KuilongCui in #41
  • [Misc] Ensure Llumlet main thread exits on Engine.Step errors by @KuilongCui in #38
  • [Core] Optimize request output tokens putting back implementation to reduce overhead by @s5u13b in #45
  • [CI] move cancel_previous_workflows to ubuntu-latest by @KuilongCui in #49
  • [Misc] Catch the exception generated in llumlet constructor by @KuilongCui in #50
  • [Observability] Collect request timestamps to observe the overhead introduced by system by @s5u13b in #46
  • [Bugfix] Change num_cpu to 0 for async_put_queue_actor by @KuilongCui in #51
  • [Bugfix] enable_migration and enable_defrag cannot be set to False by @KuilongCui in #52
  • [Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature by @Xinyi-ECNU in #15
  • [Fix][Misc] vllm simulator&&migration by @ZeldaHuang in #53
  • [Refactor] Asynchronous llumlet by @ZeldaHuang in #56
  • [Entrypoints][Refactor] Refactor llumnix entrypoints to be more modular by @s5u13b in #55
  • [CI] Add launch modes and available blocks tests in e2e test by @s5u13b in #57
  • [Fix] Add node id to BackendSim by @ZeldaHuang in #64
  • [Misc] Add e2e test for prefill-decoding migration by @Xinyi-ECNU in #65
  • [Core] Support one-to-many and many-to-one migration by @KuilongCui in #63
  • [Core][Migration] Support waiting request and multiple requests migration by @s5u13b in #36
  • [Core] Add RoundRobin dispatch policy by @KuilongCui in #70
  • [Refactor] refactor migration scheduler by @KuilongCui in #66
  • [Misc] Usage Doc for Prefill-decoding Disaggregation by @Xinyi-ECNU in #71
  • [Doc] v0.1.0 release by @zhypku in #72

Full Changelog: https://github.com/AlibabaPAI/llumnix/commits/v0.1.0