forked from mlc-ai/mlc-llm
[Serve] MicroServing API refactor (mlc-ai#3071)
This PR refactors the MicroServing REST API. All MicroServing REST APIs now live in `python/mlc_llm/serve/entrypoints/microserving_entrypoints.py`, and the related protocol data structures are placed in `python/mlc_llm/protocol/microserving_protocol.py`. These REST APIs essentially wrap and redirect to the OpenAI `v1/completions` API. The PR also renames some APIs to be consistent with the writeups.
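To illustrate the "wrap and redirect" idea, here is a minimal sketch; it is not the code from this PR. It assumes that a MicroServing request body is a completion request plus a few extra fields, and it uses plain dicts so it runs without MLC LLM installed. The field names come from the protocol classes in this commit; `split_microserving_payload`, `handle_prep_recv`, and `completions_handler` are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Extra fields that the MicroServing request bodies add on top of a
# completion request (names taken from the protocol classes in this commit).
MICROSERVING_FIELDS = {
    "kv_window_begin",
    "kv_window_end",
    "kv_append_metadata",
    "dst_group_offset",
}


def split_microserving_payload(
    payload: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate completion fields from MicroServing extras."""
    completion = {k: v for k, v in payload.items() if k not in MICROSERVING_FIELDS}
    extras = {k: v for k, v in payload.items() if k in MICROSERVING_FIELDS}
    return completion, extras


def handle_prep_recv(
    payload: Dict[str, Any],
    completions_handler: Callable[[Dict[str, Any], Dict[str, Any]], Any],
) -> Any:
    """Hypothetical wrapper: forward the completion part of the body to a
    callable that stands in for the v1/completions endpoint."""
    completion, extras = split_microserving_payload(payload)
    return completions_handler(completion, extras)
```

How the real entrypoints pass the extra fields to the engine is not shown in this commit page, so the handler signature above is an assumption.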
1 parent 88074ea, commit 8a1bfd6
Showing 15 changed files with 288 additions and 160 deletions.
@@ -0,0 +1,76 @@

```python
"""Protocols in MLC LLM for MicroServing."""

from pydantic import BaseModel

from mlc_llm.protocol.openai_api_protocol import CompletionRequest


class PrepRecvRequest(CompletionRequest):
    """The extra request body for prep_recv request in MicroServing.

    Attributes
    ----------
    kv_window_end : int
        [0, kv_window_end] denotes the KV range of the prompt to prefill on
        a prefill instance.
        The entries of this KV range will be allocated on the decode instance.
    """

    kv_window_end: int


class PrepRecvResponse(BaseModel):
    """The response body for prep_recv request in MicroServing.

    Attributes
    ----------
    prompt_length : int
        The length of the request prompt in tokens.
    prefix_matched_length : int
        The matched common prefix length on the decode instance when
        prefix cache is enabled, or 0 if there is no prefix cache.
    kv_append_metadata : str
        The metadata of the KV range on the destination decode instance.
    """

    prompt_length: int
    prefix_matched_length: int
    kv_append_metadata: str


class RemoteSendRequest(CompletionRequest):
    """The extra request body for remote_send request in MicroServing.

    Attributes
    ----------
    kv_window_begin : int
        Denote the start of the KV range to prefill.
    kv_window_end : int
        Denote the end of the KV range to prefill.
    kv_append_metadata : str
        The metadata of the KV range on the destination decode instance.
    dst_group_offset : int
        The node group offset of the destination decode instance.
    """

    kv_window_begin: int
    kv_window_end: int
    kv_append_metadata: str
    dst_group_offset: int


class StartGenerateRequest(CompletionRequest):
    """The extra request body for start_generate request in MicroServing.

    Attributes
    ----------
    kv_window_begin : int
        Denote the start of the KV range to prefill on the decode instance.
    """

    kv_window_begin: int
```
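The fields of these classes suggest how the three calls fit together: the `prep_recv` response supplies values that the `remote_send` and `start_generate` bodies reuse. The sketch below is one possible reading, not the router logic from this PR. It uses a stdlib dataclass as a stand-in for the pydantic `PrepRecvResponse`, and `plan_followup_requests` is a hypothetical helper. The choices of where each KV window begins are assumptions and are marked as such in the comments.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class PrepRecvResponse:
    """Stand-in for the pydantic model of the same name (stdlib only)."""

    prompt_length: int
    prefix_matched_length: int
    kv_append_metadata: str


def plan_followup_requests(
    base: Dict[str, Any],
    prep: PrepRecvResponse,
    kv_window_end: int,
    dst_group_offset: int,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build remote_send and start_generate bodies from a prep_recv response.

    Assumption: tokens already matched by the prefix cache are skipped, so
    the prefill window begins at prefix_matched_length.
    """
    remote_send = {
        **base,
        "kv_window_begin": prep.prefix_matched_length,
        "kv_window_end": kv_window_end,
        "kv_append_metadata": prep.kv_append_metadata,
        "dst_group_offset": dst_group_offset,
    }
    # Assumption: the decode instance continues where the transferred
    # KV range ends.
    start_generate = {**base, "kv_window_begin": kv_window_end}
    return remote_send, start_generate
```

A caller would pass the shared completion fields as `base`, for example `{"model": "demo-model", "prompt": "Hello"}`, and send the two returned bodies to the prefill and decode instances respectively. The model name and prompt here are placeholders.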