
Add guided decoding to TGIS gRPC API #31

Merged
merged 1 commit into main on May 30, 2024
Conversation

njhill (Contributor) commented on May 22, 2024

Within the existing decoding request parameter section:

enum ResponseFormat {
  // Plain text, no constraints
  TEXT = 0;
  // Valid json
  JSON = 1;
}

message StringChoices {
  repeated string choices = 1;
}

// Mutually-exclusive guided decoding options
oneof guided {
  // Output will be in the specified format
  ResponseFormat format = 3;
  // Output will follow the provided JSON schema
  string json_schema = 4;
  // Output will follow the provided regex pattern
  string regex = 5;
  // Output will be exactly one of the specified choices
  StringChoices choice = 6;
  // Output will follow the provided context free grammar
  string grammar = 7;
}
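
To illustrate how these options are meant to be used, here is a rough client-side sketch. The module name (generation_pb2) and the enclosing message name (DecodingParameters) are assumptions for the example, not part of this PR; only the guided-decoding oneof fields and the StringChoices wrapper come from the snippet above.

# Illustrative sketch only: generation_pb2 and DecodingParameters are assumed
# names for the generated stubs; the oneof fields themselves are from this PR.
from generation_pb2 import DecodingParameters

# Ask for output that is valid JSON via the response-format option.
json_params = DecodingParameters(format=DecodingParameters.ResponseFormat.JSON)

# Ask for output that is exactly one of a fixed set of strings. The repeated
# field lives in the StringChoices wrapper because repeated fields cannot
# appear directly inside a oneof (see the comment below).
choice_params = DecodingParameters(
    choice=DecodingParameters.StringChoices(choices=["yes", "no", "maybe"]))

# The options are mutually exclusive: assigning another member of the oneof
# clears whichever member was previously set.
choice_params.regex = r"\d{4}-\d{2}-\d{2}"
assert choice_params.WhichOneof("guided") == "regex"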

// Output will follow the provided regex pattern
string regex = 5;
// Output will be exactly one of the specified choices
StringChoices choice = 6;
njhill (Contributor Author) commented:

Unfortunately you cannot have repeated fields directly within oneofs :(

protocolbuffers/protobuf#2592 (comment)

njhill force-pushed the guided branch 3 times, most recently from 384d566 to f9ee133 on May 22, 2024 16:21

if outlines_decoding.global_thread_pool is None:
    outlines_decoding.global_thread_pool = (
        concurrent.futures.ThreadPoolExecutor(max_workers=2))
Collaborator commented:
I haven't looked much at logits processors, why does this require its own thread pool?

Contributor replied:
It's the same code as here: global_thread_pool = concurrent.futures.ThreadPoolExecutor(...). If I'm not mistaken, only the construction of the logits processor happens in another thread. But if the logits processor is cached, I'm not sure what the benefit is of having another thread build the object.

njhill (Contributor Author) replied:
Yes, that's right. The code is the same as that in the HTTP API. It's dispatched to a thread pool to avoid blocking the asyncio event loop, but I think it could be made more efficient since we only care about this in the case that the LP is not already cached. In any case we can fix that as a follow-on, since we need to fix that related concurrency bug anyhow.
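
For reference, a rough sketch of the pattern being discussed (with hypothetical names, not the actual vLLM/TGIS code): the blocking construction of the logits processor is offloaded to a small thread pool via run_in_executor, and the executor hop is skipped when the processor is already cached.

import asyncio
import concurrent.futures

# Sketch only: build_logits_processor and _LP_CACHE are hypothetical stand-ins.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2)
_LP_CACHE = {}


def build_logits_processor(guide_spec: str):
    # Stand-in for the expensive, CPU-bound step (e.g. compiling an FSM from a
    # regex or JSON schema); returns some callable logits processor.
    return object()


async def get_logits_processor(guide_spec: str):
    # Fast path: if the processor is already cached there is no blocking work
    # to offload, so the thread-pool hop can be skipped entirely.
    if guide_spec in _LP_CACHE:
        return _LP_CACHE[guide_spec]

    # Slow path: build the processor in a worker thread so the asyncio event
    # loop serving gRPC requests is not blocked while it is constructed.
    loop = asyncio.get_running_loop()
    lp = await loop.run_in_executor(_EXECUTOR, build_logits_processor, guide_spec)
    _LP_CACHE[guide_spec] = lp
    return lp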

@@ -118,7 +120,8 @@ def __init__(self, engine: AsyncLLMEngine, args: argparse.Namespace):
 
     async def _post_init(self):
         self.config = await self.engine.get_model_config()
-        self.tokenizer_group = await self.engine.get_tokenizer_group()
+        # self.tokenizer_group = await self.engine.get_tokenizer_group()
+        self.tokenizer_group = self.engine.engine.tokenizer
Contributor commented:
I've seen versions of the code where the get_tokenizer_group function exists and others where it doesn't. What's happening with this function?

Collaborator replied:
@maxdebayser that's from this upstream PR vllm-project/vllm#3512

It didn't get merged in a timely manner and is now buried in conflicts :(

maxdebayser (Contributor) left a review comment:

Since the bug reported in issue https://github.ibm.com/ai-foundation/fmaas-inference-server/issues/718 is not caused by the code in this PR, I think we can merge it and fix the problem in another PR.

njhill merged commit 3dc2819 into main on May 30, 2024
14 checks passed
njhill deleted the guided branch on May 30, 2024 at 00:28