
fix: Fix deprecated max_tokens param in openai ChatCompletionRequest #3122

Open

mickqian wants to merge 2 commits into main from fix-max-tokens

Conversation

mickqian
Contributor

@mickqian mickqian commented Jan 25, 2025

Motivation

Address #3098

Modifications

  1. Replace the deprecated max_tokens param with the newer max_completion_tokens, per the OpenAI API reference (screenshot below; a field-level sketch follows).
[screenshot: OpenAI API reference marking max_tokens as deprecated in favor of max_completion_tokens]
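
For context, a minimal sketch of the two fields involved on the ChatCompletionRequest model (a Pydantic model, per the diff further down; the defaults and surrounding code here are simplified assumptions, not the exact patch):

from typing import Optional

from pydantic import BaseModel


class ChatCompletionRequest(BaseModel):
    # Deprecated by OpenAI for the chat-completions API; kept for backward compatibility.
    max_tokens: Optional[int] = None
    # Newer parameter: an upper bound on the tokens generated for a chat completion,
    # including visible output tokens and reasoning tokens.
    max_completion_tokens: Optional[int] = None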


@merrymercy
Contributor

Can you only update the adapter and not change other parts?

@ywang96
Contributor

ywang96 commented Jan 26, 2025

Many thanks for addressing #3098 so quickly. FYI, I've tested this branch and it does resolve the issue!

@mickqian
Contributor Author

Can you only update the adapter and not change other parts?

Updated. Removed the frontend part.

@zhaochenyang20
Collaborator

@mickqian I will take a look at this. Thanks!

# OpenAI does not support top_k, so we drop it here
if self.regex is not None:
    warnings.warn("Regular expression is not supported in the OpenAI backend.")
return {
    "max_tokens": self.max_new_tokens,

Collaborator

Wondering what the definition of max_tokens is for a non-chat model? And why don't we keep max_tokens for non-chat models in the backend?

Contributor Author

@mickqian mickqian Feb 3, 2025

  1. The definition of max_tokens for completion models is the maximum number of tokens that can be generated in the completion, pretty much the same as max_completion_tokens.
  2. Yes, we should keep both params; updated.

Collaborator

Hey. For embedding models, max_tokens means the max sequence length that can be processed. What if the input exceeds that length? Should the sequence be truncated, or should we throw an error? The same question applies to the chat model. Also, I personally think we should call them generation models and embedding models; that's what we typically call these models.

Contributor Author

@mickqian mickqian Feb 3, 2025

  1. For embedding models, I think either way will do, depending on the situation. Providing an option sounds good too.
  2. Yes, completion models and chat-completion models both fall into the category of generation models; I thought you were referring to completion models. The is_chat_model flag is used to distinguish between different generation models, if I'm correct. For sglang.lang, does it involve embedding models (or did I miss something)? If not, is_chat_model would probably suffice for generation models in the backend.

Collaborator

Okay, this sounds fine to me. But ideally I would write:

def to_openai_kwargs(self, is_chat_model):

You mean is_chat_model is an attribute of class OpenAI(BaseBackend) in python/sglang/lang/backend/openai.py rather than an attribute of class SglSamplingParams in python/sglang/lang/ir.py?

So this function is not:

def to_openai_kwargs(self):

Right? That makes sense, but I prefer the latter if we can.

Contributor Author

Yes, it's supposed to be an internal property of a BaseBackend (as it directly describes the model), and it's passed to SglSamplingParams when generating the OpenAI-compatible request.
While adding this field to SglSamplingParams does sound good in some cases, I personally reckon SglSamplingParams is meant to be model-unaware data that can be sent to different backends, letting the actual backend decide the final OpenAI request. Feel free to correct me.
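
A minimal sketch of that split of responsibilities, with the backend owning the flag and SglSamplingParams staying model-unaware (the field names and defaults below are simplified assumptions, not the PR's actual code):

from dataclasses import dataclass


@dataclass
class SglSamplingParams:
    max_new_tokens: int = 128
    temperature: float = 1.0
    top_p: float = 1.0

    def to_openai_kwargs(self, is_chat_model: bool) -> dict:
        # The backend passes in whether it serves a chat model; the params object
        # itself does not need to know which backend will consume it.
        kwargs = {"temperature": self.temperature, "top_p": self.top_p}
        if is_chat_model:
            kwargs["max_completion_tokens"] = self.max_new_tokens
        else:
            kwargs["max_tokens"] = self.max_new_tokens
        return kwargs


# The OpenAI backend would then call something like:
# sampling_params.to_openai_kwargs(self.is_chat_model)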

Collaborator

I agree! Nice work.

Collaborator

@zhaochenyang20 zhaochenyang20 left a comment

  1. I think we should add a comment in protocol.py (or wherever appropriate) describing the definition of max_completion_tokens and how it differs from the previous max_tokens.

  2. Could you run the docs CI locally? Just make compile is enough. The docs CI is currently disabled due to the long queue time to compile on CI, but we should run it locally.

@mickqian
Contributor Author

mickqian commented Feb 4, 2025

  1. I think we should add a comment in protocol.py (or wherever appropriate) describing the definition of max_completion_tokens and how it differs from the previous max_tokens.
  2. Could you run the docs CI locally? Just make compile is enough. The docs CI is currently disabled due to the long queue time to compile on CI, but we should run it locally.

make compile failed even on the main branch:

$ jupyter nbconvert --to notebook --execute --inplace ./backend/openai_api_completions.ipynb \
                                --ExecutePreprocessor.timeout=600 \
                                --ExecutePreprocessor.kernel_name=python3 || exit 1; 
                                
...
AttributeError                            Traceback (most recent call last)
Cell In[9], line 58
     52 batch_details = client.batches.retrieve(batch_id=batch_job.id)
     54 print_highlight(
     55     f"Batch job details (check {i+1} / {max_checks}) // ID: {batch_details.id} // Status: {batch_details.status} // Created at: {batch_details.created_at} // Input file ID: {batch_details.input_file_id} // Output file ID: {batch_details.output_file_id}"
     56 )
     57 print_highlight(
---> 58     f"<strong>Request counts: Total: {batch_details.request_counts.total} // Completed: {batch_details.request_counts.completed} // Failed: {batch_details.request_counts.failed}</strong>"
     59 )
     61 time.sleep(3)

AttributeError: 'NoneType' object has no attribute 'total'

Is it a known issue, or is the error from my side?
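
For what it's worth, a simple guard around request_counts would sidestep that traceback in the notebook cell (a sketch only, reusing the cell's existing batch_details and print_highlight; not necessarily how it was fixed):

counts = batch_details.request_counts
if counts is not None:
    print_highlight(
        f"<strong>Request counts: Total: {counts.total} // "
        f"Completed: {counts.completed} // Failed: {counts.failed}</strong>"
    )
else:
    print_highlight("<strong>Request counts not reported yet for this batch</strong>")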

@mickqian
Contributor Author

mickqian commented Feb 4, 2025

make compile failed even on the main branch (AttributeError: 'NoneType' object has no attribute 'total'). Is it a known issue, or is the error from my side?

Fixed, and make compile test passed locally.

@zhaochenyang20
Collaborator

Will review it today. Stay tuned!

Collaborator

@zhaochenyang20 zhaochenyang20 left a comment

Good improvement. I am wondering: does OpenAI still have max_tokens for the chat API right now? For the chat-completion API there should only be max_completion_tokens, but I don't know what the chat API uses.

docs/Makefile Outdated
@@ -14,7 +14,7 @@ help:

# New target to compile Markdown and Jupyter Notebook files
compile:
find $(SOURCEDIR) -path "*/_build/*" -prune -o -name "*.ipynb" -print | while read nb; do \
find $(SOURCEDIR) -path "*/$(BUILDDIR)/*" -prune -o -name "*.ipynb" -print | while read nb; do \

Collaborator

Why do we change this? Do we need to set $(BUILDDIR)?

Contributor Author

It has already been declared on line 9:

BUILDDIR = _build

@@ -295,7 +295,12 @@ class ChatCompletionRequest(BaseModel):
logit_bias: Optional[Dict[str, float]] = None
logprobs: bool = False
top_logprobs: Optional[int] = None
# The maximum number of tokens that can be generated in the chat completion.
# non-chat-completion models only

Collaborator

  1. Make it a full sentence: Non-chat-completion models only have max tokens.

  2. So chat-completion models count max_completion_tokens and chat models (not completion models) count max_tokens, right?

Contributor Author

@mickqian mickqian Feb 5, 2025

  1. There's a nuanced difference between non-chat-completion models only and Non-chat-completion models only have max tokens, I'm afraid. Changed it to Only available for non-chat-completion models; is that ok?
  2. Yes. To be more specific, OpenAI's legacy completions API (non-chat-completion models only) has only max_tokens. Their chat-completions API has both params, but:
[screenshot: OpenAI API reference showing max_tokens marked as deprecated for the chat-completions API]
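
To illustrate the API-level difference with the official Python client (the model names below are placeholders):

from openai import OpenAI

client = OpenAI()

# Chat-completions API: max_completion_tokens is the current parameter;
# max_tokens still works here but is marked deprecated.
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=64,
)

# Legacy completions API: only max_tokens exists.
client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Hello",
    max_tokens=64,
)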

Comment on lines 301 to 315
# An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
# Almost the same as `max_tokens`, but for chat-completion models only
max_completion_tokens: Optional[int] = None

Collaborator

These two definitions look strange to me. I would say:

  1. # The maximum number of total tokens in a chat request. Note that input tokens are included.

  2. # The maximum number of completion tokens for a chat completion request, including visible output tokens and reasoning tokens. But input tokens are not included.

Contributor Author

Fixed.

@mickqian
Contributor Author

mickqian commented Feb 5, 2025

Good improvement. I am wondering: does OpenAI still have max_tokens for the chat API right now? For the chat-completion API there should only be max_completion_tokens, but I don't know what the chat API uses.

Replied here.

@mickqian mickqian force-pushed the fix-max-tokens branch 3 times, most recently from 10ce5fd to db732ae on February 5, 2025 at 02:39
@@ -325,6 +330,14 @@ class ChatCompletionRequest(BaseModel):
lora_path: Optional[Union[List[Optional[str]], Optional[str]]] = None
session_params: Optional[Dict] = None

def get_max_output_tokens(self) -> int:

Contributor

@CatherineSue CatherineSue Mar 4, 2025

Should it be def get_max_output_tokens(self) -> int | None:? And from the code change, this function feels unnecessary. Does request.max_completion_tokens or request.max_tokens not work?
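
For reference, a minimal way such a helper could behave (a hypothetical sketch; the PR's actual implementation is not shown in this thread):

def get_max_output_tokens(self) -> Optional[int]:
    # Prefer the newer parameter and fall back to the deprecated one;
    # returns None when neither is set (Optional imported from typing).
    if self.max_completion_tokens is not None:
        return self.max_completion_tokens
    return self.max_tokens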

@zhaochenyang20
Collaborator

@mickqian This should be rebased. Thanks 😂

@mickqian mickqian force-pushed the fix-max-tokens branch 3 times, most recently from 5d645ed to 473fa26 on March 4, 2025 at 09:51
Replace it with a newer one: max_completion_tokens
@zhaochenyang20
Collaborator

@shuaills Could you review this? Thanks!
