[Core] Faster logit_bias_logits_processor #13334

xu-song · 2025-02-15T13:26:35Z

This PR replaces Python-level ops with tensor ops, which reduces the time spent applying logit_bias from 106 ms to 0.01 ms (at len(logit_bias) = 10000).

Before

    for token_id, bias in logit_bias.items():
        logits[token_id] += bias

This approach is slow, especially when len(logit_bias) is very large, because every entry costs a separate Python-level indexed add on the logits tensor.

After

    logits.index_add_(0, logit_bias["index"], logit_bias["value"])
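To make the Before/After comparison concrete, here is a minimal sketch. It assumes `logits` is a 1-D float tensor over the vocabulary and that `logit_bias` starts as a plain `{token_id: bias}` dict. The helper `to_bias_tensors` is hypothetical: it shows one way to pack the dict into the `{"index", "value"}` layout that the After snippet reads. The sketch then checks that both paths produce the same logits.

```python
from typing import Dict

import torch


def apply_bias_loop(logits: torch.Tensor, logit_bias: Dict[int, float]) -> torch.Tensor:
    # "Before": one indexed in-place add per biased token, driven from Python.
    for token_id, bias in logit_bias.items():
        logits[token_id] += bias
    return logits


def to_bias_tensors(logit_bias: Dict[int, float], dtype: torch.dtype) -> Dict[str, torch.Tensor]:
    # Hypothetical helper: index_add_ expects an int64 (or int32) index and a
    # source tensor whose dtype matches the tensor being updated.
    return {
        "index": torch.tensor(list(logit_bias.keys()), dtype=torch.long),
        "value": torch.tensor(list(logit_bias.values()), dtype=dtype),
    }


def apply_bias_vectorized(logits: torch.Tensor, bias: Dict[str, torch.Tensor]) -> torch.Tensor:
    # "After": a single index_add_ call adds every bias at once.
    logits.index_add_(0, bias["index"], bias["value"])
    return logits


base = torch.randn(8)
logit_bias = {1: 2.0, 5: -100.0}
looped = apply_bias_loop(base.clone(), logit_bias)
vectorized = apply_bias_vectorized(base.clone(), to_bias_tensors(logit_bias, base.dtype))
```

The loop's cost grows with the number of Python iterations. The vectorized path issues one call regardless of how many tokens are biased.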

Time Cost

len(logit_bias)	before (ms)	v1 (ms)	v2 (ms)
1	4.5	0.3	0.01
20	4.5	0.3	0.01
100	5.3	0.3	0.01
1000	14.4	0.3	0.01
10000	106	0.4	0.01

experiment settings:

GPU: A100 
model: Llama-3.2-1B-Instruct
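The post does not include its benchmark script. If you want to reproduce the shape of the table above, a rough timing harness might look like the following. This is my own sketch, not the PR's script. It assumes a 128,256-entry vocabulary (the Llama 3 tokenizer size) and times only the bias-application step in isolation, so its absolute numbers are not directly comparable to the figures above and will vary with hardware and device.

```python
import time
from typing import Callable, Dict

import torch


def time_ms(fn: Callable[[], None], device: str, iters: int = 10) -> float:
    # CUDA kernels run asynchronously, so synchronize before reading the clock.
    sync = torch.cuda.synchronize if torch.device(device).type == "cuda" else (lambda: None)
    fn()  # warm-up call, excluded from the measurement
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()
    return (time.perf_counter() - start) * 1000 / iters


def compare(n: int, device: str, vocab_size: int = 128_256, iters: int = 10) -> Dict[str, float]:
    logits = torch.zeros(vocab_size, device=device)
    ids = torch.randperm(vocab_size)[:n]  # n distinct token ids
    logit_bias = {int(i): 1.0 for i in ids}
    cpu_index = torch.tensor(list(logit_bias.keys()), dtype=torch.long)
    cpu_value = torch.tensor(list(logit_bias.values()), dtype=logits.dtype)
    dev_index, dev_value = cpu_index.to(device), cpu_value.to(device)

    def before() -> None:
        for token_id, bias in logit_bias.items():
            logits[token_id] += bias

    def v1() -> None:
        # Copies index/value to the device on every call.
        logits.index_add_(0, cpu_index.to(device), cpu_value.to(device))

    def v2() -> None:
        # Reuses tensors that already live on the device.
        logits.index_add_(0, dev_index, dev_value)

    return {name: time_ms(fn, device, iters)
            for name, fn in [("before", before), ("v1", v1), ("v2", v2)]}


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print("len(logit_bias)\tbefore -> v1 -> v2 (ms)")
    for n in (1, 20, 100, 1000, 10000):
        r = compare(n, device)
        print(f"{n}\t{r['before']:.3f} -> {r['v1']:.3f} -> {r['v2']:.3f}")
```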

Implementation history

v1: no cache

vllm/vllm/entrypoints/openai/logits_processors.py

Lines 50 to 51 in a50b0ea

    
    logits.index_add_(0, logit_bias["index"].to(logits.device),
                      logit_bias["value"].to(logits.device))

106ms -> 0.4ms

v2: cache within a single request

vllm/vllm/entrypoints/openai/logits_processors.py

Lines 50 to 53 in 62a74e3

    
    if logit_bias["value"].device != logits.device:
        logit_bias["index"] = logit_bias["index"].to(logits.device)
        logit_bias["value"] = logit_bias["value"].to(logits.device)
    logits.index_add_(0, logit_bias["index"], logit_bias["value"])

106ms -> 0.01ms
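The v2 idea (also what @imkero suggests in the thread) is that a request's `logit_bias` dict outlives a single sampling step. Moving its tensors to the GPU once and writing the moved tensors back into the dict lets every later step skip the host-to-device copy. Here is a minimal sketch. It assumes a processor with a `(token_ids, logits) -> logits` call shape and uses `functools.partial` to bind the per-request dict. The name `logit_bias_processor` is hypothetical.

```python
import functools
from typing import Dict, List

import torch


def logit_bias_processor(
    logit_bias: Dict[str, torch.Tensor],
    token_ids: List[int],  # unused; kept to match the processor call shape
    logits: torch.Tensor,
) -> torch.Tensor:
    # First call for this request: move index/value next to logits and store
    # the moved tensors back in the dict so later steps reuse them.
    if logit_bias["value"].device != logits.device:
        logit_bias["index"] = logit_bias["index"].to(logits.device)
        logit_bias["value"] = logit_bias["value"].to(logits.device)
    logits.index_add_(0, logit_bias["index"], logit_bias["value"])
    return logits


# One dict per request; partial keeps a reference to it across decode steps.
bias = {"index": torch.tensor([1, 5]), "value": torch.tensor([2.0, -100.0])}
processor = functools.partial(logit_bias_processor, bias)
index_before = bias["index"]
out = processor([], torch.zeros(8))
```

In this usage the logits live on the CPU, like the bias tensors, so no copy happens. On a GPU, the first call would replace the dict entries with device copies, and every later call would find them already in place.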

v3 (current): cache across requests

vllm/vllm/entrypoints/openai/logits_processors.py

Lines 70 to 74 in cd9f33f

    
    @lru_cache(maxsize=32)
    def _get_logit_bias_logits_processor(
        logit_bias_index: Union[Tuple[int], Tuple[str]],
        logit_bias_value: Tuple[float],
        vocab_size: int,

github-actions · 2025-02-15T13:26:46Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

imkero · 2025-02-15T20:32:00Z

If len(logit_bias) is large, maybe we can keep the copy of logit_bias["index"] and logit_bias["value"] in the device memory ahead of time (or in the first sample step), and re-use it in the following sample steps, to avoid duplicated tensor copy?

xu-song · 2025-02-17T02:38:51Z

> If len(logit_bias) is large, maybe we can keep the copy of logit_bias["index"] and logit_bias["value"] in the device memory ahead of time (or in the first sample step), and re-use it in the following sample steps, to avoid duplicated tensor copy?

@imkero Thanks for the suggestion. I've added a new commit that avoids the duplicated tensor copy.

After this change, the time cost drops to 0.01 ms:

len(logit_bias)	before (ms)	after (ms)
1	4.5	0.01
20	4.5	0.01
100	5.3	0.01
1000	14.4	0.01
10000	106	0.01

xu-song requested review from zhuohan123, youkaichao, alexm-redhat, comaniac and njhill as code owners February 15, 2025 13:26

mergify bot added the frontend label Feb 15, 2025

xu-song force-pushed the patch-4 branch from 40f5641 to 0492590

xu-song added 2 commits February 15, 2025 22:43

optimize logit_bias

eb39b11

Signed-off-by: Xu Song <[email protected]>

optimize logit_bias

a50b0ea

Signed-off-by: Xu Song <[email protected]>

xu-song force-pushed the patch-4 branch from ab050fd to a50b0ea

xu-song changed the title from "Optimize logit_bias_logits_processor" to "[Core] Optimize logit_bias_logits_processor" Feb 15, 2025

avoid duplicated tensor copy

62a74e3

Signed-off-by: Xu Song <[email protected]>

xu-song changed the title from "[Core] Optimize logit_bias_logits_processor" to "[Core] Faster logit_bias_logits_processor" Feb 17, 2025

xu-song added 3 commits February 17, 2025 15:38

add lru_cache across different requests

0a2b889

Signed-off-by: Xu Song <[email protected]>

fix typing warning

fe6e1fe

Signed-off-by: Xu Song <[email protected]>

fix typing warning

cd9f33f

Signed-off-by: Xu Song <[email protected]>
