Delayed sampling #720

mfylcek · 2025-01-22T09:31:59Z

No description provided.

remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)
---------
Co-authored-by: Michał Kuligowski <[email protected]>

mfylcek and others added 13 commits January 16, 2025 11:54

Storing previous logits and reading them in decode

5bd93d2

First accuracy

ffcf007

Flag checking and MSS

6c4fd0b

Flag bug fix

8b98f21

checking num_lookahead_slots

fc0d6f5

Config fix

f16fa53

Config fix

8d3602f

Config fix for MSS

5ed2c11

Set Triton version

85ac250

Merge branch 'habana_main' into dev/mfylcek/async_prompt_sampling

c3d2d2d

Bug fixes

fdb3b9f

Typo

74c87c5
