
Geon dev #3

Merged
merged 168 commits into from
Apr 1, 2024

Conversation

daniel-geon-park
Collaborator

No description provided.

rkooo567 and others added 30 commits February 8, 2024 09:57
* add mixtral lora support

* formatting

* fix incorrectly ported logic

* polish tests

* minor fixes and refactoring

* minor fixes

* formatting

* rename and remove redundant logic

* refactoring

* refactoring

* minor fix

* minor refactoring

* fix code smell

* Fix AttributeError: `MixtralModel` object has no attribute `org_vocab_size`.

* Make LoRA logic for Mistral and Mixtral the same

---------

Co-authored-by: Pernekhan Utemuratov <[email protected]>

If the `SamplingParams` object passed to `LLMEngine.add_request()` is mutated after the call returns, the mutation could affect the async sampling process for that in-flight request.

Suggested by @Yard1 vllm-project/vllm#2514 (comment)
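
A minimal sketch of the idea, not vLLM's actual code: `add_request()` deep-copies the caller's `SamplingParams` up front, so later caller-side mutations cannot leak into the in-flight request (the `_enqueue` helper is hypothetical):

```python
import copy

from vllm import SamplingParams


class LLMEngine:
    def add_request(self, request_id: str, prompt: str,
                    sampling_params: SamplingParams) -> None:
        # Deep-copy so the engine owns its own snapshot; anything the
        # caller does to sampling_params after this call no longer
        # aliases the copy used by the async sampling loop.
        sampling_params = copy.deepcopy(sampling_params)
        self._enqueue(request_id, prompt, sampling_params)  # hypothetical helper
```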
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate entries if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. The LoRA config values take the same values as they do in `EngineArgs`.
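
As a sketch of what a client might see (the host, port, endpoint path, and response shape here are assumptions in the OpenAI-compatible style, not confirmed by this PR):

```python
import requests

# Hypothetical client-side check against the server started above;
# assumes the default localhost:8000 and an OpenAI-style model list.
resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
# Expected, per the description above, something like:
# ['meta-llama/Llama-2-7b-hf', 'sql-lora', 'sql-lora2']
```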

No work has been done here to scope client permissions to specific models.
orsharir and others added 29 commits March 13, 2024 12:18

… tune moe kernel in A100/H100 with tp=2,4,8 (#3389)

This reverts commit 787ce5d.
daniel-geon-park merged commit 52ed876 into main on Apr 1, 2024
0 of 2 checks passed