Add cpu offload for Qwen/Qwen2-VL-72B-Instruct-AWQ #24

nguyen-brat · 2024-10-17T11:24:49Z

Hello, I want to run Qwen/Qwen2-VL-72B-Instruct-AWQ on my local machine. I currently have 2x RTX 3090, but loading the model runs out of memory (OOM). I see that vision.py has a --max-memory option to offload to the CPU. Could you please implement it for Qwen/Qwen2-VL-72B-Instruct-AWQ as well?


matatonic · 2024-10-17T19:09:51Z

I'd like to do this, for this model and for others too, but as far as I know it's awkward and vastly complicates model loading. I'd like to enable it, and it has been in the back of my mind for a while.
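For readers following the thread, here is a minimal sketch of what per-device memory limits usually look like with Hugging Face loading. It is not the code in vision.py: the helper name `parse_max_memory` and the `0:22GiB,1:22GiB,cpu:64GiB` string format are assumptions made for illustration. The `device_map="auto"` and `max_memory` arguments are standard `from_pretrained` options, but whether AWQ-quantized layers actually run once placed on the CPU is not confirmed here, which is part of why this is awkward.

```python
def parse_max_memory(spec: str) -> dict:
    """Turn a string such as '0:22GiB,1:22GiB,cpu:64GiB' into a max_memory dict.

    Hypothetical helper: the string format is an assumption, not the
    documented format of the --max-memory option.
    """
    limits = {}
    for part in spec.split(","):
        device, _, amount = part.strip().partition(":")
        if not device or not amount:
            raise ValueError(f"bad max-memory entry: {part!r}")
        # GPU indices become integers; names like 'cpu' stay as strings.
        key = int(device) if device.isdigit() else device
        limits[key] = amount
    return limits


max_memory = parse_max_memory("0:22GiB,1:22GiB,cpu:64GiB")
print(max_memory)

# Passing the limits to transformers (left commented out: it needs the
# model weights, both GPUs, and enough system RAM for the offloaded layers).
# from transformers import Qwen2VLForConditionalGeneration
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen2-VL-72B-Instruct-AWQ",
#     device_map="auto",       # let accelerate place layers across devices
#     max_memory=max_memory,   # cap each GPU, spill the remainder to CPU
# )
```

Leaving a few GiB of headroom below each card's 24 GiB is a common choice, since activations and the vision inputs also need GPU memory beyond the weights.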

matatonic added the enhancement New feature or request label Oct 17, 2024
