Model seeing a black image even when there's no image attached. (Molmo 7B D) #28
Comments
The model's training is image focused, unfortunately, and without an image it will just error, so I provide a black pixel. It's a very specialized fine-tune and doesn't handle image-free chat well. 4 T/s does seem slow. I see about 8GB of VRAM usage when I run it that way, and get about 25 T/s. Could it be loading on the CPU? I don't really know Windows/WSL, so I may not be able to help you much, but nvidia-smi should show you GPU VRAM usage.
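For context, a minimal sketch of the kind of placeholder being described: a 1×1 black PNG encoded as a base64 data URL, the shape an OpenAI-style image_url content part expects. This is only an illustration of the workaround, not the project's exact code.

```python
# Sketch: build a 1x1 black PNG and wrap it as a data URL placeholder
# (assumed shape of the workaround described above, not the project's actual code).
import base64
from io import BytesIO

from PIL import Image

buf = BytesIO()
Image.new("RGB", (1, 1), (0, 0, 0)).save(buf, format="PNG")  # single black pixel
black_pixel_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

# This data URL can then be attached as an image_url content part when the
# user's request contains no image, so the image-only model still receives one.
```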
Just an idea: with open-webui you can switch the model partway through the conversation, so you could start with no image on one model, and later switch to the image model when you add the image.
Hmm, got it. But Ubuntu-WSL is using ALL of the GPU's 12GB plus another 4GB of my RAM; I really think it is something about context length. Is there any way I can change the max context length manually?
Oh thank you! That's a clever workaround :)
I don't have an option for this yet, but it's a good idea; I really should. Are you sure you're loading with --load-in-4bit? Without 4-bit I would expect about 17GB of usage.
#CLI_COMMAND="python vision.py -m allenai/Molmo-7B-D-0924 -A flash_attention_2 --load-in-4bit --use-double-quant" # test pass✅, time: 45.2s, mem: 7.7GB, 13/13 tests passed, (318/14.5s) 21.9 T/s
#CLI_COMMAND="python vision.py -m allenai/Molmo-7B-D-0924 -A flash_attention_2 --load-in-4bit" # test pass✅, time: 38.6s, mem: 8.1GB, 13/13 tests passed, (310/12.3s) 25.2 T/s
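As a rough illustration (not vision.py's actual loader code), this is approximately what those flags map to when loading the model with transformers and bitsandbytes; the exact arguments the project passes may differ.

```python
# Sketch, assuming --load-in-4bit / --use-double-quant roughly correspond
# to a bitsandbytes 4-bit quantization config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,               # --load-in-4bit
    bnb_4bit_use_double_quant=True,  # --use-double-quant
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    quantization_config=quant_config,
    attn_implementation="flash_attention_2",  # -A flash_attention_2
    torch_dtype=torch.bfloat16,
    device_map="auto",       # keep weights on the GPU when they fit
    trust_remote_code=True,  # Molmo ships custom modeling code
)
```

Without the quantization_config, the 7B weights alone in bf16 take roughly 14-15GB, which lines up with the ~17GB estimate above; with 4-bit quantization the observed usage drops to about 8GB.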
Oh, molmo-7B-D-bnb-4bit: I couldn't get that one to load properly, so I left it out of the supported list. Maybe try it again without --load-in-4bit? But as I said, I couldn't get it to load properly.
It worked! I just had to use the default model rather than the pre-quantized one. I get 15 T/s now, which is a lot more usable. Thank you!!
For a while now I've been trying to get this project to work locally on Windows, and after a lot of effort I decided to just run it under Ubuntu-WSL. It works fine, but with some inconveniences, as shown in the image. I just want to chat with text only at first and then add images later (see the sketch after the environment details below).
Also, inference is really slow (4 T/s), and I don't think it's because of my hardware (maybe it's something about context length, but I'm not sure).
I would appreciate some help on those issues.
Environment:
OS: Ubuntu-WSL (I already tried running on Docker/Windows and faced problems)
Model: Molmo 7B D
Quantization: BNB 4bit
Hardware:
GPU: 3060 12GB
RAM: 16GB
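For reference, a minimal sketch of the workflow being described (a text-only turn first, then an image added later) against an OpenAI-compatible endpoint like the one this project exposes. The base URL, port, and API key here are assumptions; adjust them to your local setup.

```python
# Sketch: text-first, image-later chat against an OpenAI-compatible server.
# The base_url/port and api_key below are assumptions; adjust to your setup.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="skip")

# 1) Text-only turn (the server may substitute a placeholder image internally).
first = client.chat.completions.create(
    model="allenai/Molmo-7B-D-0924",
    messages=[{"role": "user", "content": "Hi, describe what you can do."}],
)
print(first.choices[0].message.content)

# 2) A later turn with an actual image attached as a base64 data URL.
with open("photo.jpg", "rb") as f:
    image_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

second = client.chat.completions.create(
    model="allenai/Molmo-7B-D-0924",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this photo?"},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }],
)
print(second.choices[0].message.content)
```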