Releases · matatonic/openedai-vision
Version 0.42.0
- new model support: CohereForAI/aya-vision family of models
- new model support: AIDC-AI/Ovis2 family of models
- new model support: Qwen/Qwen2.5-VL family of models (see the example request below)
- new model support: Qwen/QVQ-72B-Preview
- new model support: HuggingFaceM4/Idefics3-8B-Llama3
- compatibility: better backend auto detection for more flexible support of models by type (sketched below)
- bump torch to 2.5
- restrict requests to one at a time (no batching yet; see the lock sketch below)
- REGRESSION: memory usage randomly seems to blow up with some models (qwen2/qwen2.5); this seems to be a new Qwen-specific bug
- REGRESSION: GPTQ-Int4/8 probably broken again
⚠️ DEPRECATED MODELS (use the `0.41.0` docker image for support of these models): TIGER-Lab/Mantis, Ovis1.6-Gemma2-9B, Ovis1.6-Gemma2-27B, Ovis1.5-Gemma2-9B, allenai/Molmo, BAAI/Bunny, BAAI/Emu3-Chat, echo840/Monkey-Chat, failspy/Phi-3-vision-128k-instruct-abliterated-alpha, google/paligemma2, microsoft/Florence-2-large-ft, microsoft/Phi-3-vision, microsoft/Phi-3.5-vision, qnguyen3/nanoLLaVA, rhymes-ai/Aria
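All of these models sit behind the same OpenAI-compatible chat API, so a request to a newly added model looks like any other vision call. A minimal client sketch; the port (5006), `api_key` value, and image URL are placeholder assumptions, not asserted project defaults:

```python
# Hedged client sketch: querying a newly supported model through the
# OpenAI-compatible API. The base_url/port, api_key, and image URL are
# placeholder assumptions; use your own deployment's values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="skip")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # any model the server was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```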
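The backend auto detection item means the server can pick an inference backend from the model's declared type rather than from a hardcoded name list. A hedged illustration of that idea, not the project's actual code; `BACKEND_BY_TYPE` and `detect_backend` are invented names:

```python
# Illustration only: one way "backend auto detection by type" can work.
# Read model_type from the checkpoint's HF config and map it to a backend.
from transformers import AutoConfig

BACKEND_BY_TYPE = {
    "qwen2_5_vl": "qwen2.5-vl",
    "idefics3": "idefics3",
    "llava": "llava",
}

def detect_backend(model_id: str) -> str:
    cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    try:
        return BACKEND_BY_TYPE[cfg.model_type]
    except KeyError:
        raise ValueError(f"no backend registered for model_type {cfg.model_type!r}")
```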
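The one-request-at-a-time restriction behaves like a single lock around inference: concurrent API calls queue up rather than batch. A toy asyncio illustration, not the project's actual code:

```python
import asyncio

# Hypothetical sketch of "one request at a time": a module-level lock
# serializes inference, so concurrent calls queue instead of batching.
_inference_lock = asyncio.Lock()

async def run_model(prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the actual model forward pass
    return f"response to {prompt!r}"

async def generate(prompt: str) -> str:
    async with _inference_lock:  # only one request holds the model at a time
        return await run_model(prompt)

async def main():
    # Two "concurrent" requests complete sequentially because of the lock.
    print(await asyncio.gather(generate("a"), generate("b")))

asyncio.run(main())
```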
Version 0.40.0
- new model support: AIDC-AI/Ovis1.6-Llama3.2-3B, AIDC-AI/Ovis1.6-Gemma2-27B
- new model support: BAAI/Aquila-VL-2B-llava-qwen
- new model support: HuggingFaceTB/SmolVLM-Instruct
- new model support: google/paligemma2 family of models (very limited instruct/chat training so far)
- Qwen2-VL: unpin Qwen2-VL-7B & remove Qwen hacks, GPTQ-Int4/8 working again (still slow - why?); see the loading sketch below
- pin bitsandbytes==0.44.1
⚠️ DEPRECATED MODELS (use the `0.39.2` docker image for support of these models): internlm-xcomposer2-7b, internlm-xcomposer2-7b-4bit, internlm-xcomposer2-vl-1_8b, internlm-xcomposer2-vl-7b, internlm-xcomposer2-vl-7b-4bit, nvidia/NVLM-D-72B, Llama-3-8B-Dragonfly-Med-v1, Llama-3-8B-Dragonfly-v1
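For the GPTQ-Int4/8 item, the checkpoints in question are Qwen's pre-quantized Qwen2-VL releases. A minimal loading sketch with plain transformers, assuming a GPTQ backend (e.g. auto-gptq via optimum) is installed; these kwargs are common defaults, not the project's exact configuration:

```python
# Hedged sketch: loading Qwen's published GPTQ-Int4 Qwen2-VL checkpoint
# with transformers (a GPTQ backend such as auto-gptq must be installed).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",  # weights stay int4; activations pick a float dtype
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```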
Version 0.39.0
- new model support: rhymes-ai/Aria
- improved support for multi-image in various models.
- docker package: The latest release will now be tagged with `:latest`, rather than the latest commit.
- ⚠️ docker: docker will now run as a user instead of root. Your `hf_home` volume may need its ownership fixed; you can use this command: `sudo chown $(id -u):$(id -g) -R hf_home`
Version 0.35.0
- Update Molmo (tensorflow-cpu no longer required), and add autocast for faster, smaller types than float32.
- New option: `--use-double-quant` to enable double quantization with `--load-in-4bit`; a little slower for a little less VRAM (see the sketch after this list).
- Molmo 72B will now run in under 48GB of VRAM using `--load-in-4bit --use-double-quant`.
- Add `completion_tokens` counts in the API and logged tokens/s for most results, plus other compatibility improvements (a client-side example follows this list).
- Include sample tokens/s data (A100) in `vision.sample.env`
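A sketch of what the two flags correspond to in bitsandbytes terms, using the standard transformers quantization config; this is not the project's exact wiring, and the model ID is simply the published Molmo 72B checkpoint:

```python
# --load-in-4bit --use-double-quant, expressed via the standard transformers
# API: 4-bit weights whose quantization constants are themselves quantized,
# trading a little speed for a little less VRAM.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # the second-level quantization
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-72B-0924",
    trust_remote_code=True,   # Molmo ships custom modeling code
    quantization_config=bnb,
    device_map="auto",
)
```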
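The usage block is the standard OpenAI-style `usage` object, so a client can cross-check the server's logged tokens/s. A small sketch; the base_url/port and model name are assumptions about your deployment:

```python
# Client-side sketch: read completion_tokens from the OpenAI-style usage
# object and derive a rough tokens/s figure. The server logs its own number;
# this is only an external cross-check (includes network and prefill time).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="skip")

start = time.monotonic()
response = client.chat.completions.create(
    model="allenai/Molmo-7B-D-0924",  # whatever model the server was started with
    messages=[{"role": "user", "content": "Say hello."}],
)
elapsed = time.monotonic() - start

print(f"{response.usage.completion_tokens} tokens, "
      f"{response.usage.completion_tokens / elapsed:.1f} tok/s")
```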
Version 0.34.0
- new model support: Meta-llama: Llama-3.2-11B-Vision-Instruct, Llama-3.2-90B-Vision-Instruct
- new model support: Ai2/allenai Molmo family of models (requires an additional `pip install tensorflow-cpu` for now, see note)
- new model support: stepfun-ai/GOT-OCR2_0; this is an OCR-only model, all chat is ignored.
- Support moved to alt image: Bunny-Llama-3-8B-V, Bunny-v1_1-Llama-3-8B-V, Mantis-8B-clip-llama3, Mantis-8B-siglip-llama3, omchat-v2.0-13B-single-beta_hf, qihoo360/360VL-8B