Releases · matatonic/openedai-vision
Version 0.42.0
- new model support: CohereForAI/aya-vision family of models
- new model support: AIDC-AI/Ovis2 family of models
- new model support: Qwen/Qwen2.5-VL family of models (see the example request below)
- new model support: Qwen/QVQ-72B-Preview
- new model support: HuggingFaceM4/Idefics3-8B-Llama3
- compatibility: better backend auto detection for more flexible support of models by type (sketched below)
- bump torch to 2.5
- restrict requests to one at a time (no batching yet; see the lock sketch below)
- REGRESSION: memory usage randomly seems to blow up with some models (qwen2/qwen2.5); this seems to be a new Qwen-specific bug
- REGRESSION: GPTQ-Int4/8 probably broken again
⚠️ DEPRECATED MODELS (use the `0.41.0` docker image for support of these models): TIGER-Lab/Mantis, Ovis1.6-Gemma2-9B, Ovis1.6-Gemma2-27B, Ovis1.5-Gemma2-9B, allenai/Molmo, BAAI/Bunny, BAAI/Emu3-Chat, echo840/Monkey-Chat, failspy/Phi-3-vision-128k-instruct-abliterated-alpha, google/paligemma2, microsoft/Florence-2-large-ft, microsoft/Phi-3-vision, microsoft/Phi-3.5-vision, qnguyen3/nanoLLaVA, rhymes-ai/Aria
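All of these models sit behind the same OpenAI-compatible chat API, so a request to a newly added model looks like any other vision call. A minimal client sketch; the port (5006), `api_key` value, and image URL are placeholder assumptions, not asserted project defaults:

```python
# Hedged client sketch: querying a newly supported model through the
# OpenAI-compatible API. The base_url/port, api_key, and image URL are
# placeholder assumptions; use your own deployment's values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="skip")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # any model the server was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```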
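The backend auto detection item means the server can pick an inference backend from the model's declared type rather than from a hardcoded name list. A hedged illustration of that idea, not the project's actual code; `BACKEND_BY_TYPE` and `detect_backend` are invented names:

```python
# Illustration only: one way "backend auto detection by type" can work.
# Read model_type from the checkpoint's HF config and map it to a backend.
from transformers import AutoConfig

BACKEND_BY_TYPE = {
    "qwen2_5_vl": "qwen2.5-vl",
    "idefics3": "idefics3",
    "llava": "llava",
}

def detect_backend(model_id: str) -> str:
    cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    try:
        return BACKEND_BY_TYPE[cfg.model_type]
    except KeyError:
        raise ValueError(f"no backend registered for model_type {cfg.model_type!r}")
```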
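The one-request-at-a-time restriction behaves like a single lock around inference: concurrent API calls queue up rather than batch. A toy asyncio illustration, not the project's actual code:

```python
import asyncio

# Hypothetical sketch of "one request at a time": a module-level lock
# serializes inference, so concurrent calls queue instead of batching.
_inference_lock = asyncio.Lock()

async def run_model(prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the actual model forward pass
    return f"response to {prompt!r}"

async def generate(prompt: str) -> str:
    async with _inference_lock:  # only one request holds the model at a time
        return await run_model(prompt)

async def main():
    # Two "concurrent" requests complete sequentially because of the lock.
    print(await asyncio.gather(generate("a"), generate("b")))

asyncio.run(main())
```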
Version 0.40.0
- new model support: AIDC-AI/Ovis1.6-Llama3.2-3B, AIDC-AI/Ovis1.6-Gemma2-27B
- new model support: BAAI/Aquila-VL-2B-llava-qwen
- new model support: HuggingFaceTB/SmolVLM-Instruct
- new model support: google/paligemma2 family of models (very limited instruct/chat training so far)
- Qwen2-VL: unpin Qwen2-VL-7B & remove Qwen hacks, GPTQ-Int4/8 working again (still slow - why?); see the loading sketch below
- pin bitsandbytes==0.44.1
⚠️ DEPRECATED MODELS (use the `0.39.2` docker image for support of these models): internlm-xcomposer2-7b, internlm-xcomposer2-7b-4bit, internlm-xcomposer2-vl-1_8b, internlm-xcomposer2-vl-7b, internlm-xcomposer2-vl-7b-4bit, nvidia/NVLM-D-72B, Llama-3-8B-Dragonfly-Med-v1, Llama-3-8B-Dragonfly-v1
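For the GPTQ-Int4/8 item, the checkpoints in question are Qwen's pre-quantized Qwen2-VL releases. A minimal loading sketch with plain transformers, assuming a GPTQ backend (e.g. auto-gptq via optimum) is installed; these kwargs are common defaults, not the project's exact configuration:

```python
# Hedged sketch: loading Qwen's published GPTQ-Int4 Qwen2-VL checkpoint
# with transformers (a GPTQ backend such as auto-gptq must be installed).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",  # weights stay int4; activations pick a float dtype
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```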
Version 0.39.0
- new model support: rhymes-ai/Aria
- improved support for multi-image in various models.
- docker package: The latest release will now be tagged with `:latest`, rather than the latest commit.
- ⚠️ docker: docker will now run as a user instead of root. Your `hf_home` volume may need its ownership fixed; you can use this command: `sudo chown $(id -u):$(id -g) -R hf_home`
Version 0.35.0
- Update Molmo (tensorflow-cpu no longer required), and add autocast for faster, smaller types than float32.
- New option: `--use-double-quant` to enable double quantization with `--load-in-4bit`; a little slower for a little less VRAM (see the sketch after this list).
- Molmo 72B will now run in under 48GB of VRAM using `--load-in-4bit --use-double-quant`.
- Add `completion_tokens` counts in the API and logged tokens/s for most results, plus other compatibility improvements (a client-side example follows this list).
- Include sample tokens/s data (A100) in `vision.sample.env`
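A sketch of what the two flags correspond to in bitsandbytes terms, using the standard transformers quantization config; this is not the project's exact wiring, and the model ID is simply the published Molmo 72B checkpoint:

```python
# --load-in-4bit --use-double-quant, expressed via the standard transformers
# API: 4-bit weights whose quantization constants are themselves quantized,
# trading a little speed for a little less VRAM.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # the second-level quantization
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-72B-0924",
    trust_remote_code=True,   # Molmo ships custom modeling code
    quantization_config=bnb,
    device_map="auto",
)
```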
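The usage block is the standard OpenAI-style `usage` object, so a client can cross-check the server's logged tokens/s. A small sketch; the base_url/port and model name are assumptions about your deployment:

```python
# Client-side sketch: read completion_tokens from the OpenAI-style usage
# object and derive a rough tokens/s figure. The server logs its own number;
# this is only an external cross-check (includes network and prefill time).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="skip")

start = time.monotonic()
response = client.chat.completions.create(
    model="allenai/Molmo-7B-D-0924",  # whatever model the server was started with
    messages=[{"role": "user", "content": "Say hello."}],
)
elapsed = time.monotonic() - start

print(f"{response.usage.completion_tokens} tokens, "
      f"{response.usage.completion_tokens / elapsed:.1f} tok/s")
```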
Version 0.34.0
- new model support: Meta-llama: Llama-3.2-11B-Vision-Instruct, Llama-3.2-90B-Vision-Instruct
- new model support: Ai2/allenai Molmo family of models (requires an additional `pip install tensorflow-cpu` for now, see note)
- new model support: stepfun-ai/GOT-OCR2_0; this is an OCR-only model, all chat is ignored.
- Support moved to alt image: Bunny-Llama-3-8B-V, Bunny-v1_1-Llama-3-8B-V, Mantis-8B-clip-llama3, Mantis-8B-siglip-llama3, omchat-v2.0-13B-single-beta_hf, qihoo360/360VL-8B