
the-ride-never-ends/ComfyUI-Florence2-DocVQA

 
 


Florence2 in ComfyUI

Florence-2 is a vision foundation model from Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Given a simple text prompt, it can perform tasks such as captioning, object detection, and segmentation. It was trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, to learn many tasks at once. Its sequence-to-sequence architecture performs well in both zero-shot and fine-tuned settings, which makes it a competitive vision foundation model.
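
Florence-2 selects the task through a special task token at the start of the prompt. Tasks that need extra input, such as grounding a phrase, append that text directly after the token, following the usage example on Microsoft's Florence-2 model card. The sketch below illustrates only that prompt-building convention; the helper name `build_prompt` and the friendly task names in `TASKS` are hypothetical and are not taken from this node's code.

```python
from typing import Optional

# Illustrative subset of Florence-2 task tokens (as listed on Microsoft's model card).
# The friendly names on the left are hypothetical labels used only in this sketch.
TASKS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the task token, optionally followed by extra text for the task."""
    try:
        token = TASKS[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}") from None
    # The extra text is concatenated directly after the token, with no separator.
    return token if text_input is None else token + text_input
```

For example, `build_prompt("object_detection")` returns `"<OD>"`, and `build_prompt("phrase_grounding", "a red car")` returns `"<CAPTION_TO_PHRASE_GROUNDING>a red car"`. On the model card, the resulting string is passed together with the image to the model's processor, which is loaded with `trust_remote_code=True`.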

Installation:

  • Clone this repository into the `ComfyUI/custom_nodes` folder. The only real dependency is a sufficiently recent version of transformers.
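
The README does not say which transformers release counts as recent enough. A minimal way to check the installed version from Python is sketched below. `MIN_TRANSFORMERS` is a hypothetical placeholder, not a documented requirement, so substitute the version your setup actually needs; the helper names are likewise illustrative.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Hypothetical placeholder: this README does not name a minimum transformers version.
MIN_TRANSFORMERS = "0.0.0"


def parse_release(v: str) -> Tuple[int, ...]:
    """Turn '4.41.2' or '4.42.0.dev0' into a tuple of ints such as (4, 41, 2)."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at a non-numeric segment such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def is_new_enough(installed: str, minimum: str) -> bool:
    """Compare release numbers only; pre-release tags like 'dev0' or 'rc1' are ignored."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.38' and '4.38.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_transformers(minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if transformers is installed and its version is at least `minimum`."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return is_new_enough(installed, minimum)
```

This comparison deliberately ignores pre-release tags. For strict PEP 440 ordering, where `4.42.0.dev0` sorts before `4.42.0`, `packaging.version.Version` is the more robust choice.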


The following models are supported and are downloaded automatically to `ComfyUI/LLM`:

https://huggingface.co/microsoft/Florence-2-base

https://huggingface.co/microsoft/Florence-2-base-ft

https://huggingface.co/microsoft/Florence-2-large

https://huggingface.co/microsoft/Florence-2-large-ft
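
As a rough illustration of the automatic download, and not this repository's actual loader, the sketch below maps each supported Hugging Face repo id to a folder under `LLM` in the ComfyUI root and fetches it with `huggingface_hub.snapshot_download` only when that folder is missing. The folder-naming rule (the part of the repo id after the slash) and the helper names `local_model_dir` and `ensure_downloaded` are assumptions made for this example.

```python
from pathlib import Path

# The four checkpoints listed above, as Hugging Face repo ids.
SUPPORTED_MODELS = (
    "microsoft/Florence-2-base",
    "microsoft/Florence-2-base-ft",
    "microsoft/Florence-2-large",
    "microsoft/Florence-2-large-ft",
)


def local_model_dir(repo_id: str, comfy_root: Path) -> Path:
    """Hypothetical layout: <comfy_root>/LLM/<model name>, e.g. ComfyUI/LLM/Florence-2-base."""
    if repo_id not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model {repo_id!r}")
    return comfy_root / "LLM" / repo_id.split("/")[-1]


def ensure_downloaded(repo_id: str, comfy_root: Path) -> Path:
    """Download the checkpoint if its folder does not exist yet, then return the folder."""
    target = local_model_dir(repo_id, comfy_root)
    if not target.exists():
        # Imported lazily so the path helper works without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Checking for an existing folder keeps repeated runs offline once a model is present; a real loader may also want to verify that the download completed.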

About

DocVQA inference


Languages

  • Python 100.0%