This custom node for ComfyUI uses the Doubutsu small vision-language model (VLM) to describe images. Credit and further information on Doubutsu: https://huggingface.co/qresearch/doubutsu-2b-pt-756
- Clone this repository into your ComfyUI's custom_nodes directory:
  git clone https://github.com/EnragedAntelope/comfyui-doubutsu-describer.git
- Install the required dependencies:
  pip install -r requirements.txt
- Download the model files:
  - Create a models directory in the root of this repository (ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer).
  - Download the model files for "qresearch/doubutsu-2b-pt-756" from Hugging Face and place them in models/qresearch/doubutsu-2b-pt-756/.
  - Download the adapter files for "qresearch/doubutsu-2b-lora-756-docci" and place them in models/qresearch/doubutsu-2b-lora-756-docci/.
You can download these files manually from the Hugging Face website or use the Hugging Face CLI:
Open a command prompt, navigate to your ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer directory, then execute:
huggingface-cli download qresearch/doubutsu-2b-pt-756 --local-dir models/qresearch/doubutsu-2b-pt-756
huggingface-cli download qresearch/doubutsu-2b-lora-756-docci --local-dir models/qresearch/doubutsu-2b-lora-756-docci
- Restart ComfyUI
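The model download step can also be checked or scripted from Python. The following is a minimal sketch, assuming it is run from the root of this repository and that the `huggingface_hub` package is installed (it is only needed for the download itself). The helper names `target_dir`, `missing_dirs`, and `download_all` are illustrative and are not part of this node.

```python
from pathlib import Path

# Repositories named in the installation steps.
REPO_IDS = [
    "qresearch/doubutsu-2b-pt-756",
    "qresearch/doubutsu-2b-lora-756-docci",
]


def target_dir(repo_id: str, root: Path = Path("models")) -> Path:
    """Return the local directory for a repo, e.g. models/qresearch/<name>."""
    return root.joinpath(*repo_id.split("/"))


def missing_dirs(root: Path = Path("models")) -> list[Path]:
    """List expected model directories that are absent or empty."""
    missing = []
    for repo_id in REPO_IDS:
        path = target_dir(repo_id, root)
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(path)
    return missing


def download_all(root: Path = Path("models")) -> None:
    """Download each repo into its expected directory (needs network access)."""
    # Imported lazily so the path helpers work without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id in REPO_IDS:
        snapshot_download(repo_id=repo_id, local_dir=str(target_dir(repo_id, root)))


if __name__ == "__main__":
    # Report only; call download_all() yourself if anything is listed.
    for path in missing_dirs():
        print(f"Missing or empty: {path}")
```

If the script reports a missing directory, calling `download_all()` should fetch the same files as the two `huggingface-cli download` commands. The check only looks for non-empty directories; it does not verify that every model file is present.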
After installation, you'll find a new node called "Doubutsu Image Describer" in the "image/text" category. Connect an image to its input, and it will generate a description based on the provided question.
- image: The input image to describe
- question: The question to ask about the image (default: "Describe the image")
- max_new_tokens: Maximum number of tokens to generate (default: 128)
- temperature: Controls randomness in generation (default: 0.1)
- precision: Choose between float16 or bfloat16 for inference. If your GPU supports it, bfloat16 should be quicker.
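To decide which precision value to pick, you can ask PyTorch whether your GPU supports bfloat16. This is a minimal sketch, assuming PyTorch is installed and the node runs on a CUDA GPU; the function names `choose_precision` and `detect_precision` are illustrative and are not part of this node.

```python
def choose_precision(cuda_available: bool, bf16_supported: bool) -> str:
    """Pick a value for the node's precision setting."""
    if cuda_available and bf16_supported:
        return "bfloat16"
    # Fall back to float16 when bfloat16 support cannot be confirmed.
    return "float16"


def detect_precision() -> str:
    """Query PyTorch for bfloat16 support; requires torch to be installed."""
    import torch  # imported lazily so choose_precision works without torch

    cuda = torch.cuda.is_available()
    # is_bf16_supported() is only queried when a CUDA device is present.
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return choose_precision(cuda, bf16)
```

Run `print(detect_precision())` in the Python environment that ComfyUI uses, then select the reported value in the node.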
[Apache 2.0]