Support /v1/embeddings for image models #16
Comments

Whereas /v1/chat/completions succeeds, the same body sent to /v1/embeddings returns a 404. I was hoping to get the embedding output vector for an image using the openbmb/MiniCPM-V-2_6 model.

server-1 | INFO: 192.168.155.172:45930 - "POST /v1/chat/completions HTTP/1.1" 200 OK
server-1 | INFO: 192.168.155.172:57930 - "POST /v1/embeddings HTTP/1.1" 404 Not Found

I must be doing something incorrect. Any help, @matatonic? Thanks in advance.
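For reference, a minimal sketch of the two calls above using the openai Python client; the base URL and image URL here are placeholders, not the actual values from my setup:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="sk-dummy")  # placeholder address

# This request succeeds (the 200 OK in the log above):
chat = client.chat.completions.create(
    model="openbmb/MiniCPM-V-2_6",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "http://example.com/photo.jpg"}},
        {"type": "text", "text": "Describe this image."},
    ]}],
)

# This one raises openai.NotFoundError (the 404 in the log above):
emb = client.embeddings.create(
    model="openbmb/MiniCPM-V-2_6",
    input="http://example.com/photo.jpg",
)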
That doesn't exist; the 404 is correct. I've never considered adding it before, but it's an interesting use case.
I came across this yesterday. While CLIP-API-service offers a variety of embedding models, I would much rather use the CLIP embeddings that MiniCPM-V-2.6 or Qwen2-VL use. So I appreciate your looking into this when time permits.

pip install git+https://github.com/bentoml/CLIP-API-service.git
(venvclip) anand@nitro17:~/clip-api-as-a-service$ clip-api-service list-models
Interesting. I've always considered text-embeddings-inference to be the one-stop shop for OpenAI API embeddings, but they don't seem to do images... would the API be the same?
Here is a project/candidate API that puts text and images on the same footing. One needs to look at relative scores rather than absolute scores: https://github.com/bentoml/CLIP-API-service
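To illustrate the relative-vs-absolute point: raw CLIP-style image-text similarity scores are not very meaningful in isolation, but the ranking (or a softmax) across several candidates is. A generic sketch, not tied to CLIP-API-service's actual response format:

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical raw cosine similarities between one image and three captions.
scores = np.array([0.21, 0.19, 0.31])

# The absolute values all look "low", but after CLIP-style logit scaling
# (temperature around 100) the relative ordering is unambiguous:
print(softmax(100 * scores))  # the third caption gets nearly all the probability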
@matatonic curl -X 'POST'
Well... hrm. The format for the OpenAI embeddings API is simple: it basically takes a string or an array of ints (tokens) as input and returns the array of floats that represents the embedding. What I could do is accept an image URL or an image data: URI as the input and return the embeddings; any similarity or other processing would need to be client side. It could also process a batch (an array of URLs) and return a 2D array of embeddings. Otherwise, we're really looking at a whole new API...

emb = client.embeddings.create(
    model="...",
    input=image_url,
    encoding_format="float"
)

All this said, I haven't looked into how to actually get the embeddings yet, and I don't want to do a solution for just a single backend (there are dozens). What you describe with "picture of a dog" / "picture of a cat" would be combined text and image embeddings; at that level, the embeddings would need to be the text model embeddings after projection from the image embedding space. (I think!) Can you describe your use case?
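For concreteness, here's how that could look from the client side, assuming the server accepted an image URL (or data: URI) where a string normally goes; to be clear, none of this exists yet, and the shapes are hypothetical:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="sk-dummy")  # placeholder

# Single image: a URL (or data: URI) in place of the usual text input.
emb = client.embeddings.create(
    model="openbmb/MiniCPM-V-2_6",
    input="http://example.com/dog.jpg",
    encoding_format="float",
)
vector = emb.data[0].embedding  # list[float]

# Batch: an array of URLs, one embedding per image in response.data.
batch = client.embeddings.create(
    model="openbmb/MiniCPM-V-2_6",
    input=["http://example.com/dog.jpg", "http://example.com/cat.jpg"],
)
vectors = [d.embedding for d in batch.data]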
I totally endorse keeping the API simple and returning the vector embedding for a single image. The rest of the processing can be on the client side; something better than a 404 response is a start. My use case is comparing two similar images via a similarity score to detect whether the camera orientation has changed, as part of a maintenance utility: annotate the images only periodically, and optionally do semantic search using the embedding vector rather than the entire image.
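The client-side comparison step would then be small. A sketch of the cosine check for the camera-orientation case, with placeholder vectors and an illustrative threshold:

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb_before / emb_after would be the vectors returned by /v1/embeddings for
# the reference snapshot and the current frame (placeholder values here).
emb_before = [0.12, -0.05, 0.33]
emb_after = [0.10, -0.07, 0.31]

if cosine_similarity(emb_before, emb_after) < 0.90:  # illustrative threshold
    print("camera orientation may have changed; flag for re-annotation")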