Support /v1/embeddings for image models #16

Open
saket424 opened this issue Sep 5, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@saket424

saket424 commented Sep 5, 2024

Whereas /v1/chat/completions succeeds, /v1/embeddings returns a 404 for a similar request body.

I was hoping to get the embedding output vector for an image using the openbmb/MiniCPM-V-2_6 model.

server-1 | INFO: 192.168.155.172:45930 - "POST /v1/chat/completions HTTP/1.1" 200 OK
server-1 | INFO: 192.168.155.172:57930 - "POST /v1/embeddings HTTP/1.1" 404 Not Found

I must be doing something wrong. Any help, @matatonic?

Thanks in advance

@matatonic
Owner

matatonic commented Sep 5, 2024

That endpoint doesn't exist, so the 404 is correct.

I've never considered adding it before, but it's an interesting use case.

@saket424
Author

saket424 commented Sep 5, 2024

I came across this yesterday. While CLIP-API-service offers a variety of embedding models, I would much rather use the CLIP embeddings that MiniCPM-V-2.6 uses, or the ones Qwen2-VL uses. So I appreciate your looking into this when time permits.

pip install git+https://github.com/bentoml/CLIP-API-service.git

(venvclip) anand@nitro17:~/clip-api-as-a-service$ clip-api-service list-models
['openai/clip-vit-base-patch32', 'openai/clip-vit-large-patch14-336', 'openai/clip-vit-base-patch16', 'openai/clip-vit-large-patch14', 'ViT-B-16:openai', 'ViT-L-14-336:openai', 'ViT-B-16-plus-240:laion400m_e32', 'ViT-g-14:laion2b_s34b_b88k', 'ViT-B-32:laion2b_s34b_b79k', 'ViT-bigG-14:laion2b_s39b_b160k', 'RN50:openai', 'ViT-B-32:laion400m_e31', 'ViT-B-32:laion2b_e16', 'RN101:openai', 'roberta-ViT-B-32:laion2b_s12b_b32k', 'ViT-L-14:laion2b_s32b_b82k', 'RN50x4:openai', 'RN50:cc12m', 'ViT-L-14:laion400m_e32', 'ViT-B-32:openai', 'ViT-B-16:laion400m_e31', 'ViT-B-16:laion400m_e32', 'ViT-B-32:laion400m_e32', 'ViT-H-14:laion2b_s32b_b79k', 'RN101:yfcc15m', 'ViT-L-14:openai', 'RN50x64:openai', 'ViT-B-16-plus-240:laion400m_e31', 'ViT-g-14:laion2b_s12b_b42k', 'RN50:yfcc15m', 'RN50x16:openai', 'xlm-roberta-base-ViT-B-32:laion5b_s13b_b90k', 'xlm-roberta-large-ViT-H-14:frozen_laion5b_s13b_b90k', 'ViT-L-14:laion400m_e31']

@matatonic
Owner

Interesting. I've always considered text-embeddings-inference to be the one-stop shop for OpenAI API embeddings, but they don't seem to do images... would the API be the same?

@saket424
Author

saket424 commented Sep 5, 2024

Here is a project/candidate API that puts text and images on the same footing. One needs to look at relative scores rather than absolute scores (a small client-side sketch of that follows the curl examples below).

https://github.com/bentoml/CLIP-API-service

curl -X 'POST' \
  'http://localhost:3000/rank' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "queries": [
    {
      "img_uri": "https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg"
    }
  ],
  "candidates": [
    {
      "img_uri": "https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg"
    },
    {
      "img_uri": "http://172.17.0.1:5000/api/events/1725496773.971715-t4e1qz/snapshot.jpg"
    },
    {
      "text": "picture of a dog"
    },
    {
      "text": "picture of a cat"
    }
  ]
}'
curl -X 'POST' \
  'http://localhost:3000/encode' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "img_uri": "https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg"
  },
  {
    "text": "picture of a dog"
  },
  {
    "text": "picture of a cat"
  }
]'
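
To illustrate the relative-vs-absolute point above, here is a rough client-side sketch (Python/numpy, my own illustration; the vectors are simply whatever /encode returns, and nothing here is part of the CLIP-API-service API itself):

import numpy as np

def relative_scores(query_vec, candidate_vecs):
    # Cosine similarity of the query against each candidate embedding...
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q
    # ...then a softmax, so each score is only meaningful relative to the
    # others, not as an absolute number.
    exp = np.exp(sims - sims.max())
    return exp / exp.sum()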

@saket424
Author

saket424 commented Sep 5, 2024

@matatonic
I don't know if this works for inputs other than text (e.g., images). I was hoping it would, if you decide to implement it.

curl -X 'POST' \
  'http://localhost/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "input": "Hello world!"
}'

@matatonic
Owner

Well... hrm. The format for the OpenAI embeddings API is simple: it basically takes a string or an array of ints (tokens) as input and returns an array of floats representing the embedding.

What I could do is pass an image URL or an image data: URI as the input and return the embeddings. Any similarity or other processing would need to be client side. It could also process a batch (an array of URLs) and return a 2D array of embeddings.

Otherwise, we're really looking at a whole new API...

emb = client.embeddings.create(
  model="...",
  input=image_url,
  encoding_format="float"
)
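
And batched input might look something like this (purely hypothetical, since none of it is implemented yet; the variable names are placeholders):

emb = client.embeddings.create(
  model="...",
  input=[image_url_1, image_url_2],  # hypothetical: an array of image URLs or data: URIs
  encoding_format="float"
)
# emb.data would then hold one embedding (a list of floats) per input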

This all said, I haven't looked into how to actually get the embeddings yet, and I don't want to just do a solution for a single backend (there are dozens).

What you describe with "picture of a dog" / "picture of a cat" would be combined text and image embeddings; at this level the embeddings would need to be the text-model embeddings after projection from the image embedding space. (I think!)

Can you describe your use case?

@saket424
Author

I totally endorse keeping the API simple and returning the vector embedding for a single image. The rest of the processing can be on the client side. Something better than a 404 response is a start.

My use case is a maintenance utility that compares two similar images via their similarity score to check whether the camera orientation has changed, annotating them only periodically, plus the ability to optionally do semantic search using the embedding vector rather than the entire image.
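
For example, something like this on the client side would cover it (a hypothetical sketch using the openai Python client and numpy; the model name, URLs, and the idea of passing an image URL as the embeddings input are placeholders pending whatever gets implemented):

import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost/v1", api_key="none")  # placeholder endpoint/key

def image_embedding(url):
    # Hypothetical: pass the image URL as the embeddings input, as discussed above
    resp = client.embeddings.create(model="...", input=url, encoding_format="float")
    return np.array(resp.data[0].embedding)

old = image_embedding("http://camera/snapshot_old.jpg")  # placeholder URLs
new = image_embedding("http://camera/snapshot_new.jpg")
cosine = float(old @ new / (np.linalg.norm(old) * np.linalg.norm(new)))
# A cosine well below what the same scene normally scores would suggest the camera moved.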

@matatonic matatonic changed the title POST /v1/embeddings returns 404 Support /v1/embeddings for image models Sep 13, 2024
@matatonic matatonic added the enhancement New feature or request label Sep 13, 2024