Support /v1/embeddings for image models #16
Comments

Whereas /v1/chat/completions succeeds, the same body sent to /v1/embeddings returns a 404. I was hoping to get the embedding output vector for an image using the openbmb/MiniCPM-V-2_6 model.

server-1 | INFO: 192.168.155.172:45930 - "POST /v1/chat/completions HTTP/1.1" 200 OK
server-1 | INFO: 192.168.155.172:57930 - "POST /v1/embeddings HTTP/1.1" 404 Not Found

I must be doing something incorrect. Any help, @matatonic? Thanks in advance.
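For reference, a minimal sketch of the two calls above using the openai Python client; the base URL and image URL here are placeholders, not the actual values from my setup:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="sk-dummy")  # placeholder address

# This request succeeds (the 200 OK in the log above):
chat = client.chat.completions.create(
    model="openbmb/MiniCPM-V-2_6",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "http://example.com/photo.jpg"}},
        {"type": "text", "text": "Describe this image."},
    ]}],
)

# This one raises openai.NotFoundError (the 404 in the log above):
emb = client.embeddings.create(
    model="openbmb/MiniCPM-V-2_6",
    input="http://example.com/photo.jpg",
)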
That doesn't exist; the 404 is correct. I've never considered adding it before, but it's an interesting use case.
I came across this yesterday. While CLIP-API-service offers a variety of embedding models, I would much rather use the CLIP embeddings that MiniCPM-V-2.6 or Qwen2-VL use. So I appreciate your looking into this when time permits.

pip install git+https://github.com/bentoml/CLIP-API-service.git
(venvclip) anand@nitro17:~/clip-api-as-a-service$ clip-api-service list-models
Interesting. I've always considered text-embeddings-inference to be the one-stop shop for OpenAI API embeddings, but they don't seem to do images... would the API be the same?
Here is a project/candidate API that puts text and images on the same footing. One needs to look at relative scores rather than absolute scores: https://github.com/bentoml/CLIP-API-service
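To illustrate the relative-vs-absolute point: raw CLIP-style image-text similarity scores are not very meaningful in isolation, but the ranking (or a softmax) across several candidates is. A generic sketch, not tied to CLIP-API-service's actual response format:

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical raw cosine similarities between one image and three captions.
scores = np.array([0.21, 0.19, 0.31])

# The absolute values all look "low", but after CLIP-style logit scaling
# (temperature around 100) the relative ordering is unambiguous:
print(softmax(100 * scores))  # the third caption gets nearly all the probability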
@matatonic curl -X 'POST'
Well... hrm. The format for the OpenAI embeddings API is simple: it basically takes a string or an array of ints (tokens) as input and returns the array of floats that represents the embedding. What I could do is accept an image URL or an image data: URI as the input and return the embeddings; any similarity or other processing would need to be client side. It could also process a batch (an array of URLs) and return a 2D array of embeddings. Otherwise, we're really looking at a whole new API...

emb = client.embeddings.create(
    model="...",
    input=image_url,
    encoding_format="float"
)

All this said, I haven't looked into how to actually get the embeddings yet, and I don't want to do a solution for just a single backend (there are dozens). What you describe with "picture of a dog" / "picture of a cat" would be combined text and image embeddings; at that level, the embeddings would need to be the text model embeddings after projection from the image embedding space. (I think!) Can you describe your use case?
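For concreteness, here's how that could look from the client side, assuming the server accepted an image URL (or data: URI) where a string normally goes; to be clear, none of this exists yet, and the shapes are hypothetical:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:5006/v1", api_key="sk-dummy")  # placeholder

# Single image: a URL (or data: URI) in place of the usual text input.
emb = client.embeddings.create(
    model="openbmb/MiniCPM-V-2_6",
    input="http://example.com/dog.jpg",
    encoding_format="float",
)
vector = emb.data[0].embedding  # list[float]

# Batch: an array of URLs, one embedding per image in response.data.
batch = client.embeddings.create(
    model="openbmb/MiniCPM-V-2_6",
    input=["http://example.com/dog.jpg", "http://example.com/cat.jpg"],
)
vectors = [d.embedding for d in batch.data]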
I totally endorse keeping the API simple and returning the vector embedding for a single image. The rest of the processing can be on the client side; something better than a 404 response is a start. My use case is comparing two similar images via a similarity score to detect whether the camera orientation has changed, as part of a maintenance utility: annotate the images only periodically, and optionally do semantic search using the embedding vector rather than the entire image.
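The client-side comparison step would then be small. A sketch of the cosine check for the camera-orientation case, with placeholder vectors and an illustrative threshold:

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb_before / emb_after would be the vectors returned by /v1/embeddings for
# the reference snapshot and the current frame (placeholder values here).
emb_before = [0.12, -0.05, 0.33]
emb_after = [0.10, -0.07, 0.31]

if cosine_similarity(emb_before, emb_after) < 0.90:  # illustrative threshold
    print("camera orientation may have changed; flag for re-annotation")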