feat(multimodal): Video understanding #2318

Closed
mudler opened this issue May 13, 2024 · 0 comments · Fixed by #3729
Labels
enhancement (New feature or request), roadmap, up for grabs (Tickets that no-one is currently working on)

Comments

Owner

mudler commented May 13, 2024

It should now be possible to expand the vision support to understand videos. Projects such as
https://github.com/Efficient-Large-Model/VILA
https://github.com/LLaVA-VL/LLaVA-NeXT
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct?s=09

make this possible nowadays. Since OpenAI has announced GPT-4o, it makes sense to start looking into open solutions that we can plug into the API with specific backends.

llama.cpp: ggerganov/llama.cpp#9165
vLLM: #3670

mudler added the enhancement (New feature or request) label May 13, 2024
mudler added the roadmap and up for grabs (Tickets that no-one is currently working on) labels May 13, 2024
mudler added a commit that referenced this issue Oct 4, 2024
Closes: #2318

Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler added a commit that referenced this issue Oct 4, 2024
* feat(vllm): add support for image-to-text

Related to #3670

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat(vllm): add support for video-to-text

Closes: #2318

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat(vllm): support CPU installations

Signed-off-by: Ettore Di Giacinto <[email protected]>

* feat(vllm): add bnb

Signed-off-by: Ettore Di Giacinto <[email protected]>

* chore: add docs reference

Signed-off-by: Ettore Di Giacinto <[email protected]>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <[email protected]>

---------

Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
siddimore pushed a commit to siddimore/LocalAI that referenced this issue Oct 6, 2024