Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffmpeg compiler flag for video understanding #32

Merged
merged 2 commits into from
Oct 9, 2024

Conversation

HanClinto
Copy link

@HanClinto HanClinto commented Sep 17, 2024

PLEASE NOTE: This is a PR to amend ggml-org#9165 to implement ngxson's suggestion to make the ffmpeg dependency optional.

This moves the ffmpeg dependency to live behind a compiler flag that can be enabled optionally by setting LLAMA_FFMPEG=1 in call to make.

If you just build without this option, then attempting to load a video file displays a message to the user:

extract_frames: llama.cpp built without ffmpeg, processing video files is not supported.

However, one can re-enable video support by adding it as a compiler flag, ex:

make clean
make -j LLAMA_FFMPEG=1

And then video support works as-expected.

There are still a few rough points with this implementation that may need to be cleaned up -- namely:

  • There is are warnings about unused variables when building without ffmpeg:

examples/llava/minicpmv-cli.cpp:165:71: warning: unused parameter 'video_path' [-Wunused-parameter]
static std::vector<clip_image_u8 *> extract_frames(const std::string& video_path, const int frame_num) {
                                                                      ^
examples/llava/minicpmv-cli.cpp:165:93: warning: unused parameter 'frame_num' [-Wunused-parameter]
static std::vector<clip_image_u8 *> extract_frames(const std::string& video_path, const int frame_num) {
  • Generation does not stop if video loading is not supported, but it will simply attempt to continue without the video being there. If the prompt references the video, then the model will hallucinate -- this is not an ideal scenario. Example:
llama_kv_cache_init:      Metal KV buffer size =   224.00 MiB
llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.58 MiB
llama_new_context_with_model:      Metal compute buffer size =   303.22 MiB
llama_new_context_with_model:        CPU compute buffer size =    15.01 MiB
llama_new_context_with_model: graph nodes  = 986
llama_new_context_with_model: graph splits = 2
extract_frames: llama.cpp built without ffmpeg, processing video files is not supported.
minicpmv_version: 3
<user>What is in the video?
<assistant>The video contains a sequence of images showing a car driving on a road. The car appears to be moving from left to right across the frame, and there are no other vehicles or people visible in the immediate surroundings. The background consists of a blurred landscape with trees and vegetation, suggesting that the car is traveling through a rural or suburban area.

The first image shows the front-left side of the car, while the subsequent images provide a more frontal view as the car continues to move forward. There are no significant changes in the environment or any additional objects introduced throughout the sequence. The overall impression is of a quiet and peaceful drive on a clear day.%
  • The way I implemented this adds the ffmpeg dependencies as a flag to EVERY binary built in the system, which may be good (as eventually video multimodality is supported by more and more models), but it may just be clutter for the time being. I have mixed feelings on this point.

Overall, I think this is still an improvement, since llama.cpp should not require a dependency on ffmpeg / pkg-config, and this nicely separates that out.

…nabled optionally by setting LLAMA_FFMPEG=1 in call to make.
@Galunid
Copy link

Galunid commented Sep 17, 2024

Could you add information about LLAMA_FFMPEG=1 flag directly to error message? I think it'd be helpful.

EDIT: I meant the flag name (not how to recompile)

@HanClinto
Copy link
Author

HanClinto commented Sep 17, 2024

Could you add information about LLAMA_FFMPEG=1 flag directly to error message? I think it'd be helpful.

EDIT: I meant the flag name (not how to recompile)

That's a good idea -- I'll add that tonight -- thanks!

EDIT: Done, thanks! :)

…PEG=1 to help users know exactly how to recompile with video support. Suggestion by @Galunid.
@Galunid
Copy link

Galunid commented Sep 18, 2024

Thanks!

@tc-mb tc-mb merged commit 4bd0c61 into OpenBMB:support-video-understanding Oct 9, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants