ffmpeg compiler flag for video understanding #32

HanClinto · 2024-09-17T05:26:06Z

PLEASE NOTE: This is a PR to amend ggml-org#9165 to implement ngxson's suggestion to make the ffmpeg dependency optional.

This moves the ffmpeg dependency to live behind a compiler flag that can be enabled optionally by setting LLAMA_FFMPEG=1 in call to make.

If you just build without this option, then attempting to load a video file displays a message to the user:

extract_frames: llama.cpp built without ffmpeg, processing video files is not supported.

However, one can re-enable video support by adding it as a compiler flag, ex:

make clean
make -j LLAMA_FFMPEG=1

And then video support works as-expected.

There are still a few rough points with this implementation that may need to be cleaned up -- namely:

There is are warnings about unused variables when building without ffmpeg:


examples/llava/minicpmv-cli.cpp:165:71: warning: unused parameter 'video_path' [-Wunused-parameter]
static std::vector<clip_image_u8 *> extract_frames(const std::string& video_path, const int frame_num) {
                                                                      ^
examples/llava/minicpmv-cli.cpp:165:93: warning: unused parameter 'frame_num' [-Wunused-parameter]
static std::vector<clip_image_u8 *> extract_frames(const std::string& video_path, const int frame_num) {

Generation does not stop if video loading is not supported, but it will simply attempt to continue without the video being there. If the prompt references the video, then the model will hallucinate -- this is not an ideal scenario. Example:

llama_kv_cache_init:      Metal KV buffer size =   224.00 MiB
llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.58 MiB
llama_new_context_with_model:      Metal compute buffer size =   303.22 MiB
llama_new_context_with_model:        CPU compute buffer size =    15.01 MiB
llama_new_context_with_model: graph nodes  = 986
llama_new_context_with_model: graph splits = 2
extract_frames: llama.cpp built without ffmpeg, processing video files is not supported.
minicpmv_version: 3
<user>What is in the video?
<assistant>The video contains a sequence of images showing a car driving on a road. The car appears to be moving from left to right across the frame, and there are no other vehicles or people visible in the immediate surroundings. The background consists of a blurred landscape with trees and vegetation, suggesting that the car is traveling through a rural or suburban area.

The first image shows the front-left side of the car, while the subsequent images provide a more frontal view as the car continues to move forward. There are no significant changes in the environment or any additional objects introduced throughout the sequence. The overall impression is of a quiet and peaceful drive on a clear day.%

The way I implemented this adds the ffmpeg dependencies as a flag to EVERY binary built in the system, which may be good (as eventually video multimodality is supported by more and more models), but it may just be clutter for the time being. I have mixed feelings on this point.

Overall, I think this is still an improvement, since llama.cpp should not require a dependency on ffmpeg / pkg-config, and this nicely separates that out.

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…nabled optionally by setting LLAMA_FFMPEG=1 in call to make.

Galunid · 2024-09-17T17:17:44Z

Could you add information about LLAMA_FFMPEG=1 flag directly to error message? I think it'd be helpful.

EDIT: I meant the flag name (not how to recompile)

HanClinto · 2024-09-17T19:16:22Z

Could you add information about LLAMA_FFMPEG=1 flag directly to error message? I think it'd be helpful.

EDIT: I meant the flag name (not how to recompile)

That's a good idea -- I'll add that tonight -- thanks!

EDIT: Done, thanks! :)

@Galunid

…PEG=1 to help users know exactly how to recompile with video support. Suggestion by @Galunid.

Galunid · 2024-09-18T06:09:25Z

Thanks!

Moving ffmpeg dependency to live behind a compiler flag that can be e…

0d08367

…nabled optionally by setting LLAMA_FFMPEG=1 in call to make.

github-actions bot added examples build labels Sep 17, 2024

HanClinto mentioned this pull request Sep 17, 2024

Support video understanding ggml-org/llama.cpp#9165

Closed

Updating warning message to explicitly include reference to LLAMA_FFM…

c713223

…PEG=1 to help users know exactly how to recompile with video support. Suggestion by @Galunid.

tc-mb merged commit 4bd0c61 into OpenBMB:support-video-understanding Oct 9, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ffmpeg compiler flag for video understanding #32

ffmpeg compiler flag for video understanding #32

HanClinto commented Sep 17, 2024 •

edited

Loading

Galunid commented Sep 17, 2024 •

edited

Loading

HanClinto commented Sep 17, 2024 •

edited

Loading

Galunid commented Sep 18, 2024

ffmpeg compiler flag for video understanding #32

ffmpeg compiler flag for video understanding #32

Conversation

HanClinto commented Sep 17, 2024 • edited Loading

Galunid commented Sep 17, 2024 • edited Loading

HanClinto commented Sep 17, 2024 • edited Loading

Galunid commented Sep 18, 2024

HanClinto commented Sep 17, 2024 •

edited

Loading

Galunid commented Sep 17, 2024 •

edited

Loading

HanClinto commented Sep 17, 2024 •

edited

Loading