llava-cli: format batch --image descriptions according to --template #8637

Open
wants to merge 7 commits into
base: master

@themanyone themanyone commented Jul 22, 2024

Problem. Running llava-cli --image 1.jpg --image 2.jpg ... in batch mode generates several image descriptions in succession. Keeping the model in memory makes generation faster, but the output is not very useful as it stands: the image file name is not mentioned alongside each description.

Rationale. It would have been expedient to hard-code a specific output format, such as JSON. But in keeping with llama.cpp's versatile command-line options, it seemed reasonable to let users specify their own data format.

Improvisation. This pull request introduces an optional --template argument to format the output of bulk image descriptions. If --template is not supplied, output is exactly as it was before, so existing usage is unaffected.

Help screen. The following line is added to the -h help message.

--template STRING output template replaces [image] and [description] with generated output
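
The placeholder semantics described by the help line can be illustrated outside llava-cli. The following is a minimal bash sketch, not the C++ code in this pull request; the function name fill_template is hypothetical and exists only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the placeholder semantics only;
# the pull request performs the substitution inside llava-cli itself.
fill_template() {
    local template=$1 image=$2 description=$3
    local out
    # Replace every literal [image], then every literal [description].
    out=${template//\[image\]/"$image"}
    out=${out//\[description\]/"$description"}
    printf '%s\n' "$out"
}

fill_template '<img src="[image]" alt="[description]">' 'cat.png' 'A cat.'
# prints: <img src="cat.png" alt="A cat.">
```

Because the template is an ordinary string, the same call works for any text format the caller chooses.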

Prerequisites. For this example, we create a shell script, describe.sh, to launch any particular llava model and options (yours will be different).

llama-llava-cli -ngl 16 \
-m ~/.local/share/models/Obsidian/obsidian-q6.gguf \
--mmproj  ~/.local/share/models/Obsidian/mmproj-obsidian-f16.gguf \
-c 4096 "$@"

Next, we cd to a directory containing a few images and demonstrate the new --template option.

shopt -s nullglob
cd Pictures
printf -- "--image %q " *.png *.webm *.jpg *.jpeg | xargs describe.sh -p "Write a one paragraph caption for the image."  --template '<figure><img src="[image]" alt="[image]"><figcaption>[description]</figcaption></figure>' --log-disable | tee data

The printf %q format outputs file names with spaces and special characters properly escaped. We could have used find instead. The nullglob option to shopt is necessary because, without it, bash leaves a glob pattern that matches no files unexpanded and passes the literal pattern (for example, *.jpeg) along as if it were an image name. Setting nullglob makes unmatched patterns expand to nothing.
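
As an alternative to escaping names with printf %q, the --image arguments can be collected in a bash array, which keeps each file name as a single argument without any escaping step. This is only a sketch under the same assumptions as the example above (a describe.sh wrapper on the PATH); the function name build_image_args and the array name image_args are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the --image arguments in a bash array.
shopt -s nullglob   # unmatched patterns expand to nothing

build_image_args() {
    # Fills the global array image_args with "--image FILE" pairs
    # for every matching file in the current directory.
    image_args=()
    local f
    for f in *.png *.webm *.jpg *.jpeg; do
        image_args+=(--image "$f")
    done
}

build_image_args
printf '%s\n' "${#image_args[@]} arguments collected"
# The array could then be passed on unchanged, for example:
#   describe.sh "${image_args[@]}" -p "..." --template '...'
```

With nullglob set, a directory without matching files simply yields an empty array.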

Photos are processed one by one, formatting the output according to the template we provided. We now have a data file that looks like this.

<figure><img src="test pattern.png" alt="test pattern.png"><figcaption> The colorful television screen displays the image of a fish tank with blue, red, yellow, green, and blue elements.

</figcaption></figure><figure><img src="trading patterns.png" alt="trading patterns.png"><figcaption> A computer monitor displaying a variety of graphs and diagrams.

</figcaption></figure><figure><img src="Youtube-button.png" alt="Youtube-button.png"><figcaption> The YouTube logo is red and white.

</figcaption></figure><figure><img src="20230218_215924.jpg" alt="20230218_215924.jpg"><figcaption> A small digital scale shows the number 378.

</figcaption></figure><figure><img src="dad.jpg" alt="dad.jpg"><figcaption> A person plays the grand piano in an exhibition hall.

</figcaption></figure><figure><img src="ferry.jpg" alt="ferry.jpg"><figcaption> A boat is docked at a port near a forest.

</figcaption></figure><figure><img src="github_error.jpg" alt="github_error.jpg"><figcaption> The image shows a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot</figcaption></figure>
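
Since each fragment now carries its file name, the data file can be checked with ordinary text tools. As a small sketch, the following lists the image names recorded in such a file; the function name list_images is hypothetical, and the pattern assumes file names contain no double quotes.

```shell
#!/usr/bin/env bash
# Sketch: list the image names recorded in a file of <figure> fragments.
list_images() {
    # Pull out each src="..." attribute, then strip the wrapper text.
    grep -o 'src="[^"]*"' "$1" | sed -e 's/^src="//' -e 's/"$//'
}

# Example usage, assuming the data file from above exists:
#   list_images data
```

Comparing this list against the directory contents is one way to spot images that were skipped.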

We could have cleaned it up by hand to make a proper HTML page, but tools like HTML Tidy already exist for that.

tidy -i -o album.html data

<!DOCTYPE html>
<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Linux version 5.8.0">
  <title></title>
</head>
<body>
  <figure>
    <img src="test%20pattern.png" alt="test pattern.png">
    <figcaption>
      The colorful television screen displays the image of a fish
      tank with blue, red, yellow, green, and blue elements.
    </figcaption>
  </figure>
  <figure>
    <img src="trading%20patterns.png" alt="trading patterns.png">
    <figcaption>
      A computer monitor displaying a variety of graphs and
      diagrams.
    </figcaption>
  </figure>
...

As you can see, the new --template feature makes AI-assisted web page creation much easier.

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Aug 1, 2024
Labels
examples Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix