
Implement whisper.cpp #51

Open
ericcurtin opened this issue Aug 21, 2024 · 24 comments

@ericcurtin
Collaborator

ericcurtin commented Aug 21, 2024

If there is a way to auto-detect between language model files and ASR model files, we should do that. If that's not possible, we should just use a runtime flag; some options for the runtime flag would be:

ramalama --runtime vllm
ramalama --runtime llama.cpp
ramalama --runtime whisper.cpp
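
A minimal sketch of what that auto-detection could look like, assuming the only distinction needed is GGUF language models versus legacy ggml whisper checkpoints, told apart by their magic bytes (the helper name and the mapping below are illustrative, not existing RamaLama code):

    import struct

    # Hypothetical helper, not RamaLama code: guess a runtime from the model
    # file's magic bytes. GGUF files begin with the ASCII magic b"GGUF", while
    # legacy ggml checkpoints (e.g. whisper.cpp's ggml-*.bin) begin with the
    # little-endian uint32 0x67676d6c ("ggml").
    def guess_runtime(model_path):
        with open(model_path, "rb") as f:
            magic = f.read(4)
        if magic == b"GGUF":
            return "llama.cpp"
        if len(magic) == 4 and struct.unpack("<I", magic)[0] == 0x67676D6C:
            return "whisper.cpp"
        raise ValueError(f"unrecognized model format: {model_path}")

If a check like that can't disambiguate (for example, whisper models repackaged as GGUF), the explicit --runtime flag above remains the fallback.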
@ericcurtin
Collaborator Author

We added a basic version of whisper.cpp to the Container image here:

#49

@rhatdan
Member

rhatdan commented Oct 14, 2024

Can this be closed? Do we have this functionality now?

@ericcurtin
Collaborator Author

There's still more work for both --runtime whisper.cpp and --runtime vllm ... It can be tracked under this issue or somewhere else.

@FNGarvin
Contributor

What do you see an example command-line looking like here? I've toyed a bit with whisper.cpp and find its use to be very different than launching a gradio server or a chat instance. AFAICT, you're usually calling the main executable with an argument for the model and one or more additional arguments describing the file to transcribe and options.

How would you feel about, as an alternative, dropping users into a [containerized] shell with access to input/output/model volumes and possibly some helper scripts to accomplish simple tasks? So, perhaps, ramalama run --runtime whisper.cpp ggml-large-v3-turbo.bin gets you a bash prompt with a motd saying type 'transcribe /audio/jfk.wav' to blah blah blah etc? Crude compared to running or serving the chat-centric models, but still adds value to Ramalama and also to Whisper (which has had broken CUDA support for nine+ months and isn't as slick wrt pulling images and mounting volumes).

Also, should the container files be pulling the latest whisper.cpp instead of the latest known to be good for Ramalama?

@ericcurtin
Collaborator Author

ericcurtin commented Dec 17, 2024

I would say just this to start:

ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin jfk.wav

which would just perform a fairly standard whisper.cpp command.

No interactive support, it's not the same as a chat bot workflow.

We have Renovate controlling what version of whisper.cpp we build against, and it runs everything through CI before we rebase. I'd like to keep this; at least there's a CI run before we suddenly change the version. I'd rather not just clone main/master without CI.

@ericcurtin
Collaborator Author

@p5 kindly set up renovate for us.

@FNGarvin
Contributor

> ramalama --runtime run whisper.cpp ggml-large-v3-turbo.bin

> which would just perform a fairly standard whisper.cpp command.

Could you elaborate, please? What command would be performed? Would that be provided through additional command-line arguments to the command you've just given? Would you pass that as though it were a prompt to another model?

The boilerplate syntax for using whisper.cpp directly is something like

./main -m /models/ggml-large-v3-turbo.bin -f /audios/jfk.wav

Could you please give me an example of the kind of command-line you're envisioning to complete the same task via Ramalama?

> No interactive support, it's not the same as a chat bot workflow.

Sorry to be dense, but I can't tell if you're reiterating that whisper.cpp does not provide interactive support or if you're saying that you do not like the concept of dropping the user to a shell prompt with some workspace-like features.

@ericcurtin
Collaborator Author

I meant something like this, corrected:

ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin jfk.wav

@ericcurtin
Collaborator Author

> Sorry to be dense, but I can't tell if you're reiterating that whisper.cpp does not provide interactive support or if you're saying that you do not like the concept of dropping the user to a shell prompt with some workspace-like features.

We are always open to ideas. It's just less relevant with whisper.cpp, one doesn't speak to an interactive prompt. But if someone finds that useful for some reason, always happy to look at PRs, etc.

@ericcurtin
Collaborator Author

This PR is a perfect example of why we don't just clone main/master:

#474

@FNGarvin
Contributor

Yes, failing each time whisper.cpp is updated is perhaps a better solution than blindly updating along with it. Anyway, back to the meat of my question...

> It's just less relevant with whisper.cpp, one doesn't speak to an interactive prompt. But if someone finds that useful for some reason, always happy to look at PRs, etc.

I don't care at all about that. I just saw the seemingly abandoned whisper.cpp stub in the container and thought to make it usable. It is not currently usable, right? Do you care to see it made usable?

@ericcurtin
Collaborator Author

> Yes, failing each time whisper.cpp is updated is perhaps a better solution than blindly updating along with it. Anyway, back to the meat of my question...

> It's just less relevant with whisper.cpp, one doesn't speak to an interactive prompt. But if someone finds that useful for some reason, always happy to look at PRs, etc.

> I don't care at all about that. I just saw the seemingly abandoned whisper.cpp stub in the container and thought to make it usable. It is not currently usable, right? Do you care to see it made usable?

Yup we sure do, this issue is open for someone to complete it :)

@FNGarvin
Contributor

> this issue is open for someone to complete it :)

Great. I'm just having trouble understanding what "complete it" means to you. That's why I'm asking about the example command-lines you're envisioning.

> The boilerplate syntax for using whisper.cpp directly is something like ./main -m /models/ggml-large-v3-turbo.bin -f /audios/jfk.wav Could you please give me an example of the kind of command-line you're envisioning to complete the same task via Ramalama?

@ericcurtin
Collaborator Author

> this issue is open for someone to complete it :)

> Great. I'm just having trouble understanding what "complete it" means to you. That's why I'm asking about the example command-lines you're envisioning.

We are open to ideas. Rome wasn't built in a day, one PR at a time.

> The boilerplate syntax for using whisper.cpp directly is something like ./main -m /models/ggml-large-v3-turbo.bin -f /audios/jfk.wav Could you please give me an example of the kind of command-line you're envisioning to complete the same task via Ramalama?

Getting this to execute the main command you specified would be a start:

ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin jfk.wav

@FNGarvin
Contributor

FNGarvin commented Dec 17, 2024

Thanks. And in the case that Ramalama is running the inference in a container, how do we get the input media into the model? Would you infer a directory to bind based on the working directory/file given as an argument?

@rhatdan
Member

rhatdan commented Dec 17, 2024

Yes, the wav file would need to be volume-mounted into the container with a :z option.

Would it make sense to also allow stdin for a wav file?

cat jfk.wav | ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin -

I think it makes sense to allow grabbing output from stdout.

cat jfk.wav | ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin - > /tmp/output

Not sure what ramalama serve would do?

@FNGarvin
Contributor

Thank you for helping to bring me up to speed with some mock usage examples. I'm still pretty much a novice at containers, but could we bind the directory read-only instead of using :z? A dry run (love that feature, btw) of something like this:

    podman run --rm -i --device nvidia.com/gpu=all \
    --mount=type=image,src=WHISPER.CPP-MODEL-SHORTNAME,destination=/mnt/models,rw=false,subpath=/models \
    --mount=type=bind,src=/[FQual-dir-of-user-inputfilewav]/,destination=/mnt/audio,rw=false \
    ramalama-image /bin/sh -c "whisper-main -m /mnt/models/model.file -f /mnt/audio/USERFILE.wav"

> Would it make sense to also allow stdin for a wav file?

It's more appealing in principle than binding directories, but whether or not it is practical is currently unknown to me. It has been a verrrrry long time since I've relied on a shell to create pipes for large data and I don't quite remember what the gotchas were, but I'm thinking there were at least a few. I know even less about what might happen trying to pipe potentially massive, uncompressed audio files into the container on STDIN.

> I think it makes sense to allow grabbing output from stdout.

Redirects aren't adequate?

> Not sure what ramalama serve would do?

If the documentation on the whisper.cpp github is accurate, it seems like the provided web service is very basic. All the examples are making requests via curl from the command-line. I totally get it and that isn't criticism, but we're not talking about a convenient UI AFAICT. I am aware that there exist some third-party front-ends, like https://github.com/litongjava/whisper-cpp-server + https://github.com/litongjava/listen-know-web, but I don't really know anything about them.

For me, personally, the most likely use-case for whisper.cpp is in generating subtitles for arbitrary videos (upon audio extracted w/ ffmpeg) or for transcribing and translating arbitrary audio sequences. Command-line invocation over a bound directory seems adequate and possibly ideal. It probably doesn't require any code overlap with whisper.cpp, using Ramalama as scaffolding to bring all the pieces together and offering a layer of abstraction wrt GPU config, system libraries, etc. But I'm not deep into any of these techs or projects, so if there's a better vision / direction I'd love to hear about it.

@rhatdan
Member

rhatdan commented Dec 17, 2024

The stdout stuff should just work; we could even have whisper grab /dev/stdin when it sees the "-".

The volume mount can also be marked ro,z so it is read-only and relabeled so that SELinux allows the container to read it.

@rhatdan
Member

rhatdan commented Dec 17, 2024

podman run --rm -i --device nvidia.com/gpu=all \
--mount=type=image,src=WHISPER.CPP-MODEL-SHORTNAME,destination=/mnt/models,rw=false,subpath=/models \
--mount=type=bind,src=/[FQual-dir-of-user-inputfilewav]/,destination=/mnt/audio/USERFILE.wav,rw=false,z \
ramalama-image /bin/sh -c "whisper-main -m /mnt/models/model.file -f /mnt/audio/USERFILE.wav"

@rhatdan
Member

rhatdan commented Dec 17, 2024

Slightly simpler.

podman run --rm -i --device nvidia.com/gpu=all \
--mount=type=image,src=WHISPER.CPP-MODEL-SHORTNAME,destination=/mnt/models,rw=false,subpath=/models \
-v/[FQual-dir-of-user-inputfilewav]/:/mnt/audio/USERFILE.wav:ro,z \
ramalama-image /bin/sh -c "whisper-main -m /mnt/models/model.file -f /mnt/audio/USERFILE.wav"

@rhatdan
Member

rhatdan commented Dec 17, 2024

Looks like whisper -f - is supported now, although the entire file needs to be flushed through the pipe I believe.

@FNGarvin
Contributor

> Looks like whisper -f - is supported

It does look that way, though I haven't prepared any large wav files to test with. Thank you - your way is much better, I think, than binding a directory for a process that will only require one input. Especially if it creates the possibility of chaining ffmpeg without intermediate conversion files.

So, consensus seems to be that

cat jfk.wav | ramalama --runtime whisper.cpp run ggml-tiny.bin

should produce and run something vaguely like

podman run --rm -i --device nvidia.com/gpu=all --mount=type=bind,src=models/,destination=/mnt/models,rw=false ramalama /bin/sh -c "whisper-main -m /mnt/models/ggml-tiny.bin -f -"

That doesn't seem too difficult at first blush. I'll see what I can do in the coming days.

Thanks
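
As a starting point, a rough sketch of that translation might look like the following. All names here are hypothetical, the GPU/device flags are omitted, and this is not RamaLama code; it only illustrates building the podman argv from the model path, an optional audio file, and stdin:

    import os
    import subprocess
    import sys

    # Hypothetical sketch of the consensus above, not RamaLama code: turn
    # "ramalama --runtime whisper.cpp run MODEL [AUDIO]" into a podman command.
    def run_whisper(model_path, audio=None, image="ramalama-image"):
        model_dir = os.path.dirname(os.path.abspath(model_path))
        model_name = os.path.basename(model_path)
        cmd = ["podman", "run", "--rm", "-i",
               f"-v{model_dir}:/mnt/models:ro,z"]
        whisper_args = ["whisper-main", "-m", f"/mnt/models/{model_name}"]
        if audio is None or audio == "-":
            # Audio arrives on stdin; whisper.cpp reads it via "-f -".
            whisper_args += ["-f", "-"]
            stdin = sys.stdin.buffer
        else:
            # Bind-mount the directory containing the input file, read-only.
            audio_dir = os.path.dirname(os.path.abspath(audio))
            audio_name = os.path.basename(audio)
            cmd.append(f"-v{audio_dir}:/mnt/audio:ro,z")
            whisper_args += ["-f", f"/mnt/audio/{audio_name}"]
            stdin = None
        cmd += [image] + whisper_args
        return subprocess.run(cmd, stdin=stdin, check=True)

With something like that, cat jfk.wav | ramalama --runtime whisper.cpp run ggml-tiny.bin maps to the stdin branch, and ramalama --runtime whisper.cpp run ggml-tiny.bin jfk.wav maps to the bind-mount branch.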

@ericcurtin
Collaborator Author

ericcurtin commented Dec 18, 2024

stdin seems fine to me. We can also autodetect when stdin is coming in, which is better for usability since you don't need the explicit '-' then, although we can still keep the ability to explicitly request stdin with '-'. grep is an example of a command that does this, and so is llama-run. Once llama-run is integrated into RamaLama, I plan on adding this to ramalama run:

git diff | ramalama run granite-code "Write a git commit message for this change"

llm-gguf and ollama have this feature.
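
A minimal sketch of that autodetection (the helper is illustrative only, not RamaLama code): treat a non-tty stdin as piped input while still honouring an explicit '-':

    import sys

    # Hypothetical helper, not RamaLama code: detect piped stdin the way grep
    # or llama-run do, so "git diff | ramalama run model 'prompt'" just works,
    # while an explicit '-' argument still forces reading from stdin.
    def read_piped_input(args):
        if "-" in args or not sys.stdin.isatty():
            return sys.stdin.read()
        return None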

@rhatdan
Member

rhatdan commented Dec 18, 2024

Cool
