
Add OneDNN or DirectML support #2303

Open
thewh1teagle opened this issue Jul 15, 2024 · 9 comments

Comments

@thewh1teagle
Contributor

thewh1teagle commented Jul 15, 2024

Currently the best results we can get with whisper.cpp are with CUDA (Nvidia) or CoreML (macOS).

On Windows there's only OpenBLAS, and it's slow: transcription takes roughly 2x the duration of the audio (AMD Ryzen 5 4500U, medium model).
When using ctranslate2 on the same machine, it runs 2-3x faster than the audio duration on CPU only!

Since whisper.cpp recently removed support for OpenCL, I think it's important to have a good alternative for Windows users with Intel / AMD CPUs / APUs.

There are a few different options that could be added:
oneDNN-ExecutionProvider.html
DirectML-ExecutionProvider.html

In addition, ctranslate2 uses ruy.

Related: ggerganov/ggml#406 (comment)

@thewh1teagle thewh1teagle changed the title Add OneDNN support Add OneDNN or DirectML support Jul 15, 2024
@WilliamTambellini
Contributor

+1 for oneDNN

@WilliamTambellini
Contributor

ggerganov/ggml#855

@WilliamTambellini
Contributor

cf @rfsaliev

@thewh1teagle
Contributor Author

Update:
For now I'm sticking with release v1.6.2, which still has OpenCL support.
Otherwise, as I said, the speed is far too slow to be usable (2-5x the audio duration).

With OpenCL it now takes 40s to transcribe 47s of audio on the same modest APU hardware (AMD Ryzen 5 4500U).

By the way, there were weird issues with OpenCL that prevented it from working; the solution I found was to set CMAKE_BUILD_TYPE to RelWithDebInfo.
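For reference, a minimal build sketch along those lines, assuming the v1.6.x-era CMake options where OpenCL support was enabled via CLBlast (flag names have changed between whisper.cpp releases, so treat this as an approximation):

```shell
# Sketch: build whisper.cpp v1.6.2 with OpenCL (CLBlast) enabled.
# RelWithDebInfo works around the OpenCL issue mentioned above.
git checkout v1.6.2
cmake -B build -DWHISPER_CLBLAST=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build -j
```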

@Osiris-Team

Any updates on this?
I am trying to run this with GPU acceleration on Windows with AMD GPU.

@thewh1teagle
Contributor Author

@Osiris-Team

Use Vulkan; it works fast with AMD GPUs on all platforms. You can try it with the app Vibe.
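If you'd rather build whisper.cpp yourself, here's a sketch of enabling the Vulkan backend, assuming a recent release where it's gated behind the GGML_VULKAN CMake flag (older releases used different flag names, and the Vulkan SDK must be installed):

```shell
# Sketch: build whisper.cpp with the Vulkan backend for AMD/Intel/Nvidia GPUs.
cmake -B build -DGGML_VULKAN=1
cmake --build build -j
# Then transcribe using the GPU, e.g.:
# ./build/bin/main -m models/ggml-medium.bin -f samples/jfk.wav
```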

@Osiris-Team

Osiris-Team commented Oct 16, 2024

@thewh1teagle thx, that looks great, giving it a try!
Can I select a custom model, since I already have one downloaded?
I assume it's running whisper.cpp under the hood and thus maybe uses the same format (ggml)? - It does.

@Osiris-Team

@thewh1teagle There is a good minute at the start where it says "Transcribing - 0%"; I thought it wasn't working (no CPU/GPU/IO activity). Maybe adding some more detailed logging of whatever happens at the beginning would help here.

@thewh1teagle
Contributor Author

@Osiris-Team

You can take a look at the docs in the repository, in debug.md.
You can also open a new issue.
