From cbbb91e6a39fbed9d9cf7a102366547ae7de1da8 Mon Sep 17 00:00:00 2001
From: Jhen
Date: Tue, 3 Oct 2023 09:13:27 +0800
Subject: [PATCH] docs(tips): add vad section

---
 docs/TIPS.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/TIPS.md b/docs/TIPS.md
index bdd4672..fabec9a 100644
--- a/docs/TIPS.md
+++ b/docs/TIPS.md
@@ -28,6 +28,22 @@ The default `realtimeAudioSec` value of TranscribeOptions is `30` (seconds). If
 
 However, setting slice might result in truncated words, which is not ideal. In the future, we plan to use audio processing tricks like pitch detection to dynamically adjust the timing of slices. Further details are provided in the next section.
 
+## transcribeRealtime: Use Voice Activity Detection (VAD)
+
+While recording, you can enable VAD (option: `useVad`) to detect voice activity and decide when to start transcribing. This can help in some situations, such as avoiding high CPU usage or avoiding unnecessary transcribe events being triggered too often.
+
+The VAD implementation currently just uses `vad_simple` from whisper.cpp. To quickly test how it performs, you can try the `stream` example from whisper.cpp (building it requires SDL2, and `--step 0` enables its VAD-based sliding window mode):
+
+```bash
+git clone https://github.com/ggerganov/whisper.cpp
+cd whisper.cpp
+./models/download-ggml-model.sh base
+make stream
+./stream -m ./models/ggml-base.bin --step 0 --length 30000 -vth 0.6
+```
+
+It is currently disabled by default (`useVad: false`). We plan to try it out for a while before deciding whether it should be enabled by default.
+
 ## transcribeRealtime: Stop recording by audio processing (Work in Progress)
 
 For instance, you might want to stop recording when a specific audio pitch is detected.
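For reference, the energy-ratio idea behind whisper.cpp's `vad_simple` can be sketched roughly as follows. This is a simplified, illustrative re-implementation, not the upstream code: the name `vad_simple_sketch` is hypothetical, and the optional high-pass filter that the upstream function applies when its `freq_thold` argument is positive is omitted. The sketch compares the mean absolute amplitude of the last `last_ms` milliseconds with that of the whole buffer, and reports a pause when the trailing window drops below `vad_thold` times the overall average.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified sketch of an energy-ratio VAD in the spirit of
// whisper.cpp's `vad_simple` (the upstream high-pass filter step is omitted).
//
// Returns true when the trailing `last_ms` window is quiet relative to the
// average of the whole buffer (for example, the speaker has paused), which is
// the moment a realtime transcriber might hand the buffered audio to Whisper.
bool vad_simple_sketch(const std::vector<float> &pcm, int sample_rate,
                       int last_ms, float vad_thold) {
    assert(sample_rate > 0 && last_ms > 0);

    const std::size_t n_samples = pcm.size();
    const std::size_t n_last =
        static_cast<std::size_t>(sample_rate) * static_cast<std::size_t>(last_ms) / 1000;

    // Not enough audio to compare the trailing window against: assume no pause yet.
    if (n_last == 0 || n_last >= n_samples) {
        return false;
    }

    // "Energy" here is the mean absolute amplitude, over the whole buffer
    // and over the trailing window.
    float energy_all = 0.0f;
    float energy_last = 0.0f;
    for (std::size_t i = 0; i < n_samples; ++i) {
        const float a = std::fabs(pcm[i]);
        energy_all += a;
        if (i >= n_samples - n_last) {
            energy_last += a;
        }
    }
    energy_all /= static_cast<float>(n_samples);
    energy_last /= static_cast<float>(n_last);

    // A loud trailing window (relative to the average) means speech is ongoing.
    return energy_last <= vad_thold * energy_all;
}
```

With 16 kHz audio, calling `vad_simple_sketch(buffer, 16000, 250, 0.6f)` on one second of speech followed by a quiet final quarter second returns true, while a buffer that stays loud to the end returns false. Comparing against the buffer's own average, rather than a fixed amplitude, keeps the check roughly independent of microphone gain.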