JSI function for transcribe audio buffer #52

Open · jhen0409 opened this issue Jun 7, 2023 · 3 comments

Labels: enhancement (New feature or request)
@jhen0409 (Member) commented Jun 7, 2023

Provide a JSI function for transcribing an audio buffer, so we can use a library like react-native-audio-pcm-stream or another source, and manage recorded audio samples in JS without writing platform-specific code.

Compared to the native bridge, JSI can convert buffers from JS with high performance.
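For illustration, consumption from JS might look like the sketch below. `LiveAudioStream` is a hypothetical stand-in for a PCM stream module such as react-native-audio-pcm-stream (its API here is assumed, not that library's documented interface), and the `transcribeData` call follows the method discussed later in this thread; its exact signature and return shape are assumptions.

```ts
// Sketch only: `LiveAudioStream` is a hypothetical stand-in for a PCM stream
// module (e.g. react-native-audio-pcm-stream); its API here is assumed.
import { Buffer } from 'buffer'
import { initWhisper } from 'whisper.rn'

declare const LiveAudioStream: {
  on(event: 'data', cb: (base64Chunk: string) => void): void
  start(): void
  stop(): void
}

async function transcribeRecording(modelPath: string) {
  const context = await initWhisper({ filePath: modelPath })
  const pcmChunks: Buffer[] = []

  LiveAudioStream.on('data', (base64Chunk) => {
    // Decode each chunk to raw bytes before concatenating; joining the
    // base64 strings directly would corrupt data if a chunk carries padding.
    pcmChunks.push(Buffer.from(base64Chunk, 'base64'))
  })
  LiveAudioStream.start()

  // ...record for a while, then:
  LiveAudioStream.stop()
  const pcm = Buffer.concat(pcmChunks) // raw 16-bit PCM samples

  // Assumed shape: transcribeData taking base64-encoded PCM, as discussed below.
  return context.transcribeData(pcm.toString('base64'))
}
```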

@simonwh commented Jul 3, 2024

Any progress on this one? :)

@deeeed commented Dec 10, 2024

Hi @jhen0409,

After implementing non-live transcription in my audio playground, I'd like to discuss approaches for live transcription integration between expo-audio-stream and whisper.rn. I see two main paths:

1. JavaScript-Level Integration (Currently Implemented)

   Approach:
   • Use expo-audio-stream's base64 PCM data stream
   • Interface with whisper.rn's transcribeData API
   • Manage buffering and transcription state in JavaScript

   Pros:
   • Simpler to implement initially
   • More flexible for different use cases
   • Platform-agnostic implementation

   Cons:
   • Multiple base64 conversions
   • Higher memory usage
   • JavaScript bridge overhead

2. Native-Level Integration (Proposed)

   Approach:
   • Handle PCM data directly at the native layer
   • Add new native methods in whisper.rn for streaming PCM
   • Implement efficient buffer management between expo-audio-stream and whisper.rn

   Implementation suggestion:
   • Add a startRealtimeTranscribeWithAudioInput method to handle streaming setup
   • Implement receiveAudioDataChunk for direct PCM data processing
   • Connect directly to whisper.cpp's audio processing pipeline
   • Use circular buffers for efficient memory management (see the ring-buffer sketch after this list)
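
To make the circular-buffer point concrete, here is a minimal ring-buffer sketch for 16-bit PCM samples in TypeScript; the class and its names are illustrative only and not part of either library.

```ts
// Minimal ring buffer for 16-bit PCM samples: appends overwrite the oldest
// data once full, so memory stays bounded during a long recording session.
class PcmRingBuffer {
  private buf: Int16Array
  private writePos = 0
  private length = 0

  constructor(capacitySamples: number) {
    this.buf = new Int16Array(capacitySamples)
  }

  /** Append samples, overwriting the oldest data when full. */
  push(samples: Int16Array): void {
    for (const s of samples) {
      this.buf[this.writePos] = s
      this.writePos = (this.writePos + 1) % this.buf.length
      if (this.length < this.buf.length) this.length++
    }
  }

  /** Copy out the buffered samples in chronological order. */
  snapshot(): Int16Array {
    const out = new Int16Array(this.length)
    const start = (this.writePos - this.length + this.buf.length) % this.buf.length
    for (let i = 0; i < this.length; i++) {
      out[i] = this.buf[(start + i) % this.buf.length]
    }
    return out
  }
}

// e.g. keep the last 30 s of 16 kHz mono audio for a sliding-window transcribe:
const ring = new PcmRingBuffer(16000 * 30)
```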

Questions:

  1. Would you be open to adding these new streaming-focused methods to whisper.rn? This would allow direct PCM handling without base64 conversion overhead.
  2. Should we extend the existing realtime API or create a new pathway specifically for external audio sources?

I believe the native integration path would provide better performance for real-time use cases, but I'd appreciate your thoughts on this approach. Happy to contribute PRs once we align on the best path forward.

Looking forward to your feedback!

@jhen0409 (Member, Author) commented

I'd like to extend the current transcribeRealtime implementation to support other audio sources; it might look like this:

```ts
transcribeRealtime({
  /** [NEW option] Choose audio source ('custom': push data yourself from the JS or native side) */
  source: 'built-in' | 'custom',
  // ...
}): Promise<{
  /** Stop the realtime transcribe */
  stop: () => Promise<void>
  /** Subscribe to realtime transcribe events */
  subscribe: (callback: (event: TranscribeRealtimeEvent) => void) => void
  /** [NEW method] Put audio buffer (Buffer or base64 encoded string) for `custom` source */
  pushAudioDataChunk: (data: Buffer | string) => void
}>
```
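
Wiring a custom source against this proposed shape could then look like the sketch below. None of it is implemented yet; `whisperContext` is assumed to be a context from initWhisper, and `LiveAudioStream` is the same hypothetical external PCM source as in the earlier sketch.

```ts
// Sketch against the proposed API above; none of this is implemented yet.
declare const whisperContext: any
declare const LiveAudioStream: {
  on(event: 'data', cb: (base64Chunk: string) => void): void
}

const { stop, subscribe, pushAudioDataChunk } = await whisperContext.transcribeRealtime({
  source: 'custom',
  language: 'en',
})

subscribe((event: any) => {
  // Same TranscribeRealtimeEvent shape as the existing realtime API
  console.log(event.data?.result)
})

LiveAudioStream.on('data', (base64Chunk) => {
  // Feed external PCM into the running realtime job (Buffer or base64, per the proposal)
  pushAudioDataChunk(base64Chunk)
})

// ...later:
await stop()
```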

The pushAudioDataChunk native method will be implemented directly in C++/JSI; it may use the same approach as react-native-blob-jsi-helper. It would be better if we could move the context pool & jobs into JSI, so we don't have to use the Blob module and can avoid JNI costs on Android, but that would probably be a big refactor.

Also, we can expose a static method for pushing audio data to a realtime-transcription job on the native side, so that it can be used by a custom audio-stream native module.

For the transcribeData method, we will also support ArrayBuffer using the same approach as blob-jsi-helper; this is the main purpose of this issue.
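
As a rough picture of that pattern: react-native-blob-jsi-helper exposes getArrayBufferForBlob for a synchronous JSI copy of a Blob's bytes, and the ArrayBuffer overload of transcribeData sketched here is the proposed addition, not current behavior.

```ts
import { getArrayBufferForBlob } from 'react-native-blob-jsi-helper'
import type { WhisperContext } from 'whisper.rn'

// Proposed usage sketch: transcribeData accepting an ArrayBuffer directly,
// skipping the base64 round-trip. `blob` could come from fetch() or a recorder.
async function transcribeBlob(context: WhisperContext, blob: Blob) {
  const bytes = getArrayBufferForBlob(blob) // Uint8Array via JSI, no bridge copy
  return context.transcribeData(bytes.buffer) // ArrayBuffer overload (proposed)
}
```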
