No Transcription Recording only with VAD #49

Open
shahzaib78631 opened this issue Nov 7, 2024 · 4 comments

@shahzaib78631

Hello everyone 🤗.
I have been working on an app where I need to implement VAD (Voice Activity Detection). When the user starts speaking 🗣️, I want to record the audio and send it to the server. I found this package, which can transcribe when the user starts speaking, but what I want is only the audio, not the transcript. I have also read about the recording property in the package, but it only supports Android 13+. Is there any way to get just the audio with VAD on lower Android versions too?

@jamsch
Owner

jamsch commented Nov 7, 2024

Hey @shahzaib78631, unfortunately the SpeechRecognition APIs on Android 12 and lower don't allow us to use a custom microphone source for the recognition. I may be able to use a workaround to at least get the underlying recognized buffers. However, even if I do implement this, Android 12 and lower don't support continuous speech recognition, which is probably a requirement here if you're building a communications app.

This library is probably not the best fit for VAD either if you're not interested in the voice transcripts (due to the tight integration with the underlying APIs). I'd consider looking into the following libraries instead:

Each of these libraries gives you a way to access real-time frame chunks, which you can use to check whether there's voice activity. Implementing a gain filter could be appropriate for your use case. Otherwise, you may want to use a model to process those frames.
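
As a rough illustration of the gain-based check mentioned above, here is a minimal sketch. It assumes the frames arrive as 16-bit signed mono PCM in an `Int16Array`. The names `frameLevelDbfs` and `hasVoiceActivity` and the -40 dBFS default threshold are hypothetical and do not come from any of the libraries discussed here; the threshold would likely need tuning per device and environment.

```typescript
// Hypothetical helpers for illustration only; not part of any library in this thread.
// Assumption: each frame holds 16-bit signed mono PCM samples.

// Returns the RMS level of a frame in dBFS (0 dBFS is roughly full scale).
function frameLevelDbfs(frame: Int16Array): number {
  if (frame.length === 0) return -Infinity;
  const sumSquares = frame.reduce((acc, sample) => {
    const s = sample / 32768; // normalise to [-1, 1)
    return acc + s * s;
  }, 0);
  const rms = Math.sqrt(sumSquares / frame.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Treats a frame as "voiced" when its level exceeds the threshold.
// The default threshold is an assumption, not a recommended value.
function hasVoiceActivity(frame: Int16Array, thresholdDbfs = -40): boolean {
  return frameLevelDbfs(frame) > thresholdDbfs;
}
```

A level check like this reacts to any loud sound, not only speech, which is why a model-based detector may be preferable in noisy environments.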

@shahzaib78631
Author

Hey @jamsch,

Thank you for the detailed explanation and quick response. For now, if it's possible and not too time-consuming, could you provide the workaround to get the underlying recognized buffers?

I appreciate your recommendations for alternative libraries, and I'll definitely look into them.

Thank you.

@jamsch
Owner

jamsch commented Nov 7, 2024

Hey @shahzaib78631, I'm not sure it's even going to be worth implementing the audio capture workaround for Android 12 and lower, because those versions don't support continuous recognition. You'd have to manually start speech recognition again each time it stops (which will need to happen at least 10 times per minute).

I think for cases like VAD you shouldn't need a process as resource-intensive as speech recognition. Instead, you'd want something that processes the audio frames (using one of the libraries I mentioned above) and then either applies a gain filter to them (which is generally straightforward), sends them to an API, or runs an on-device model like Cobra: https://github.com/Picovoice/cobra
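
Whichever detector produces the per-frame decision, the frames still need to be grouped into clips before they can be sent to a server. The sketch below shows one way this might be done. `createSpeechSegmenter`, its callback, and the `hangoverFrames` default are hypothetical names and values chosen for illustration, and the `voiced` flag is assumed to come from a detector of your choice (a gain threshold, an API, or an on-device model).

```typescript
// Hypothetical sketch for illustration only; not an API of any library in this thread.
// Groups consecutive frames into speech segments. A segment ends once
// `hangoverFrames` consecutive unvoiced frames follow detected speech.
function createSpeechSegmenter(
  onSegment: (frames: Int16Array[]) => void,
  hangoverFrames = 15, // assumed default; tune to your frame duration
) {
  let current: Int16Array[] = [];
  let silentFrames = 0;
  let speaking = false;

  return {
    // `voiced` is the per-frame decision from whatever detector you use.
    push(frame: Int16Array, voiced: boolean): void {
      if (voiced) {
        speaking = true;
        silentFrames = 0;
        current.push(frame);
        return;
      }
      if (!speaking) return; // ignore silence before any speech
      current.push(frame); // keep short pauses inside the segment
      silentFrames += 1;
      if (silentFrames >= hangoverFrames) {
        onSegment(current); // e.g. encode and upload here
        current = [];
        silentFrames = 0;
        speaking = false;
      }
    },
  };
}
```

The hangover keeps brief pauses between words from splitting one utterance into many small uploads. The trailing silent frames are included in the emitted segment, so trim them before sending if that matters for your server.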

@shahzaib78631
Author

Thank You bro
