Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use
xav
instead offfmpeg
#403base: main
Are you sure you want to change the base?
use
xav
instead offfmpeg
#403Changes from 1 commit
1b6e128
7b1c02f
fd81786
fc45ff5
d882343
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function just needs to return a stream of chunks, so we don't need to do this concatenation.
Do you know what determines the length of each chunk, could that be configurable perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not completely sure but i'm guessing that a chunk/frame in a video context is the audio duration of one frame. When i remove the
Stream.chunk_every/2
pipeline i get a massive performance degradation and the processing does not finish in a reasonable time. I imagine that it's far more efficient to read a chunk of frames first, before sending it to the Whisper serving.I just reused the implementation of the Elixir WebRTC team from https://github.com/elixir-webrtc/xav/blob/master/README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The serving transforms the stream to accumulate smaller chunks, but there is a place where we need to append to a list and that may be the reason why it's inefficient with tiny chunks.
However, either way, I think it's wasteful to convert every frame to a tensor just to concatenate later. With the current ffmpeg code we get a single binary for the whole 30s and create a tensor from that. So ideally we want to replicate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this suffice?
The 1000 chunks is currently arbitrary because we don't know the frame_size of the used codec. This could be added to
xav
i thinkThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: The information is already there.
Xav.Frame
has asamples
field from which we could calculate the 30s chunks which would be calculated byThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling the frame binaries directly is a good call. I think we can transform the stream, so that we can accumulate the binary, instead of waiting for 1000 binaries and joining then. I was thinking we can determine the number of samples from binary size, but
frame.samples
is even more convenient.So it would be something like this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! That is a great solution! Thanks!