Core APIs do not demux, and the `stream_index` parameter has (almost) no effect
#476
While digging into this, I actually wrote PR #481 to get my head around the logic of the core decoding loop. The core decoding loop in
What took me some time to realize is that we expect steps 1 and 2 to fail the first time through the loop. Most examples start with reading packets, decoding them, and then getting out a frame. Certainly that's the order of what must happen, but we invert that logic. Some thoughts:
I noticed that too. When I asked @ahmadsharif1 why it was done in this order, he said that it's because the frame we want may be in a packet that we already sent to the decoder. In that case, we don't want to send a new packet, because that would be wasteful. I was surprised too at first because this isn't how the examples I've seen are written, but it makes sense to me.
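To make the ordering concrete, here is a minimal sketch of a receive-first loop. It does not use FFmpeg or the torchcodec code: `ToyPacket`, `ToyDecoder` and `nextFrame` are hypothetical stand-ins, and the assumption that one packet can yield several frames is only there to show why asking the decoder first avoids sending an unnecessary packet. Flushing the decoder at end of file is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical stand-in for a packet: the frames (by pts) it decodes into.
struct ToyPacket {
  std::vector<int> framePts;
};

// Hypothetical stand-in for a decoder with an internal queue of ready frames.
struct ToyDecoder {
  std::deque<int> ready;
  int packetsSent = 0;

  void sendPacket(const ToyPacket& packet) {
    ++packetsSent;
    ready.insert(ready.end(), packet.framePts.begin(), packet.framePts.end());
  }

  // Returns nothing when the decoder needs more input (the "try again" case).
  std::optional<int> receiveFrame() {
    if (ready.empty()) {
      return std::nullopt;
    }
    int pts = ready.front();
    ready.pop_front();
    return pts;
  }
};

// Receive-first loop: ask the decoder for a frame before reading any packet.
// On the very first call the receive attempt fails, and that is expected.
std::optional<int> nextFrame(
    ToyDecoder& decoder,
    const std::vector<ToyPacket>& packets,
    std::size_t& cursor) {
  assert(cursor <= packets.size());
  while (true) {
    if (auto frame = decoder.receiveFrame()) {
      return frame;  // a previously sent packet already contained this frame
    }
    if (cursor == packets.size()) {
      return std::nullopt;  // no more input
    }
    decoder.sendPacket(packets[cursor++]);  // only now do we feed the decoder
  }
}
```

With a first packet that holds two frames, the second call to `nextFrame` returns without sending anything, which is the case the receive-first order is meant to cover.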
We could, but that wouldn't be efficient. We call the filter function on an AVFrame, but the stream index is known as soon as we get the packet. If we were to demux within the filter function, we would have to decode all the frames, including those that aren't from the stream we want. I think we'll want to demux at the packet level instead, so that we can avoid decoding the frames that aren't from the targeted stream.
I think we can, even in a BC way: most of our APIs are [supposed to be] stream-specific, and those that aren't are still wrong anyway, in the sense that they won't be returning frames from any active stream. So it would be a bugfix in all cases. In terms of implementation, I'm hoping this should be as simple as filtering the AVPacket by stream index in the main decoding loop.
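A rough sketch of what packet-level filtering could look like, again with toy types rather than FFmpeg structs. `StreamPacket`, `DemuxStats` and `nextFramePts` are hypothetical names made up for illustration, and "decoding" is reduced to a counter so the saved work is visible. The real loop would compare the packet's stream index against the active stream before sending the packet to the decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a demuxed packet: its stream and a timestamp.
struct StreamPacket {
  int streamIndex;
  int pts;
};

// Counts how many packets actually reached the (pretend) decoder.
struct DemuxStats {
  int packetsDecoded = 0;
};

// Returns the pts of the next frame from activeStreamIndex, skipping packets
// from other streams before any decoding work is done on them.
std::optional<int> nextFramePts(
    const std::vector<StreamPacket>& packets,
    std::size_t& cursor,
    int activeStreamIndex,
    DemuxStats& stats) {
  assert(cursor <= packets.size());
  while (cursor < packets.size()) {
    const StreamPacket& packet = packets[cursor++];
    if (packet.streamIndex != activeStreamIndex) {
      continue;  // wrong stream: drop the packet, never decode it
    }
    ++stats.packetsDecoded;  // stand-in for sending the packet and decoding
    return packet.pts;
  }
  return std::nullopt;  // no more packets for this stream
}
```

Filtering at this point means packets from other streams cost one integer comparison each, whereas filtering on the decoded frame would pay the full decode for every one of them.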
Alternative title: The C++ and core ops work fine as long as we add only one stream. They break if we add more than one stream.
Example 1:
Example 2:
None of the core APIs or C++ APIs actually do demuxing. I.e. the `stream_index` parameter is never used to filter and select frames. The only way it is used is to seek.
This may be more clear by looking at the call-stack of our decoding entry-points.
All but one rely on `getFrameAtIndexInternal`, which will use the `streamIndex` to set the cursor:
torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp
Lines 1254 to 1255 in 288bb83
but then immediately return the frame that is returned by `getNextFrameNoDemuxInternal()`, which doesn't demux anything.