Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow decoder to ignore corrupted frames #656

Closed
matanui159 opened this issue Mar 24, 2023 · 17 comments
Closed

Allow decoder to ignore corrupted frames #656

matanui159 opened this issue Mar 24, 2023 · 17 comments

Comments

@matanui159
Copy link

Currently the spec is written that (for example) for the VideoDecoder, as soon as one frame fails to decode the entire decoder becomes unusable. It is not even possible to reconfigure and start at the next key-frame, the decoder must be entirely recreated if you want to continue.

This works fine for controlled scenarios, either using known videos or from a known encoder. However, for arbitrary files provided by users this might be unexpected. Most video players (including the browsers own video element usually) seem to just drop the frame. This will produce garbage until the next keyframe but videos with only a few corrupted frames are surprisingly common. In other APIs like FFmpeg's libavcodec, a decoder can still be used after an error allowing for this use-case.

I was wondering if there was a reason for this, or if there has been considerations to change this behaviour? Maybe behind a decoder configuration option (if certain decoders do not allow this), or make it require reconfiguration or another keyframe?

@aboba
Copy link
Collaborator

aboba commented Mar 24, 2023

Can you cite the text that you believe requires this behavior? The text around flush() and reset() would suggest that a decoder error can be non-terminal. For example, an error decoding an SVC extension layer frame shouldn't prevent decoding of subsequent base layer frames.

@aboba aboba added the Question label Mar 24, 2023
@matanui159
Copy link
Author

In VideoDecoder.decode(chunk) one of the steps is:

If decoding results in an error, queue a task to run the Close VideoDecoder algorithm with EncodingError and return.

Once the decoder is in the "closed" state, both configure() and reset() will fail:

If state is "closed", throw an InvalidStateError.

@dalecurtis
Copy link
Contributor

We could probably clarify that text to differentiate between fatal and non-fatal errors. Unfortunately that's going to vary by implementation. E.g., for security reasons all errors in Chromium's software decoder are fatal.

In general though I don't think the UA should be deciding what's recoverable and what's not. The point of a low level API like this is to defer that decision to the client. If the client wishes to allow playback to continue by resuming from the next keyframe they can respin a decoder and do so. Most professional content sites would not want such behavior since it could hide corpus errors.

For codecs which have built in error resilience, the current text isn't intended to prevent them from doing so. So we could at least clarify that too.

@seflless
Copy link

Does anyone on here have an example video file with one or more corrupted frames?

@seflless
Copy link

I'd love to add one to a test suite.

@dalecurtis
Copy link
Contributor

https://source.chromium.org/chromium/chromium/src/+/main:media/test/data/ has a couple corrupt vp8 frames. In general we generate corrupt frames for testing either through fuzzers or by truncating valid data.

@matanui159
Copy link
Author

E.g., for security reasons all errors in Chromium's software decoder are fatal.

I have since noticed that HTMLVideoElement completely stops working as well once it finds a corrupted frame. Is this relatively new? I recall it working fine after corrupted frames but I might be remembering incorrectly.

Does anyone on here have an example video file with one or more corrupted frames?

The way I tested it was just by opening the video in a hex editor and going to some random point in the file and setting a byte or two to 0x00 (and hoping it doesn't completely break the file, although most of the data is the video anyway).

I'll have a play around with recreating the WebCodecs decoder and waiting for the next keyframe to see how well that works.

@dalecurtis
Copy link
Contributor

Chrome has always stopped on video decoding errors and stops on audio decoding errors as of several years ago.

@matanui159
Copy link
Author

I tried another corrupted video and while WebCodecs still fails on the first corrupted frame, HTMLVideoFrame seems to be able to play through the entire thing just fine?

Also a smaller issue but the in the spec:

If decoding results in an error, queue a task to run the Close VideoDecoder algorithm with EncodingError and return.

Where EncodingError is described as "The encoding operation (either encoded or decoding) failed.".

However for Chrome, which I've been testing on, it instead reports an OperationError (with the message set to "Decoding error.").

@padenot
Copy link
Collaborator

padenot commented Mar 30, 2023

We discussed this during the editor call:

  • What's the difference between resetting a decoder internally and tearing one down and getting a new one, in practice, in terms of duration? This can translate directly in quality of experience (time to recover).
  • If a decoder that has a number of instances that's limited is closed because of an error, can't another app acquire it, and so the web app can't re-create one?
  • Should we have a policy of always making all errors fatal like Chrome's current implementation?

@aboba aboba added agenda Add to Media WG call agenda and removed Question labels Mar 30, 2023
@dalecurtis
Copy link
Contributor

  • I think the depth of reset() varies by implementation. We've certainly had some hardware decoder instances in the past where reset() is insufficient to recover them. From a security perspective you'd want to be sure that all corrupt state is purged, it's unclear without inspecting each platform's implementation that is true. Which from a security pov means it can't be relied upon.
  • On platforms where there are such low limits on codec concurrency, you're also unlikely to have high app concurrency, so I think the risk of another app taking the codec is not worth worrying about.
  • The spec text currently says errors are fatal, so currently implementations are required to error out. What constitutes an error will depend on the codec though -- e.g., built in error resilience may mean no error is surfaced.

@dalecurtis
Copy link
Contributor

I tried another corrupted video and while WebCodecs still fails on the first corrupted frame, HTMLVideoFrame seems to be able to play through the entire thing just fine?

Likely this means the hardware decoder doesn't like the content but the software decoder can decode it okay. Chrome currently doesn't implement fallback for decoding with no-preference -- we were about to but then #604 came up for discussion and we haven't got back to it.

You likely want to use prefer-software for this case if the hardware decoder can't decode it.

Also a smaller issue but the in the spec:

If decoding results in an error, queue a task to run the Close VideoDecoder algorithm with EncodingError and return.

Where EncodingError is described as "The encoding operation (either encoded or decoding) failed.".

However for Chrome, which I've been testing on, it instead reports an OperationError (with the message set to "Decoding error.").

Whoops, filed https://bugs.chromium.org/p/chromium/issues/detail?id=1429352

@aboba
Copy link
Collaborator

aboba commented Apr 11, 2023

How does the application figure out what to do next?

  1. If "prefer-hardware" was used to configure the (now closed) decoder, should the application attempt to construct another decoder and configure it to "prefer-hardware"? This might work if the decoder experienced an error but hardware resources are available.
  2. Or should it attempt to create another decoder and configure it with "prefer-software" (e.g. software failover)? If hardware resources are not available or there is another serious issue (e.g. GPU crash) then Option 1 might fail, and failover to software might be required.

@aboba
Copy link
Collaborator

aboba commented Apr 11, 2023

I am not sure how this issue relates to "error resilient codecs" (e.g. codecs configured with a scalabilityMode, Opus with built-in FEC or external FEC/RTX/RED).

For a codec configured with a scalabiltyMode, the receiver should check dependencies and chains before feeding an encoded chunk to the decoder, to make sure that the chunk will be decodable. If the dependencies are met and the chains are unbroken, the chunk should be decodable even if other (discardable) frames have been lost. So no decoder error should be generated.

For a codec with internal resilience (e.g. Opus with FEC or DRED), we've discussed how to indicate missing frames to the decoder. Presumably, this wouldn't result in a decoder error unless the codec couldn't reconstruct the missing frames.

In the case of external resilience, the frame isn't fed to the decoder until "holes" in the jitter buffer are filled in and the frame has been wholly received or reconstructed. Again, I wouldn't expect a decoder error unless something has gone wrong with external reslience.

@Djuffin
Copy link
Contributor

Djuffin commented Apr 13, 2023

Media WG meeting: we're going to review spec exceptions for encoder() and configure() and see if we should introduce a distinction retryable and non-retryable errors (UA having some internal failures vs corrupted data error).

@chrisn
Copy link
Member

chrisn commented Apr 13, 2023

@chrisn chrisn removed the agenda Add to Media WG call agenda label Jul 6, 2023
@aboba
Copy link
Collaborator

aboba commented Jul 6, 2023

To be resolved via #669

@aboba aboba closed this as completed Jul 6, 2023
@aboba aboba added the Duplicate label Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants