
Is exposing https://w3c.github.io/webcodecs/#enumdef-hardwareacceleration a good idea? #239

Closed
youennf opened this issue May 11, 2021 · 82 comments · Fixed by #313
Labels
breaking (Interface changes that would break current usage, producing errors or undesired behavior), editorial (changes to wording, grammar, etc. that don't modify the intended behavior)

Comments

@youennf
Contributor

youennf commented May 11, 2021

I am wondering what HardwareAcceleration is supposed to be used for.

One potential use would be to always prefer power efficiency. But power efficiency does not mandate hardware acceleration.
Depending on the device, the codec, and the resolution, software-based codecs might be better suited. It is unclear how a web developer will be able to select hardwareAcceleration for that case except to let the UA decide with 'allow'.

Another possibility is to maximise compatibility and use 'deny'. In that case though, it means that the web developer loses power efficiency in a lot of cases. A careful web developer will then probably want to enter the business of identifying which SW and HW codecs are in use on a device. This does not seem great and somehow contradicts the desire to not increase fingerprinting.

It seems the UA is in general the best entity to decide what to use at any given point.
Instead of hard requirements, a web application could look at providing hints, though it is true hints tend to be difficult to define and implement consistently.

It also seems HardwareAcceleration is a potential fingerprinting vector, though it is not marked as such in the spec.

@chcunningham
Collaborator

chcunningham commented May 11, 2021

I am wondering what HardwareAcceleration is supposed to be used for.

This is a feature request from several sophisticated apps that bring their own encoders/decoders implemented in WASM (with customized features and tuning). Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec.
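For illustration, the selection logic such an app might use could look roughly like this (`handleFrame`, `handleError`, and `useWasmDecoder` are hypothetical app code):

```js
// Hypothetical sketch (inside an async function): prefer WebCodecs only when
// hardware is available; otherwise use the app's own tuned WASM codec.
const config = { codec: 'avc1.42001E', hardwareAcceleration: 'require' };
const support = await VideoDecoder.isConfigSupported(config);
if (support.supported) {
  const decoder = new VideoDecoder({ output: handleFrame, error: handleError });
  decoder.configure(config);
} else {
  useWasmDecoder();
}
```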

It is unclear how a web developer will be able to select hardwareAcceleration for that case except to let the UA decide with 'allow'.

What part is unclear? allow works as you've described, and is the default value.

This does not seem great and somehow contradicts the desire to not increase fingerprinting.

Fingerprinting and codec features are often at odds. In such cases, including this one, we strive to offer the features folks are demanding without offering any extra information (least power principle). This is why we have reduced the signal to "hardware accelerated", as opposed to exposing the name of the hardware or similar.

It also seems HardwareAcceleration is a potential fingerprinting vector, though it is not marked as such in the spec.

Agree, I think that's a good thing for us to call out.

@chcunningham added the breaking label May 12, 2021
@chcunningham
Collaborator

chcunningham commented May 12, 2021

Triage note: marking 'breaking', as removal of this attribute would clearly break behavior for folks that have come to rely on it. This is purely a triage note; I am opposed to actually implementing this break, as the feature is directly requested by users with reasons cited above.

Also marking 'editorial' for the requested additions to privacy considerations.

@chcunningham added the editorial label May 12, 2021
@chcunningham
Collaborator

@youennf, pls see above. If nothing more to discuss I'd like to close.

@youennf
Contributor Author

youennf commented May 25, 2021

Thanks for pinging me.
In general, when an API increases fingerprinting, we need a really clear use case; the bar should be really high.
I'd like to understand why Media Capabilities is not good enough and whether this API meets this high bar.

Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec.

These applications can probably check whether encoding/decoding their configuration is powerEfficient through Media Capabilities, as a good enough approximation.
This makes me think this field is more of a v2 optional feature than a must-have feature.
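For example, the approximation could be queried like this (a sketch; codec string and numbers are illustrative):

```js
// Inside an async function.
const info = await navigator.mediaCapabilities.decodingInfo({
  type: 'media-source',
  video: {
    contentType: 'video/webm; codecs="vp09.00.10.08"',
    width: 1920,
    height: 1080,
    bitrate: 3000000,
    framerate: 30,
  },
});
// info.powerEfficient here approximates "a hardware path exists for this config".
```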

Also, this use case is about requiring hardware, while the API also allows requiring software. Is there a use case for that other part?

As a side note, this potentially forbids some OS strategies like switching between software/hardware depending on various factors (other apps using HW, battery status...). Or it could force OSes to lie to web applications.

If we really want such an API, I would go with MediaCapabilities to let the application decide whether it wants OS codecs or its own codec. If the web application wants OS codecs, a hint API instead of a must-use API seems more appropriate since it would not cause fingerprinting.

@sandersdan
Contributor

These applications can probably check whether encoding/decoding their configuration is powerEfficient through Media Capabilities, as a good enough approximation.

There are several things that may be assumed about a hardware codec (none of which are guaranteed by WebCodecs):

  • Power efficient operation
  • Reduced CPU usage
  • Output frames are in GPU memory
  • More likely to be strict, less likely to recover from errors
  • May be restrictive, bounded, or even in some cases unreliable

My understanding of the use case that Chris outlined above (a media player application) is that the goal is to take the efficiency (first three points above) when it's available, but to fully control the fallback software path for consistent behavior. powerEfficient may be a useful-enough signal for this case.

Also, this use case is about requiring hardware, while the API also allows requiring software. Is there a use case for that other part?

Yes, it's also common for applications to use hardware optimistically, but to prefer software if it is determined that hardware does not meet the application's requirements (last two points above). This has been historically difficult for UAs to determine, so much so that there have been proposals to allow WebRTC servers to request disabling of hardware acceleration on a per-connection basis.

This makes me think this field is more of a v2 optional feature than a must-have feature.

This was one of the first requests ever made by a partner for WebCodecs. It's probably not the most important WebCodecs feature, but I don't consider it trivial either.

As a side note, this potentially forbids some OS strategies like switching between software/hardware depending on various factors (other apps using HW, battery status...). Or it could force OSes to lie to web applications.

This is true. Applications should not be setting a value for this property if they don't want to restrict the implementation.

@youennf
Contributor Author

youennf commented May 25, 2021

Yes, it's also common for applications to use hardware optimistically, but to prefer software if it is determined that hardware does not meet the application's requirements (last two points above)

But then, how does the web page know that a hardware encoder is good enough/not good enough in terms of compat?
It seems we would need to leak the actual HW chipset ID and the actual SW library ID and version for the application to do a good job there, which is something we do not want to do for fingerprinting reasons.

@sandersdan
Contributor

But then, how does the web page know that a hardware encoder is good enough/not good enough in terms of compat?

Typically by trying hardware first, measuring performance, and monitoring for any errors with their particular content and requirements.

The key thing is to be able to act on that information once they have gathered it.
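A rough sketch of that pattern, with hypothetical app callbacks (`renderFrame`, `switchToSoftware`):

```js
// Try hardware optimistically; on error (or poor measured performance),
// recreate the decoder with hardware denied.
const decoder = new VideoDecoder({
  output: renderFrame,
  error: () => switchToSoftware(), // e.g. reconfigure with 'deny'
});
decoder.configure({ codec: 'vp09.00.10.08', hardwareAcceleration: 'require' });
```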

@youennf
Contributor Author

youennf commented May 25, 2021

Typically by trying hardware first, measuring performance, and monitoring for any errors with their particular content and requirements.

This strategy can be done without using the hardware acceleration field: just try what the OS provides.

It seems this is only useful when the application wants to do the following:

  • Try hardware; if good, use it. If not, go to the next step.
  • Try software; if good, use it. If not, go to the next step.
  • Use a JS library implementation (needed anyway since some platforms only support HW codecs).

This flow seems like an edge case and I do not see the necessity to support it in v1.
A hint would work equally well, without the fingerprinting concerns.

@sandersdan
Contributor

sandersdan commented May 25, 2021

This strategy can be done without using the hardware acceleration field: just try what the OS provides.

I don't follow; without the field there isn't a way to forcefully fall back. Most applications won't have a WASM fallback.

(needed anyway since some platforms only support HW codecs)

I think many WebRTC-style applications would choose to switch to a reduced quality/feature mode if the WebCodecs codecs were deemed inadequate and there was no alternative available.

@youennf
Contributor Author

youennf commented May 25, 2021

To summarise, the main use case of this property is for applications to force the SW code path.
For applications wanting to use HW, powerEfficient might be good enough, at least in the short term.

Can you clarify the use case of such applications, in particular those applications that would do monitoring but would not have a fallback?
My understanding was that these applications wanted to protect themselves from codec UA/OS bugs.
But such apps need to have a fallback in that case, so it is probably something else.

Again, given this is a potential new fingerprinting vector, the bar should be high.
It should unlock new features for apps that have no known viable alternatives.
Do you know what the PING WG's assessment of this particular field was?

@dalecurtis
Contributor

I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector. You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

@sandersdan
Contributor

To summarise, the main use case of this property is for applications to force the SW code path.

I push back lightly on this characterization; while powerEfficient might be a substitute, it doesn't eliminate the use case.

Can you clarify the use case of such applications, in particular those applications that would do monitoring but would not have a fallback?

Sure. Some things applications may be monitoring include:

  • Throughput
  • Latency and jitter
  • Rate control accuracy (encode only)
  • Picture quality (probably encode only)
  • Actual failures

These may be things that are inherent to the platform codecs or they may be things that vary depending on system load. WebRTC-style applications are likely to use resolution, bitrate, codec, and profile settings as a first line of defense. In cases where that is inadequate (e.g. because jitter is just too high at any setting), forcing software codecs can be a reliable workaround.

In the case of actual failures, the cause may be UA/OS bugs, or it may be non-conformant streams. In either case it is likely that a software codec will be more reliable.

Do you know what the PING WG's assessment of this particular field was?

I will defer to @chcunningham for this question.

@youennf
Contributor Author

youennf commented May 25, 2021

I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector.

I think we all agree we want to go to a world where we mitigate-then-remove those issues.
We certainly do not want to make fingerprinting easier and more robust.

You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

How do you force a video element to use either the SW or the HW decoder at a given fixed resolution?
What about encoders?

@youennf
Contributor Author

youennf commented May 25, 2021

Some things applications may be monitoring include:

  • Throughput
  • Latency and jitter
  • Rate control accuracy (encode only)
  • Picture quality (probably encode only)

For those things, I fail to understand the relationship with the HW acceleration field.
Media Capabilities already gives you that information using MediaDecodingType/MediaEncodingType.
Maybe what you want is to pass a MediaDecodingType/MediaEncodingType when creating the encoder/decoder.
Then the OS will select its most suitable codec alternative according to that value.
This answers the problem you are describing in a more straightforward manner without the fingerprinting hurdles.

FWIW, I know OSes that have more than one SW encoder of a given codec. A single boolean is not sufficient to enumerate them all.

@sandersdan
Contributor

Media Capabilities already gives you that information using MediaDecodingType/MediaEncodingType.

It doesn't, nor could it reliably do so. It can guess at a subset, but even those vary by system load, configuration, and content.

Then the OS will select its most suitable codec alternative according that value.

This is potentially possible but is delving into trying to guess what applications want. For example Chrome already avoids hardware decode for WebRTC on Windows 7 due to high latency, but we can't really know every application's detailed preferences well enough to implement a generic selection algorithm.

WebCodecs also operates at a low-enough level that things like dynamic codec switching are unlikely to be 100% reliable, so the application will need to be involved in the algorithm in some direct way.

FWIW, I know OSes that have more than one SW encoder of a given codec. A single boolean is not sufficient to enumerate them all.

This is true. We didn't see much advantage with full enumeration, and the fingerprinting concerns are much larger with an API like that.

@dalecurtis
Contributor

dalecurtis commented May 25, 2021

I think we all agree we want to go to a world where we mitigate-then-remove those issues.
We certainly do not want to make fingerprinting easier and more robust.

Agreed, but AFAIK, the only mitigation possible is restricting when a hardware codec is used. E.g., requiring N frames before a hardware codec kicks in and/or limiting hardware codec usage to high-trust modes. You could apply both such restrictions to the proposed property. E.g., always return false unless trust requirements are satisfied.

Keep in mind that today all browsers expose the hardware decoding value through MediaCapabilities' powerEfficient value. Here's Safari's for VP9: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/graphics/cocoa/VP9UtilitiesCocoa.mm#L256

You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

How do you force a video element to use either the SW or the HW decoder at a given fixed resolution?

AFAIK most browsers use a simple resolution filter (see above), so it's a matter of finding the cut-offs used by each browser.

What about encoders?

MediaRecorder's total encode time will expose a hardware encoder versus a software encoder entirely on the client. A more sophisticated client can use a WebRTC loopback or server setup to figure this out similarly.

@youennf
Contributor Author

youennf commented Jun 1, 2021

Keep in mind that today all browsers expose the hardware decoding value

Not really. Media Capabilities exposes whether it is OK for battery life to use those settings. UAs can implement heuristics in various ways. Hardware acceleration is an (important) implementation detail that is used for that 'battery-life-friendly' feature.

Exposing features is OK; exposing implementation strategies does not look appealing.

AFAIK most browsers use a simple resolution filter (see above), so it's a matter of finding the cut-offs used by each browser.

Web pages can try to detect at which resolution OSes might switch from SW to HW.
Web pages cannot currently force a HW decoder at a given fixed resolution.

MediaRecorder's total encode time will expose a hardware encoder versus a software encoder entirely on the client.

It really depends on whether the UA is fine exposing this information or not.
The MediaRecorder spec allows delaying events if needed. Ditto for the WebRTC spec.

In general, the hardware acceleration field is exposing implementation strategies/details, while it is preferable to expose capabilities. As an example, a SW codec might have different efficiency depending on whether the device is ARM-based or x86-based, but I do not think we want to expose whether the device is ARM or x86.

The hardware acceleration field is exposing new information that I do not think is available:

  • whether HW codec is available at a given (small) resolution (plus how they behave at these resolutions)
  • whether SW codec is available at a given (high) resolution (plus how they behave at these resolutions)
  • how many HW codec slots might be available, plus the possibility for pages running on the same device to try locking HW slots as a side channel.

@chcunningham
Collaborator

Do you know what was PING WG assessment of this particular field?

I will defer to @chcunningham for this question.

This field was included in the spec during PING's review. To my memory, no particular concerns were raised.

@dalecurtis
Contributor

Not really. Media Capabilities exposes whether it is OK for battery life to use those settings. UAs can implement heuristics in various ways. Hardware acceleration is an (important) implementation detail that is used for that 'battery-life-friendly' feature.

I agree semantically, but to be clear no UA implemented the heuristic in a way that avoids fingerprinting. I want to highlight that here since despite all UAs caring about fingerprinting, a better solution was not found, which suggests that we're all following the least-power principle as best we can.

Exposing features is OK; exposing implementation strategies does not look appealing.

There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all hardwareAcceleration = require or hardwareAcceleration = deny requests, that's absolutely allowed by the spec. Pages will already have to have a backup codec mechanism (likely WASM) to set such preferences.

Web pages can try to detect at which resolution OSes might switch from SW to HW.
Web pages cannot currently force a HW decoder at a given fixed resolution.

I feel this is another semantic argument that isn't practical. Sure a page can't force a UA to use hardware decoding for an 8K video, but the consequences of a UA not doing so disadvantage the user to the point that no UA is going to do that.

It really depends on whether the UA is fine exposing this information or not.
The MediaRecorder spec allows delaying events if needed. Ditto for the WebRTC spec.

Encode time is only one avenue, the encoded pixels will also vary with implementation details. In addition to varying delay, the UA would also have to sprinkle noise into the source before encoding, which will hurt encoding performance and quality.

In general, the hardware acceleration field is exposing implementation strategies/details, while it is preferable to expose capabilities.

Whether something is an implementation strategy or a capability is context dependent. At the level of a codecs API, there's precedent in nearly every API for exposing hardware acceleration as a capability.

Do you have any alternative suggestions on how we can solve the use cases @sandersdan mentions? We're definitely open to alternative mechanisms for solving the problems of 'preferring efficiency' and 'avoid broken/slow hardware/platform codecs'.

The hardware acceleration field is exposing new information that I do not think is available:

  • whether HW codec is available at a given (small) resolution (plus how they behave at these resolutions)
  • whether SW codec is available at a given (high) resolution (plus how they behave at these resolutions)
  • how many HW codec slots might be available, plus the possibility for pages running on the same device to try locking HW slots as a side channel.

I don't agree this isn't available; I do agree it would be easier to pin this information down with our proposed API.

@jernoble

jernoble commented Jun 3, 2021

@dalecurtis said:

I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector. You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

Per PING (IIRC, via @hober), that fingerprinting can occur in a similar way through another API is not itself justification for ignoring the fingerprinting concerns of a new API, as it just adds to fingerprinting technical debt.

@jernoble

jernoble commented Jun 3, 2021

@dalecurtis said:

Keep in mind that today all browsers expose the hardware decoding value through MediaCapabilities' powerEfficient value. Here's Safari's for VP9: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/graphics/cocoa/VP9UtilitiesCocoa.mm#L256

Chair hat off; implementer hat on

Note that this merely reveals whether the system has a hardware decoder. It can't be used as a side channel to detect, for example, that another tab is already using one of the limited set of hardware decoder slots, nor can it be used to determine how many slots the current system has.

@jernoble

jernoble commented Jun 3, 2021

There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all hardwareAcceleration = require or hardwareAcceleration = deny requests, that's absolutely allowed by the spec.

Forgive my ignorance here, but are UAs free to reject hardwareAcceleration = require in all cases where MediaCapabilities would say powerEfficient: false, and reject hardwareAcceleration = deny in all cases where MediaCapabilities would say powerEfficient: true [edit: and resolve otherwise]? In other words, is there a fingerprinting mitigation strategy available where UAs would just not expose per-decode information, and instead use coarse-grained system-level capabilities?

@youennf
Contributor Author

youennf commented Jun 3, 2021

I don't agree this isn't available; I do agree it would be easier to pin this information down with our proposed API.

Can you describe how this is available?
If you look at powerEfficient in Safari, it is per codec type, not based on resolution for instance.
I do not know how a web app could force Safari to try H264 HW decoding at a low resolution/SW decoding at high resolution, and see whether that fails/how it performs.

Pages will already have to have a backup codec mechanism (likely WASM) to set such preferences.

That is contradicting a previous statement in this thread:

This strategy can be done without using the hardware acceleration field: just try what the OS provides.

I don't follow; without the field there isn't a way to forcefully fall back. Most applications won't have a WASM fallback.

If pages will have a backup codec, a reasonable approach for a web app is to:

  1. Provide as much information as possible to the UA for the UA to pick a codec and configure it as well as possible for the app use case
  2. Monitor the codec result
  3. If not good enough, fall back to the backup codec

As part of step 1, the WebCodecs API could provide more knobs/hints to better set up the codec: prefer low-latency, prefer battery efficiency, prefer throughput...
I think the low-latency knob for instance is something that might get consensus (see #241 (comment))
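Shape-wise, step 1 with such hints might look like the following (all knob names are hypothetical, shown only to illustrate the idea; encoder construction elided):

```js
// All hint names below are hypothetical; none are in the spec.
encoder.configure({
  codec: 'avc1.42001E',
  width: 1280,
  height: 720,
  latencyPreference: 'realtime',  // hypothetical: prefer low latency
  powerPreference: 'efficient',   // hypothetical: prefer battery efficiency
});
```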

@dalecurtis
Contributor

There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all hardwareAcceleration = require or hardwareAcceleration = deny requests, that's absolutely allowed by the spec.

Forgive my ignorance here, but are UAs free to reject hardwareAcceleration = require in all cases where MediaCapabilities would say powerEfficient: false, and reject hardwareAcceleration = deny in all cases where MediaCapabilities would say powerEfficient: true [edit: and resolve otherwise]? In other words, is there a fingerprinting mitigation strategy available where UAs would just not expose per-decode information, and instead use coarse-grained system-level capabilities?

Yes. The UA has a lot of agency in how it replies here. The best way to think about isConfigSupported() is that it's a strong hint. E.g., practically speaking, isConfigSupported('hw=require') may not be satisfiable by the time configure() is called. As such any mitigations UAs apply to MediaCapabilities are available here as well.
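In code terms, a positive answer is advisory, e.g. (a sketch; decoder construction elided):

```js
const config = { codec: 'avc1.42001E', hardwareAcceleration: 'require' };
const { supported } = await VideoDecoder.isConfigSupported(config);
if (supported) {
  // Still no guarantee: hardware may be exhausted by the time this runs,
  // in which case failure surfaces through the decoder's error callback.
  decoder.configure(config);
}
```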

@dalecurtis
Contributor

Can you describe how this is available?
If you look at powerEfficient in Safari, it is per codec type, not based on resolution for instance.
I do not know how a web app could force Safari to try H264 HW decoding at a low resolution/SW decoding at high resolution, and see whether that fails/how it performs.

Safari is likely the hardest to force to reveal useful fingerprinting bits here since macOS/iOS are more homogeneous platforms than other UAs typically run on. HW decoding at a low resolution may be achievable through a set of crafted container and header lies (possibly not even lies, depending on the codec feature set). SW encoding/decoding at a high resolution could be achieved by exhausting the kernel slots for hardware codecs.

Pages will already have to have a backup codec mechanism (likely WASM) to set such preferences.

That is contradicting a previous statement in this thread:

This strategy can be done without using the hardware acceleration field: just try what the OS provides.

I don't think these are in contradiction, but sorry, it's unclear. My statement was specifically about clients which set 'require'. Pages that use 'deny' or 'allow' are unlikely to have a WASM fallback for non-technical reasons.

As part of step 1, the WebCodecs API could provide more knobs/hints to better set up the codec: prefer low-latency, prefer battery efficiency, prefer throughput...
I think the low-latency knob for instance is something that might get consensus (see #241 (comment))

We're all for more knobs; please keep the suggestions coming! Something like requirePowerEfficiency would indeed mostly solve the 'hardwareAcceleration=require' case, but we haven't found a good knob to indicate 'avoid broken/slow hardware/platform codecs' to solve the 'hardwareAcceleration=deny' case. Can you think of one?

@youennf
Contributor Author

youennf commented Jun 3, 2021

I would go with a hint like a codecSelectionPreference enum with 'powerEfficiency' and 'maxCompatibility' as possible values.

Implementations would select either the OS codec or their own copy of a SW codec (if they have one) based on that field.
For mobile UAs, any HW codec used by WebRTC is probably good enough to qualify for maxCompatibility, but it would really be up to UAs.

Pages that use 'deny' or 'allow' are unlikely to have a WASM fallback for non-technical reasons.

I can understand for 'allow'. For 'deny', some OSes might not allow using a SW H264 encoder at some resolutions (or even provide one at any resolution). It seems applications would need a fallback in that case.

@jernoble

jernoble commented Jun 3, 2021

@dalecurtis said:

Yes. The UA has a lot of agency in how it replies here. The best way to think about isConfigSupported() is that it's a strong hint. E.g., practically speaking, isConfigSupported('hw=require') may not be satisfiable by the time configure() is called. As such any mitigations UAs apply to MediaCapabilities are available here as well.

Ok, then at a minimum it would be useful to point out that mitigation in the privacy considerations section of the spec.

Best possible practice would be to normatively declare that hardwareAcceleration=require/requirePowerEfficiency SHOULD reject where MediaCapabilities would return powerEfficient:false, and perhaps hardwareAcceleration=deny/requirePowerInefficiency SHOULD reject where MediaCapabilities would return powerEfficient:true, as that limits the amount of information exposed to no more than is already available through MediaCapabilities. But both the above mitigation and rate limiting would probably meet Best Practice 7: Enable graceful degradation for privacy-conscious users or implementers.

@jernoble

jernoble commented Jun 3, 2021

@chcunningham said:

This is a feature request from several sophisticated apps that bring their own encoders/decoders implemented in WASM (with customized features and tuning). Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec.

Is this requirement not satisfied by MediaCapabilities? And in parallel, what's the use case for hardwareAcceleration=deny?

@youennf
Contributor Author

youennf commented Jun 18, 2021

Let's take a couple of examples where a web page wants to use a VP9 decoder at a given resolution.

  1. UA has a SW VP9 decoder.
    MC powerEfficient is a good enough replacement for 'require' since the web page knows 'require' would have failed.
    I am still unclear how the web page will know whether the WASM decoder will be better, but it is up to the web page to make that decision.

  2. UA has a HW VP9 decoder.
    MC powerEfficient is a good enough replacement for 'require' since the web page will want to use the UA decoder and will fall back to WASM if creating the decoder fails for whatever reason.

  3. UA has two VP9 decoders available, one SW and one HW.
    MC powerEfficient will tell the web page to use a UA decoder.
    The web page will ask for a VP9 decoder and the UA has to select which one, but there might be uncertainty there.
    The hint makes sure the UA selects the HW one.
    AIUI, this covers @zhlwang's use case reasonably well.

A remaining edge case is SW fallback when all HW slots are used: I haven't heard people asking to optimise this case.
My understanding is the UA could try to optimise it without web page help by transitioning to a HW slot whenever one is available.

Another hypothetical issue is about setup parameters being incompatible with the HW decoder, thus falling back to SW. In that case, we should look at which parameters we are talking about and how MC could be enhanced to cover that setup.
That could be useful information to all codec developers: WebRTC, MSE, WebCodecs...
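A sketch of the resulting page logic, keyed entirely off MC (`mcConfig`, `decoder`, and `useWasmDecoder` are hypothetical):

```js
const info = await navigator.mediaCapabilities.decodingInfo(mcConfig);
if (info.supported && info.powerEfficient) {
  // Cases 2 and 3: use the UA decoder (a hint could steer the UA to HW).
  decoder.configure({ codec: 'vp09.00.10.08' });
} else {
  // Case 1: 'require' would have failed anyway; the page weighs the UA's
  // SW decoder against its own WASM decoder.
  useWasmDecoder();
}
```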

@jbkempf

jbkempf commented Jun 18, 2021

Are we saying that preferPowerEfficient = false will effectively select the software decoder depending on UA-specific behavior?

This is what I understood.

@dalecurtis
Contributor

Thanks @youennf. A few follow-up questions:

  • Does IsConfigSupported({powerEfficient: true}) return unsupported in case 1?
  • Does IsConfigSupported({powerEfficient: false}) return unsupported in case 2?
  • What's the difference between what you proposed above and just having the spec say hardwareAcceleration may be treated as a hint by the UA? We could tweak the values to something like: 'noPreference', 'preferSoftware', 'preferHardware' or similar for clarity.

@dalecurtis
Contributor

@youennf ping for #239 (comment)?

@youennf
Contributor Author

youennf commented Jun 28, 2021

Thanks for the ping.

  • Does IsConfigSupported({powerEfficient: true}) return unsupported in case 1?
  • Does IsConfigSupported({powerEfficient: false}) return unsupported in case 2?

I would say it returns what MediaCapabilities would have returned.
That way, we have consistent behavior and we do not add more fingerprinting than what MC is doing.

  • What's the difference between what you proposed above and just having the spec say hardwareAcceleration may be treated as a hint by the UA? We could tweak the values to something like: 'noPreference', 'preferSoftware', 'preferHardware' or similar for clarity.

powerEfficient piggybacks on MC and is tied to a known user impact (battery drain).
The HW/SW distinction is an implementation detail, which is not meaningful for users.
Also, as we can see from this thread, people have different views of how HW and SW codecs behave and of their pros and cons; powerEfficient seems less ambiguous.

@dalecurtis
Contributor

dalecurtis commented Jun 28, 2021

powerEfficient = false feels pretty ambiguous to me. I.e., today powerEfficient is only an output parameter; your proposal would make it an input one. Why would anyone ever want to set that, based on the name?

I think we need a better reason than just name similarity to use 'powerEfficient' versus something that's more legible and avoids the same fingerprinting issues that you're worried about. Additionally, as Chris is the author of Media Capabilities, we should also give deference to Chris' comments that tying this to powerEfficient seems like a bad idea.

Otherwise, it seems like we may agree on behavior, but naming is still up in the air? I.e., do you agree that my proposal in #239 (comment) is otherwise equivalent to yours from a fingerprinting perspective? I'm happy to bike shed on names if that's where we're at.

Given all the different views, I take the opposite opinion that we should be as precise as possible modulo fingerprinting, since otherwise you're saying that the UA's view is the only one that matters.

@youennf
Contributor Author

youennf commented Jun 29, 2021

Otherwise, it seems like we may agree on behavior

Right, my main feedback was to move from a hard option to a hint.

My initial proposal was a hint, something like 'codecSelectionPreference' taking values like 'powerEfficient' (in MC sense/battery life) or 'compatibility' (in the sense that it can be deployed consistently by the UA on its supported platforms, so is most likely software-based). 'prefer' is shorter and seems good to me.

I am not a big fan of direct sw/hw implementation-related values. For instance, I heard the following two opinions in this thread which somehow contradict each other:

Software decoders from the OS/browsers are very often buggy, and less tested than the hw ones.
In the case of actual failures, the cause may be UA/OS bugs, or it may be non-conformant streams. In either case it is likely that a software codec will be more reliable.

@dalecurtis
Contributor

Otherwise, it seems like we may agree on behavior

Right, my main feedback was to move from a hard option to a hint.

Great; it's not our ideal outcome, but we can compromise if necessary. Just so we're clear, your proposal grants UA control at the expense of compatibility. The existing hardwareAcceleration: (allow|deny|require) proposal is well defined and easily compatible across UAs.

My initial proposal was a hint, something like 'codecSelectionPreference' taking values like 'powerEfficient' (in MC sense/battery life) or 'compatibility' (in the sense that it can be deployed consistently by the UA on its supported platforms, so is most likely software-based). 'prefer' is shorter and seems good to me.

We still prefer hardwareAcceleration, but in the interest of moving discussion along, a slimmed version of your proposal is:
codecPreference: (preferEfficiency|preferCompatibility|noPreference) with a default of noPreference.

I am not a big fan of direct sw/hw implementation-related values. For instance, I heard the following two opinions in this thread which somehow contradict each other:

Software decoders from the OS/browsers are very often buggy, and less tested than the hw ones.
In the case of actual failures, the cause may be UA/OS bugs, or it may be non-conformant streams. In either case it is likely that a software codec will be more reliable.

I don't think this is a good reason against clearer naming; I'm not even sure these are in contradiction, depending on whose software decoder we're talking about. Both statements are likely true lived experience, but underspecified as general statements. Folks are going to have different outcomes for different use cases, especially when using less common codec features. Hence we believe it's important to be clear in what the UA is providing.

@marcello3d

Where would "prefer quality" fit in to get highest quality compression? (I realize this is somewhat subjective.)

@dalecurtis
Contributor

That Q exemplifies the reason we prefer the hardwareAcceleration approach since it's not subjective. :)

Generally speaking, the UA won't know which codec will provide the best quality for a given use case. At a minimum it depends on profile, level, bitrate, and platform. On platforms with a relatively narrow set of hardware (macOS) the UA may have a very good guess, but on less homogeneous platforms the UA would struggle to choose here and would likely just err towards hardware.

@youennf
Contributor Author

youennf commented Jul 5, 2021

Where would "prefer quality" fit in to get highest quality compression? (I realize this is somewhat subjective.)

Hints may be made different for encoders and decoders.
preferCompression is a meaningful hint when compared to preferPowerEfficiency.
It will then be up to the UA to determine which of the codecs is expected to give better compression or better power efficiency.

Generally speaking, the UA won't know which codec will provide the best quality for a given use case.

I do not think leaving that choice to web pages will give consistent results across devices.
Also, the UA has all the information it needs to make that choice based on the available encoders and their capabilities.
And we do not want to expose that level of information to web pages for privacy reasons.

@koush

koush commented Jul 7, 2021

Hi, WebCodecs user here. I use this decoder hint because Apple's implementation of hardware decoding h264 yuvj420p frames seems to add nearly 1 second of latency on decode. This happens on both my Intel and M1 Macs with WebCodecs and Video Toolbox. Current workarounds include:

  • color conversion at the encoder source from yuvj420p to yuv420p before encoding as h264, which is not always possible.
  • hardwareAcceleration: "deny" to force the software path, which is low latency (in the Chrome/Electron WebCodecs implementation)

Without this flag, WebCodecs would be unusable for me on Mac (unless Apple fixes this?), where I need basically sub-50ms encode->transfer->decode for screen mirroring.

I suspect there are similar other hardware decode implementations that are unconcerned with low latency guarantees.
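For reference, the second workaround is just a config choice (codec string illustrative):

```js
decoder.configure({
  codec: 'avc1.640028',
  hardwareAcceleration: 'deny', // force the low-latency software path
});
```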

@youennf
Contributor Author

youennf commented Jul 8, 2021

because Apple's implementation of hardware decoding h264 yuvj420p frames seems to add nearly 1 second of latency on decode.

1 second of latency on decoder side seems like a big bug to me.
Have you reported this issue to Apple? If so, can you privately send me the number?

I would guess the preferCompatibility hint would work in Chrome as well as the current hardwareAcceleration option does.
We talked about a realtime mode as well, which could be helpful there.

@dalecurtis
Contributor

dalecurtis commented Jul 8, 2021

In summary, today we have hardwareAcceleration: allow|deny|require and it is not a hint. Current consensus proposals are:

  • Keep what we have.
  • hardwareAcceleration: noPreference|preferSoftware|preferHardware where values are a hint allowing the UA to override. UAs can optionally reject preferSoftware and preferHardware if no software or hardware codec exists, respectively.
  • codecPreference: noPreference|preferCompatibility|preferEfficiency with the same practical outcome as the previous item, but perhaps leaving room for future values like preferCompression, preferQuality, etc.

Is that a fair summary @youennf ?

@aboba @padenot @jan-ivar can you weigh in so we can make progress here?

@youennf
Contributor Author

youennf commented Jul 8, 2021

Is that a fair summary @youennf ?

That is a fair summary.

@padenot
Collaborator

padenot commented Jul 12, 2021

I'm not sure how I would implement option 3. Based on the actual names, it's unclear which decoder is more compliant or efficient for a particular video on a particular system if the browser is not doing internal benchmarking (and even then, it's not particularly reliable). If it's specced to be the same as option 2, but with different names, I'm leaning towards option 2.

Option 2 is at least quite clear for authors and implementors. It seems that, with MediaCapabilities, we have a set of APIs that let authors determine what's best for their use cases, translating to better resource utilisation for users.

Rejecting when a particular hardwareAcceleration is not supported is useful at the expense of some fingerprinting (the same as MediaCapabilities, which can be worked around for users who prefer privacy; this is already implemented in Gecko). There are also talks about adding more members to MediaCapabilities, for example related to latency; this will be necessary for authors to pick the decoder that's needed, in particular to determine whether it's the hardware or software decoder that can do low latency.

#239 (comment) (quality) is not handled by this, but maybe it doesn't need to be, or we can handle it later.

@dalecurtis
Contributor

dalecurtis commented Jul 19, 2021

@youennf or @aboba, any further opinions here? Let's try to resolve as much of this as possible over the issue tracker ahead of the Media WG meeting to ensure timeliness. Thanks everyone!

@dalecurtis
Contributor

Bump again for @aboba and @youennf, @jernoble as FYI.

During the WG meeting we found consensus on making it optional, so now we're into classic bike-shed painting. One new proposal that emerged was something like:

  • codecPreference: noPreference | preferSoftware | preferHardware which has a bit of the best of all worlds in that the current proposed values are clear but could be extended in the future with things like preferQuality etc.

We'll try to reach an editors' consensus during the editors' meeting tomorrow and report back here.

@jbkempf

jbkempf commented Jul 28, 2021

During the WG meeting we found consensus on making it optional, so now we're into classic bike-shed painting. One new proposal that emerged was something like:

* `codecPreference: noPreference | preferSoftware | preferHardware` which has a bit of the best of all worlds in that the current proposed values are clear but could be extended in the future with things like `preferQuality` etc.

preferHardware could mean falling back to software, though?

@padenot
Collaborator

padenot commented Jul 28, 2021

It could be interesting to list a few scenarios where a UA would fall back to software instead of rejecting. It's certainly possible, since it's a hint.

I can think of silently falling back to software for a codec where hardware decoding is super common, like h264, on a CPU that doesn't have it (say the i9 7940x that I have here, which is not too common and is more than capable of decoding any video rapidly), to avoid revealing that it doesn't, perhaps when the UA is set to be extra privacy-preserving (say, privacy.resistFingerprinting set to true in Firefox).

@dalecurtis
Contributor

dalecurtis commented Jul 28, 2021

Editors' call: we've landed on consensus for
hardwareAcceleration: no-preference|prefer-software|prefer-hardware as a hint. I'll send a PR for this soon.
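A config would then carry the hint like this (a sketch; other members elided):

```js
decoder.configure({
  codec: 'vp8',
  hardwareAcceleration: 'prefer-hardware', // or 'prefer-software' | 'no-preference'
});
```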

dalecurtis added a commit that referenced this issue Jul 28, 2021
* It's now a hint instead of being required.
* Values are noPreference, preferSoftware, and preferHardware.

Fixes: #239
@youennf
Contributor Author

youennf commented Jul 29, 2021

A few thoughts:

  • Going with a hint instead of a mandatory field fixes my main fingerprinting concern
  • I am not enthusiastic about 'no-preference'. Can we get consensus on a reasonable default value? One main use case of WebCodecs is being able to get HW acceleration, so I would assume prefer-hardware (or prefer-powerEfficient) could be a good default. And we could remove no-preference.
  • Exposing SW/HW criteria does not seem like a great idea in general. Yes, it does make UA developers' lives easier, but at the expense of web developers. As an example, if MC says a config is powerEfficient, what happens if a web developer sets prefer-software? Choosing the optimal configuration for power efficiency usually requires all sorts of information sources, some of which the web developer does not have access to (for good reasons).

@dalecurtis
Contributor

dalecurtis commented Jul 29, 2021

@youennf wrote:

  • I am not enthusiastic about 'no-preference'. Can we get consensus on a reasonable default value? One main use case of WebCodecs is being able to get HW acceleration, so I would assume prefer-hardware (or prefer-powerEfficient) could be a good default. And we could remove no-preference.

prefer-hardware isn't correct as a default, though. UAs already choose software versus hardware based on resolution today. In the context of the property name being hardwareAcceleration, I think no-preference makes sense.

I'm not against renaming no-preference to prefer-efficiency if @padenot @aboba feel that's a better value. My goal was to avoid limiting UA choice by defining what no-preference means though. I.e., what if a UA decides for privacy reasons it will only allow a site access to the software codecs?

  • Exposing SW/HW criteria does not seem like a great idea in general. Yes, it does make UA developers' lives easier, but at the expense of web developers. As an example, if MC says a config is powerEfficient, what happens if a web developer sets prefer-software? Choosing the optimal configuration for power efficiency usually requires all sorts of information sources, some of which the web developer does not have access to (for good reasons).

Per discussions and the proposed spec text, that's up to the UA to decide. The current proposal says UAs SHOULD try to respect the developer's choice, but may ignore the value for any reason. Sample reasons include privacy or UA limitations. Please leave a review on the proposed PR if there's text you'd like to include.

@youennf
Contributor Author

youennf commented Jul 30, 2021

I'm not against renaming no-preference to prefer-efficiency

Overall prefer-efficiency seems a good default to me.
prefer-powerEfficiency might be a more accurate name (we might also later introduce a prefer-compressionEfficiency on the encoder side?)
It is also easy to understand, and it is now reasonably well understood how to implement it.
prefer-hardware seems somehow redundant with it but I can live with that.

what if a UA decides for privacy reasons it will only allow a site access to the software codecs?

In that case, the UA would prune all HW codecs from the list of available codecs.
The codec selection algorithm would remain the same, except it would run on the reduced set of codecs, containing only SW codecs.

@dalecurtis
Contributor

My point was that prefer-efficiency is a misnomer in the privacy case, but again I only have a light objection there. If @padenot or @aboba are swayed we can rename; otherwise we'll stick with no-preference.

@chrisn removed the agenda label Dec 8, 2021