Is exposing https://w3c.github.io/webcodecs/#enumdef-hardwareacceleration a good idea #239
Comments
This is a feature request from several sophisticated apps that bring their own encoders/decoders implemented in WASM (with customized features and tuning). Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec.
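To make the pattern concrete, here is a minimal sketch of what such an app might do, assuming the 'require' value discussed in this thread and a hypothetical wasmDecode() fallback:

```js
// Sketch only: 'require' is the enum value under discussion here (the shipped spec
// may spell this differently), and wasmDecode() stands in for the app's own WASM decoder.
const config = { codec: 'avc1.42E01E', hardwareAcceleration: 'require' };

const { supported } = await VideoDecoder.isConfigSupported(config);
if (supported) {
  const decoder = new VideoDecoder({
    output: (frame) => { /* render the frame */ frame.close(); },
    error: (e) => console.error(e),
  });
  decoder.configure(config);
} else {
  // No hardware path available: skip WebCodecs entirely and use the WASM codec.
  wasmDecode();
}
```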
What part is unclear?
Fingerprinting and codec features are often at odds. In such cases, including this one, we strive to offer the features folks are demanding without offering any extra information (least power principle). This is why we have reduced the signal to "hardware accelerated", as opposed to exposing the name of the hardware or similar.
Agree, I think that's a good thing for us to call out. |
Triage note: marking 'breaking', as removal of this attribute would clearly break behavior for folks that have come to rely on it. This is purely a triage note; I am opposed to actually implementing this break, as the feature is directly requested by users with reasons cited above. Also marking 'editorial' for the requested additions to privacy considerations. |
@youennf, pls see above. If nothing more to discuss I'd like to close. |
Thanks for pinging me.
These applications can probably use whether encoding/decoding is powerEfficient through Media Capabilities as a good-enough approximation for their configuration.

Also, this use case is about requiring hardware, while the API also allows requiring software. Is there a use case for that other part?

As a side note, this potentially forbids some OS strategies, like switching between software/hardware depending on various factors (other apps using HW, battery status...). Or it could force OSes to lie to web applications.

If we really want such an API, I would go with MediaCapabilities to let the application decide whether it wants OS codecs or its own codec. If the web application wants OS codecs, a hint API instead of a must-use API seems more appropriate since it would not cause fingerprinting.
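For reference, a sketch of that MediaCapabilities approximation; the content type, resolution, and bitrate below are placeholders:

```js
// powerEfficient is a rough proxy for "hardware-backed" on most UAs today.
const info = await navigator.mediaCapabilities.decodingInfo({
  type: 'file',
  video: {
    contentType: 'video/mp4; codecs="avc1.42E01E"',
    width: 1920,
    height: 1080,
    bitrate: 3_000_000,
    framerate: 30,
  },
});

if (info.supported && info.powerEfficient) {
  // Likely hardware-backed: use the platform codec via WebCodecs.
} else {
  // Fall back to the app's own WASM codec.
}
```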
There are several things that may be assumed about a hardware codec (none of which are guaranteed by WebCodecs):
My understanding of the use case that Chris outlined above (a media player application) is that the goal is to take the efficiency (first three points above) when it's available, but to fully control the fallback software path for consistent behavior.
Yes, it's also common for applications to use hardware optimistically, but to prefer software if it is determined that hardware does not meet the application's requirements (last two points above). This has been historically difficult for UAs to determine, so much so that there have been proposals to allow WebRTC servers to request disabling of hardware acceleration on a per-connection basis.
This was one of the first requests ever made by a partner for WebCodecs. It's probably not the most important WebCodecs feature, but I don't consider it trivial either.
This is true. Applications should not be setting a value for this property if they don't want to restrict the implementation. |
But then, how does the web page know whether a hardware encoder is good enough in terms of compatibility?
Typically by trying hardware first, measuring performance, and monitoring for any errors with their particular content and requirements. The key thing is to be able to act on that information once they have gathered it. |
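A sketch of that try-measure-act loop; the 50 ms threshold, the frame count, and switchToFallback() are illustrative assumptions, not anything from the spec:

```js
// Illustrative only: thresholds and switchToFallback() are hypothetical.
const pending = new Map(); // chunk timestamp -> time decode() was called
let slowFrames = 0;
let errors = 0;

const decoder = new VideoDecoder({
  output: (frame) => {
    const start = pending.get(frame.timestamp);
    pending.delete(frame.timestamp);
    if (start !== undefined && performance.now() - start > 50) slowFrames++;
    frame.close();
  },
  error: () => { errors++; },
});
decoder.configure({ codec: 'avc1.42E01E', hardwareAcceleration: 'require' });

function decodeChunk(chunk) {
  pending.set(chunk.timestamp, performance.now());
  decoder.decode(chunk);
  // Once enough data is gathered, act on it: reconfigure without hardware,
  // or hand off to the app's own software/WASM path.
  if (slowFrames > 30 || errors > 0) switchToFallback();
}
```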
This strategy can be done without using the hardware acceleration field: just try what the OS provides. It seems this is only useful in the case where the application wants to do the following:
I don't follow; without the field there isn't a way to forcefully fall back. Most applications won't have a WASM fallback.
I think many WebRTC-style applications would choose to switch to a reduced quality/feature mode if the WebCodecs codecs were deemed inadequate and there was no alternative available. |
To summarise, the main use case of this property is for applications to force the SW code path. Can you clarify the use case of such applications, in particular those applications that would do monitoring but would not have a fallback? Again, given this is a potential new fingerprinting vector, the bar should be high.
I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector. You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video. |
I push back lightly on this characterization, while
Sure. Some things applications may be monitoring include:
These may be things that are inherent to the platform codecs or they may be things that vary depending on system load. WebRTC-style applications are likely to use resolution, bitrate, codec, and profile settings as a first line of defense. In cases where that is inadequate (e.g. because jitter is just too high at any setting), forcing software codecs can be a reliable workaround. In the case of actual failures, the cause may be UA/OS bugs, or it may be non-conformant streams. In either case it is likely that a software codec will be more reliable.
I will defer to @chcunningham for this question. |
I think we all agree we want to go to a world where we mitigate-then-remove those issues.
How do you force a video element to use either the SW or the HW decoder at a given fixed resolution? |
For those things, I fail to understand the relationship with the HW acceleration field. FWIW, I know OSes that have more than one SW encoder of a given codec. A single boolean is not sufficient to enumerate them all. |
It doesn't, nor could it reliably do so. It can guess at a subset, but even those vary by system load, configuration, and content.
This is potentially possible but is delving into trying to guess what applications want. For example, Chrome already avoids hardware decode for WebRTC on Windows 7 due to high latency, but we can't really know every application's detailed preferences well enough to implement a generic selection algorithm. WebCodecs also operates at a low enough level that things like dynamic codec switching are unlikely to be 100% reliable, so the application will need to be involved in the algorithm in some direct way.
This is true. We didn't see much advantage with full enumeration, and the fingerprinting concerns are much larger with an API like that. |
Agreed, but AFAIK, the only mitigation possible is restricting when a hardware codec is used. E.g., requiring N frames before a hardware codec kicks in and/or limiting hardware codec usage to high-trust modes. You could apply both such restrictions to the proposed property. E.g., always return false unless trust requirements are satisfied. Keep in mind that today all browsers expose the hardware decoding value through MediaCapabilities' powerEfficient value. Here's Safari's for VP9: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/graphics/cocoa/VP9UtilitiesCocoa.mm#L256
AFAIK most browsers use a simple resolution filter (see above), so it's a matter of finding the cut-offs used by each browser.
MediaRecorder's total encode time will expose a hardware encoder versus a software encoder entirely on the client. A more sophisticated client can use a WebRTC loopback or server setup to figure this out similarly. |
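As an illustration of the resolution-filter point above, a hypothetical probe needs nothing more than MediaCapabilities:

```js
// Hypothetical probe: walks common resolutions and records where powerEfficient
// flips, which is roughly the software/hardware cut-off on many systems today.
async function findPowerEfficientCutoff(contentType) {
  const resolutions = [[640, 360], [1280, 720], [1920, 1080], [3840, 2160]];
  const results = [];
  for (const [width, height] of resolutions) {
    const { powerEfficient } = await navigator.mediaCapabilities.decodingInfo({
      type: 'file',
      video: { contentType, width, height, bitrate: 2_000_000, framerate: 30 },
    });
    results.push({ width, height, powerEfficient });
  }
  return results;
}
```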
Not really; MediaCapabilities exposes whether it is OK for battery life to use those settings. UAs can implement heuristics in various ways. Hardware acceleration is an (important) implementation detail that is used for that 'battery-life-friendly' feature. Exposing features is OK; exposing implementation strategies does not look appealing.
Web pages can try to detect at which resolution OSes might switch from SW to HW.
It really depends on whether the UA is fine exposing this information or not. In general, the hardware acceleration field exposes implementation strategies/details, while it is preferable to expose capabilities. As an example, a SW codec might have different efficiency depending on whether it runs on an ARM-based or x86-based device, but I do not think we want to expose whether the device is ARM or x86. The hardware acceleration field exposes new information that I do not think is available:
This field was included in the spec during PING's review. To my memory, no particular concerns were raised.
I agree semantically, but to be clear, no UA implemented the heuristic in a way that avoids fingerprinting. I want to highlight that here since, despite all UAs caring about fingerprinting, a better solution was not found -- which suggests that we're all following the least-power principle as best we can.
There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all
I feel this is another semantic argument that isn't practical. Sure a page can't force a UA to use hardware decoding for an 8K video, but the consequences of a UA not doing so disadvantage the user to the point that no UA is going to do that.
Encode time is only one avenue; the encoded pixels will also vary with implementation details. In addition to varying delay, the UA would also have to sprinkle noise into the source before encoding, which will hurt encoding performance and quality.
Whether something is an implementation strategy or capability is context dependent. At the level of a codecs API, there's precedent in nearly every API for exposing hardware acceleration as a capability:
Do you have any alternative suggestions on how we can solve the use cases @sandersdan mentions? We're definitely open to alternative mechanisms for solving the problems of 'preferring efficiency' and 'avoid broken/slow hardware/platform codecs'.
I don't agree this isn't available, but I do agree it would be easier to pin this information down with our proposed API.
@dalecurtis said:
Per PING (IIRC, via @hober), that fingerprinting can occur in a similar way through another API is not itself justification for ignoring the fingerprinting concerns of a new API, as it just adds to fingerprinting technical debt.
@dalecurtis said:
Chair hat off; implementer hat on. Note that this merely reveals whether the system has a hardware decoder. It can't be used as a side channel to detect, for example, that another tab is already using one of the limited set of hardware decoder slots, nor can it be used to determine how many slots the current system has.
Forgive my ignorance here, but are UAs free to reject |
Can you describe how this is available?
That contradicts a previous statement in this thread:
If pages have a backup codec, a reasonable approach for a web app is to:
As part of step 1, the WebCodecs API could provide more knobs/hints to better set up the codec: prefer low latency, prefer battery efficiency, prefer throughput...
Yes. The UA has a lot of agency in how it replies here. The best way to think about isConfigSupported() is that it's a strong hint. E.g., practically speaking, isConfigSupported('hw=require') may not be satisfiable by the time configure() is called. As such any mitigations UAs apply to MediaCapabilities are available here as well. |
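Put differently, a page still has to be prepared for configure() to fail even after a positive answer; a sketch, using the 'require' value from the draft under discussion:

```js
// isConfigSupported() is a strong hint, not a guarantee: hardware slots may be
// exhausted (or policy may differ) by the time configure() actually runs.
const config = { codec: 'vp09.00.10.08', hardwareAcceleration: 'require' };
const { supported } = await VideoDecoder.isConfigSupported(config);

if (supported) {
  const decoder = new VideoDecoder({
    output: (frame) => frame.close(),
    error: (e) => {
      // Configuration or decoding can still fail later; fall back here.
      console.warn('decoder error despite positive support check', e);
    },
  });
  decoder.configure(config); // may still be rejected asynchronously
}
```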
Safari is likely the hardest to force to reveal useful fingerprinting bits here since macOS/iOS are more homogenous platforms than other UAs typically run on. HW decoding at a low resolution may be achievable through a set of crafted container and header lies - possibly not even lies depending on the codec feature set. SW encoding/decoding at a high resolution could be achieved by exhausting the kernel slots for hardware codecs.
I don't think these are in contradiction, but sorry it's unclear. My statement was specifically about clients which set 'require'. Pages that use 'deny' or 'allow' are unlikely to have a WASM fallback for non-technical reasons.
We're all for more knobs, please keep the suggestions coming! Something like |
I would go with a hint like a codecSelectionPreference enum with 'powerEfficiency' and 'maxCompatibility' as possible values. Implementations would select either the OS codec or their own copy of a SW codec if they have one based on that field.
I can understand that for 'allow'. For 'deny', some OSes might not allow using a SW H264 encoder at some resolutions (or might not even provide a SW H264 encoder at a given resolution). It seems applications would need a fallback in that case.
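For concreteness, a sketch of how the codecSelectionPreference hint proposed above might be used; this enum and its values are only a proposal in this thread, not part of any shipped spec:

```js
// Hypothetical: 'codecSelectionPreference' is the hint proposed in this thread.
// An unknown dictionary member would simply be ignored by today's implementations.
const decoder = new VideoDecoder({
  output: (frame) => frame.close(),
  error: (e) => console.error(e),
});
decoder.configure({
  codec: 'avc1.42E01E',
  codecSelectionPreference: 'powerEfficiency', // or 'maxCompatibility'
});
```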
@dalecurtis said:
Ok, then at a minimum it would be useful to point out that mitigation in the privacy considerations section of the spec. Best possible practice would be to normatively declare that |
@chcunningham said:
Is this requirement not satisfied by MediaCapabilities? And in parallel, what's the use case for |
Let's take a couple of examples where a web page wants to use a VP9 decoder at a given resolution.
A remaining edge case is SW fallback in case all HW slots are used: I haven't heard people asking to optimise this case. Another hypothetical issue is about setup parameters being incompatible with the HW decoder, thus falling back to SW. In that case, we should look at which parameters we are talking about and how MC could be enhanced to cover that setup.
This is what I understood. |
Thanks @youennf. A few follow up questions:
@youennf ping for #239 (comment)? |
Thanks for the ping.
I would say it returns what MediaCapabilities would have returned.
powerEfficient piggybacks on MC and is tied to a known user impact (battery drain). |
I think we need a better reason than just name similarity to use 'powerEfficient' versus something that's more legible and avoids the same fingerprinting issues that you're worried about. Additionally, as Chris is the author of Media Capabilities, we should also give deference to Chris' comments that tying this to

Otherwise, it seems like we may agree on behavior, but naming is still up in the air? I.e., do you agree that my proposal in #239 (comment) is otherwise equivalent to yours from a fingerprinting perspective? I'm happy to bike shed on names if that's where we're at.

Given all the different views, I take the opposite opinion that we should be as precise as possible modulo fingerprinting, since otherwise you're saying that the UA's view is the only one that matters.
Right, my main feedback was to move from a hard option to a hint. My initial proposal was a hint, something like 'codecSelectionPreference' taking values like 'powerEfficient' (in MC sense/battery life) or 'compatibility' (in the sense that it can be deployed consistently by the UA on its supported platforms, so is most likely software-based). 'prefer' is shorter and seems good to me. I am not a big fan of direct sw/hw implementation-related values. For instance, I heard the following two opinions in this thread which somehow contradict each other:
Great; it's not our ideal outcome, but we can compromise if necessary. Just so we're clear, your proposal grants UA control at the expense of compatibility. The existing
We still prefer
I don't think this is a good reason against clearer naming; I'm not even sure these are in contradiction, depending on whose software decoder we're talking about. Both statements are likely true lived experience, but underspecified as general statements. Folks are going to have different outcomes for different use cases, especially when using less common codec features. Hence we believe it's important to be clear about what the UA is providing.
Where would "prefer quality" fit in, to get the highest-quality compression? (I realize this is somewhat subjective.)
That Q exemplifies the reason we prefer the

Generally speaking, the UA won't know which codec will provide the best quality for a given use case. At a minimum it depends on profile, level, bitrate, and platform. On platforms with a relatively narrow set of hardware (macOS) the UA may have a very good guess, but on less homogeneous platforms the UA would struggle to choose here and likely just err towards hardware.
Hints may be made different for encoder and decoder.
I do not think leaving that choice to web pages will give consistent results across devices. |
Hi, WebCodec user here. I use this decoder hint, because Apple's implementation of hardware decoding h264 yuvj420p frames seems to add nearly 1 second of latency on decode. This happens on both my Intel and M1 Macs on WebCodec and Video Toolbox. Current workarounds include:
Without this flag, WebCodecs would be unusable for me on Mac (unless Apple fixes this?), where I need basically sub-50ms encode->transfer->decode for screen mirroring. I suspect there are other similar hardware decode implementations that are unconcerned with low-latency guarantees.
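A sketch of that workaround as described; 'deny' is the value name in the draft under discussion, and paint() is a hypothetical render function:

```js
// Low-latency screen mirroring: steer the UA away from the high-latency hardware
// H.264 decode path. paint() is hypothetical; the value name may differ in newer drafts.
const decoder = new VideoDecoder({
  output: (frame) => { paint(frame); frame.close(); },
  error: (e) => console.error(e),
});
decoder.configure({
  codec: 'avc1.42E01E',
  optimizeForLatency: true,     // request frame-by-frame (low-delay) output
  hardwareAcceleration: 'deny', // avoid the slow hardware decoder here
});
```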
1 second of latency on the decoder side seems like a big bug to me. I would guess the preferCompatibility hint would work in Chrome as well as the current hardwareAcceleration option.
In summary, today we have
Is that a fair summary @youennf ? @aboba @padenot @jan-ivar can you weigh in so we can make progress here? |
That is a fair summary. |
I'm not sure how I would implement option 3: based on the actual names, it's unclear which decoder is more compliant or efficient for a particular video on a particular system if the browser is not doing internal benchmarking (and even then, it's not particularly reliable). If it's specced to be the same as option 2, but with different names, I'm leaning towards option 2. Option 2 is at least quite clear for authors and implementors.

It seems like that with

Rejecting when a particular #239 (comment) (quality) is not handled by this, but maybe it doesn't need to, or we can handle it later.
Bump again for @aboba and @youennf, @jernoble as FYI. During the WG meeting we found consensus on making it optional, so now we're into classic bike shed painting. One new proposal that emerged was something like
We'll try to reach an editors consensus during the editors meeting tomorrow and report back here. |
preferHardware could mean falling back to software, though?
It could be interesting to have a few scenarios where a UA would fall back to software instead of rejecting. It's certainly possible since it's a hint. I can think of falling back silently to software for a codec where hardware decoding is super common, like H.264, on a CPU that doesn't have it (say the i9 7940X that I have here, which is not too common and is more than capable of decoding any video rapidly), to avoid revealing that it doesn't, perhaps when the UA is set to be extra privacy-preserving (say
Editors' call: we've landed on consensus for:
* It's now a hint instead of being required.
* Values are noPreference, preferSoftware, and preferHardware.

Fixes: #239
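Under the landed change, usage might look like the sketch below; the commit text spells the values in camelCase, while the published spec text may use a different spelling (e.g. 'prefer-hardware'):

```js
// Post-change: hardwareAcceleration is a hint the UA is allowed to ignore.
const config = {
  codec: 'vp8',
  width: 1280,
  height: 720,
  hardwareAcceleration: 'preferHardware', // or 'preferSoftware' / 'noPreference'
};
const { supported } = await VideoEncoder.isConfigSupported(config);
// Even when supported, the UA may pick either implementation for any reason.
```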
A few thoughts:
@youennf wrote:
I'm not against renaming
Per discussions and the proposed spec text, that's up to the UA to decide. The current proposal says UAs SHOULD try to respect the developer's choice, but may ignore the value for any reason. Sample reasons include privacy or UA limitations. Please leave a review on the proposed PR if there's text you'd like to include.
Overall prefer-efficiency seems a good default to me.
In that case, the UA would prune all HW codecs from the list of available codecs. |
I am wondering what HardwareAcceleration is supposed to be used for.
One potential use would be to always prefer power efficiency. But power efficiency does not mandate hardware acceleration.
Depending on the device, the codec, and the resolution, software-based codecs might be better suited. It is unclear how a web developer would be able to select hardwareAcceleration for that case, except to let the UA decide with 'allow'.
Another possibility is to maximise compatibility and use 'deny'. In that case though, it means that the web developer loses power efficiency in a lot of cases. A careful web developer will then probably want to enter the business of identifying which SW and HW codecs are in use on a device. This does not seem great and somehow contradicts the desire to not increase fingerprinting.
It seems the UA is in general the best entity to decide what to use at any given point.
Instead of hard requirements, a web application could look at providing hints, though it is true hints tend to be difficult to define and implement consistently.
It also seems HardwareAcceleration is a potential fingerprinting vector, though it is not marked as such in the spec.