Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RTCInboundRtpStreamStats:decoderFallback #725

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions webrtc-stats.html
Original file line number Diff line number Diff line change
Expand Up @@ -1032,6 +1032,7 @@ <h3>
boolean powerEfficientDecoder;
unsigned long framesAssembledFromMultiplePackets;
double totalAssemblyTime;
boolean decoderFallback;
};</pre>
<section>
<h2>
Expand Down Expand Up @@ -1712,6 +1713,16 @@ <h2>
(if any). Only incremented for frames consisting of more than one RTP packet.
</p>
</dd>
<dt>
<dfn>decoderFallback</dfn> of type <span class=
"idlMemberType">boolean</span>
</dt>
<dd>
<p>
Only [= map/exist =]s for video. It indicates that the current WebRTC session is fallbacked
to the Software Decoder. When the session has started with Hardware Decoder.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this also exposes HW information, I wonder if we need to use the same gate as for powerEfficientDecoder, or if this is limited enough in scope that it is okay not to...

                <p class="fingerprint">
                  Only defined when [= exposing hardware is allowed =].
                </p>

Is the idea that you would use MediaCapabilities to query HW support and then use decoderFallback to detect when HW is not achieved anymore?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henbos thank you for the review. That's right. We would like to detect decoderFallback when hardware decoder get fallback. I believe this does not need the fingerprint gate because it does not expose the hardware information explicitly like other metrics like powerEfficiency and decoderImplementation since the decoder fallback information might be dynamic information based on the runtime status of the decoder even the browser is capable of hardware decoding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see a decoderFallback being true in the stats that tells you that there was HW at some point. You'll need to trigger this fallback somehow which seems tricky but possible e.g. with insertable streams.

We really need to answer the question how use-cases like gamestreaming which we are supposed to support can get access to this.

Copy link
Author

@xingri xingri Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fippo & @henbos Thank you for reviewing the changes. I agree that new metric could also expose the HW capability at some instance of the time.

However I would like to note that what the fingerprint guidance concerns is not only the hardware exposer of the user agent(browser) but risk of the fingerprinting of the user agent as defined by https://w3c.github.io/fingerprinting-guidance/ as follows:

..., browser fingerprinting is the capability of a site to identify or re-identify a visiting user, user agent or device via configuration settings or other observable characteristics.

To share the transitions of this metric, it may look like following:

  • In case of no HW Decoder: stays false
  • In case of HW Decoder: starts with false and turns to true only when the decoder gets fallback and changes to false again when the decoder gets recovered to HW.

So the current metric could not be used for identifying the user, user agent or device because of ephemeral characteristics of decoder fallback.

And when we consider the cloud gaming's general use case, the camera and microphone permission are not required because it works as one-way streaming from the gaming server to the browser client, so it does not require user permission for the hardware access.

And we could detect any streaming abnormalities through the decoder fallback from decoder implementation metric and it is currently blocked by the fingerprint concern by #712.

That's why I proposed this new metric which might not expose the finger print of the user agent directly unlike decoder implementation and power efficiency metrics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fippo @henbos @xingri - Adding to this conversation from the perspective of Xbox Cloud Gaming. Right now, we care about decoderImplementation to be able to detect SW decode fallback cases, so that we can trigger resolution downgrade logic and/or identify regressions in devices that we know should support HW decode. For instance, we've hit instances where a not-as-important "splash screen" asset using the same codec as our game stream would sometimes prevent the actual game stream from being able to acquire the HW decoder, and the diagnosis/monitoring of that problem was only possible thanks to the decoderImplementation info from getStats.

As we all know, the decoderImplementation info from WebRTC getStats() will now require getUserMedia() permissions, which is a no-go for most cloud gaming scenarios, as @xingri has already pointed out. Xbox Cloud Gaming only asks for getUserMedia() in the Voice Chat scenario. We cannot ask for getUserMedia() permissions in all game streams because it's invasive (prompts users), overkill (gives us access to the device camera/mic hardware when we don't always need it), and has a high chance of failing (users have no reason to allow it unless they want to use Voice Chat). This proposed decoderFallback property satisfies our functional requirements of being able to reliably detect SW vs HW decode, while no longer exposing detailed device-specific information that could be categorized as fingerprintable.

We understand that MediaCapabilities APIs allow us to query the HW/SW capabilities of the client device, but there's no guarantee that those capabilities will take effect during the actual game stream. Xbox Cloud Gaming and similar scenarios care more about what actually happened during the stream (was HW decode actually used?) than what may happen during the stream (do we have a chance that HW decode will be used?), since we need that data point to make mid-stream decisions. This is why having decoderFallback be part of the RTCInboundRtpStreamStats makes sense to us. Without this, our best bet is to monitor decode times (via requestVideoFrameCallback) and have heuristics around what SW and HW decode should look like for a particular device, which is error-prone and is arguably even more of a fingerprinting behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the current metric could not be used for identifying the user, user agent or device because of ephemeral characteristics of decoder fallback.

There might be a way to consistently trigger fallback, but agree this is a much smaller concern than the "always expose HW" problem.

We cannot ask for getUserMedia() permissions

Agreed. As the spec is currently written I'd say we are not solving your use case.

We understand that MediaCapabilities APIs allow us to query the HW/SW capabilities of the client device, but there's no guarantee that those capabilities will take effect during the actual game stream.

The privacy inconsistency where one spec exposes HW and another spec doesn't is frustrating. I don't have a good answer to that, which is why I think we should file a separate issue to discuss the HW implications and solve them in a standalone issue. But until we have solved those, I think the conservative thing to do is to add this metric with the same HW check as the other HW metrics.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not trying to shut down the privacy discussion, just trying to move it to a standalone issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #730 to follow up on the discussion

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henbos. Thank you for the feedback and consideration of the issue.
As the chrome 110 will be released tomorrow, the cloud game services will be officially impacted by the hardware exposure check on decoderImplementation metric updated by the recent commit of #712.

Also, if the new metric will have the same limitation as decoderImplementation, it will not make any improvements on the current situation based on the difficulties of getUserMedia() permissions as @Diego-Perez-Botero and I described.

So I am wondering whether we have a timeline for the discussion of hardware exposure checking #730?
If we could get the decision in a timely manner, that would be really appreciated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in #730 and PR #732

</p>
</dd>
</dl>
</section>
</div>
Expand Down