Stats API should require additional permission / user opt-in #550

pes10k · 2020-02-18T20:49:09Z

The stats collected by this API enable two new privacy harms / risks. This spec should enable the main uses of WebRTC, without automatically exposing these additional risks.

a) Leaking communication / plain text

Prior work (e.g. http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf) has shown that you can recreate the plain-text content of an encrypted, DTLS-protected audio conversation from patterns in packet size, frequency, etc. The fine-grained network information exposed by this API seems sufficient to carry out this attack again. If this information is needed for analysis, quality control, etc., the API should limit it to those special cases (for example, behind an additional permission).

b) Hardware fingerprinting

decoderImplementation, the codec data point, etc. reveal information about the underlying hardware beyond what getUserMedia already identifies.
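To make the fingerprinting surface concrete, here is a minimal sketch that gathers the fields named above from a stats report. The helper name collectHardwareHints is hypothetical; the stats types ('inbound-rtp', 'outbound-rtp', 'codec') and fields (decoderImplementation, encoderImplementation, mimeType) are from webrtc-stats, and the report is treated as a Map-like object, as RTCStatsReport is.

```javascript
// Sketch: list the implementation/codec strings a stats report exposes.
// Works on any Map-like report (RTCStatsReport is maplike); the helper name is hypothetical.
function collectHardwareHints(report) {
  const hints = new Set();
  report.forEach(stats => {
    if (stats.type === 'inbound-rtp' && stats.decoderImplementation) {
      hints.add(`decoder:${stats.decoderImplementation}`);
    }
    if (stats.type === 'outbound-rtp' && stats.encoderImplementation) {
      hints.add(`encoder:${stats.encoderImplementation}`);
    }
    if (stats.type === 'codec' && stats.mimeType) {
      hints.add(`codec:${stats.mimeType}`);
    }
  });
  // Sorted so the same hardware yields the same list, i.e. a stable fingerprint input.
  return [...hints].sort();
}
```

In a page this would be called as collectHardwareHints(await pc.getStats()); each distinct string it returns is one more input a fingerprinting script can combine with others.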


youennf · 2020-02-19T03:11:18Z

a) Leaking communication / plain text

This seems like a useful consideration for isolated streams.
For regular streams, the web page already has access to the audio/text content, so this should be fine.

b) Hardware fingerprinting

Depending on the actual implementation by the browser, this may or may not be an issue.
Agreed guidelines would be useful here.

pes10k · 2020-02-19T03:19:20Z

Hi @youennf. Thank you for the reply. Again, though, these privacy leaks need to be addressed in the functionality of the standard; it's not sufficient to list them in the concerns section.

You all are the experts on this functionality; if you can't figure out how to design and implement it in a privacy preserving way, no one can ;) (and just as seriously, punting to "implementors will fix" means that either there will be divergent implementations, or, for web compat reasons, everything will get pulled to the least private, most permissive implementation).

fippo · 2020-02-19T06:08:46Z

The fine level network information exposed by this API seems to be sufficient to re-carry out this attack.

I think that is a bit too general. Let's ignore for a moment that CBR is the answer to this particular attack. Let's also ignore that you have the audio stream.

The key point of fon-iks is this: "The size of the encrypted packet therefore reflects properties of the input signal."
getStats provides packetsSent and bytesSent. With Opus we're talking about a typical frame size of 20 ms, or 50 packets per second.
To carry out fon-iks you would need to call getStats at a resolution higher than that.

Let's actually try this. Go to one of the samples and paste the following:

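// Assumes pc1 is the sending RTCPeerConnection defined by the sample page.
const bytes = [];
const iv = setInterval(async () => {
  const sender = pc1.getSenders()[0];
  const stats = await sender.getStats();
  stats.forEach(s => {
    // Record the cumulative packet and byte counters of the outbound RTP stream.
    if (s.type === 'outbound-rtp') {
      bytes.push([s.packetsSent, s.bytesSent]);
    }
  });
}, 10); // poll every 10 ms, i.e. faster than the 20 ms Opus frame interval
setTimeout(() => clearInterval(iv), 2000); // stop polling after 2 seconds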

If you run bytes.map(x => x[0]), you can see that in Chrome you don't even have enough granularity to capture a single packet. In Firefox you do. @henbos can probably comment on getStats caching in Chrome.
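One rough way to quantify that difference is to look at how far packetsSent moves between consecutive samples. This sketch assumes the bytes array collected above ([packetsSent, bytesSent] pairs); analyzeSamples is a hypothetical helper. Steps of exactly 1 mean individual packets are visible, and for those intervals the bytesSent delta is the size of that one packet, which is the signal fon-iks relies on.

```javascript
// Sketch: summarize how finely sampled counters resolve individual packets.
// `samples` is an array of [packetsSent, bytesSent] pairs in collection order.
function analyzeSamples(samples) {
  const steps = new Map();        // packet-count increase -> number of occurrences
  const singlePacketSizes = [];   // bytesSent deltas for intervals with exactly one packet
  for (let i = 1; i < samples.length; i++) {
    const dPackets = samples[i][0] - samples[i - 1][0];
    const dBytes = samples[i][1] - samples[i - 1][1];
    steps.set(dPackets, (steps.get(dPackets) || 0) + 1);
    if (dPackets === 1) singlePacketSizes.push(dBytes);
  }
  return { steps, singlePacketSizes };
}
```

A histogram made mostly of 0s and 1s suggests per-packet resolution; runs of 0 followed by jumps of 2 or more suggest the counters are being cached and several packets are merged into each step.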

I didn't see any discussion in the fon-iks paper about the frame size/duration, but I assume that accuracy (recall/precision) drops as the frame size increases. One mitigation here might be to limit the resolution of getStats.
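Limiting resolution could look like the kind of caching mentioned for Chrome above. A minimal sketch of a UA-side cache, assuming a fetchStats function that stands in for the browser's internal stats collection; both fetchStats and makeRateLimitedStats are hypothetical, not a real API, and the clock is injectable only so the behavior can be tested.

```javascript
// Hypothetical mitigation sketch: reuse the last report until `minIntervalMs` has passed.
// `fetchStats` stands in for the UA's internal collection and returns a promise.
function makeRateLimitedStats(fetchStats, minIntervalMs, now = () => Date.now()) {
  let cached = null;          // promise for the most recent report
  let cachedAt = -Infinity;   // time (ms) at which `cached` was fetched
  return function getStats() {
    const t = now();
    if (t - cachedAt >= minIntervalMs) {
      cached = fetchStats();
      cachedAt = t;
    }
    return cached;
  };
}
```

The trade-off is that every poll faster than minIntervalMs sees the same snapshot, so each observable step aggregates several packets; whether that is coarse enough to defeat the attack is the open question here.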

Note that this concern probably also applies to getSynchronizationSources, which exposes the RTP timestamp and the audioLevel (typically from the ssrc-audio-level extension) and is explicitly designed for high-frequency polling.
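For comparison, polling getSynchronizationSources() is synchronous and cheap. A sketch, assuming receiver is an audio RTCRtpReceiver; sampleAudioLevels is a hypothetical helper. It keeps one sample per distinct rtpTimestamp per source, so polling faster than packets arrive does not produce duplicates.

```javascript
// Sketch: record audioLevel for each distinct rtpTimestamp seen via getSynchronizationSources().
// `receiver` is assumed to be an audio RTCRtpReceiver; the helper name is hypothetical.
function sampleAudioLevels(receiver, intervalMs, durationMs, onDone) {
  const samples = [];
  const lastRtpTimestamp = new Map(); // source (SSRC) -> last rtpTimestamp recorded
  const iv = setInterval(() => {
    for (const src of receiver.getSynchronizationSources()) {
      // audioLevel is optional (typically from the ssrc-audio-level header extension);
      // skip entries without it and entries whose packet was already recorded.
      if (src.audioLevel === undefined) continue;
      if (lastRtpTimestamp.get(src.source) === src.rtpTimestamp) continue;
      lastRtpTimestamp.set(src.source, src.rtpTimestamp);
      samples.push({ source: src.source, rtpTimestamp: src.rtpTimestamp, audioLevel: src.audioLevel });
    }
  }, intervalMs);
  setTimeout(() => {
    clearInterval(iv);
    onDone(samples);
  }, durationMs);
}
```

With intervalMs well below the 20 ms frame duration, this can observe most received packets individually, which is the granularity concern raised above.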

henbos · 2020-02-19T08:44:51Z

What granularity is needed to be able to tell anything more useful than "there is or isn't audio being produced right now"? getStats() gives you aggregate counters, so the best you can do is to say that in an interval between two getStats() calls your average packet size was X bytes and the average audio energy was Y. In Chrome, the minimum interval you could achieve is 50 ms due to caching. Would mandating a caching time mitigate the problem?
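The interval averaging described here can be written out directly. A sketch, assuming prev and curr are two snapshots that combine the outbound-rtp counters (timestamp, packetsSent, bytesSent) with the sender's media-source audio counters (totalAudioEnergy, totalSamplesDuration); that merging and the intervalSummary name are illustrative. The square-root conversion from energy to an RMS level follows the webrtc-stats description of totalAudioEnergy.

```javascript
// Sketch: what two consecutive getStats() snapshots reveal about the interval between them.
// `prev`/`curr` are assumed plain objects merging outbound-rtp and media-source counters.
function intervalSummary(prev, curr) {
  const packets = curr.packetsSent - prev.packetsSent;
  const bytes = curr.bytesSent - prev.bytesSent;
  const energy = curr.totalAudioEnergy - prev.totalAudioEnergy;
  const seconds = curr.totalSamplesDuration - prev.totalSamplesDuration;
  return {
    durationMs: curr.timestamp - prev.timestamp, // stats timestamps are in milliseconds
    packets,
    // Only the mean packet size is recoverable, not the size of each packet.
    avgPacketBytes: packets > 0 ? bytes / packets : 0,
    // RMS level over the interval, in the same 0..1 units as audioLevel.
    rmsLevel: seconds > 0 ? Math.sqrt(energy / seconds) : 0,
  };
}
```

With a 50 ms cache and 20 ms Opus frames, each interval covers roughly two or three packets, so only their average size is visible.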
Unless we're talking about isolated streams, the RTCPeerConnection can only process tracks you already have access to. You can use WebAudio, read the pixels of a canvas, or use other APIs, not to mention that you're sending the tracks somewhere, so the other endpoint can do whatever it wants (including communicating back to the JS to report the result of its analysis). So there's already, directly or indirectly, byte-level access.

Re: @fippo: getSynchronizationSources() can be polled much more frequently. It will tell you the audioLevel of packets (only received packets, but with a loopback you could indirectly learn something about sent packets too). But again, why not use WebAudio?

In either case, I don't mean to make the argument "there is another API that is even worse" as an excuse for us to do something bad. My question is: Is this really a problem with these APIs or is this an objection to having granted access to tracks in the first place, which is usable with a large number of APIs?

Codec capabilities and encoder/decoder implementation strings are a valid fingerprinting concern.

What would be a way to mitigate these concerns? Adding a prompt on a per-API basis fails to address how confusing a "do you want to grant access to getStats?" prompt would be to a normal user. Would hardware and media related privacy concerns be best addressed with a prompt of larger scope?

youennf · 2020-02-19T16:46:44Z

these privacy leaks need to be addressed in the functionality of the standard

We should first check whether, in our current model, these are leaks.
For audio/video/data, WebRTC assumes pages have access to the content so I do not consider them as leaks.
Isolated streams is a proposal that tries to change this model. With that proposal, we should indeed consider whether stats are leaking and I believe audioLevel does indeed leak information.

In general, stats do not seem absolutely necessary for what the user intends to do.
As such, I would like them to be privacy neutral and we should probably require that.
With regard to decoderImplementation, I think it can be implemented in such a way that it will not provide any more fingerprinting information than, say, the user agent string, but it might still provide an easier way to get that information. Should we add a requirement along those lines?

I am not a big fan of gating stats on getUserMedia.
Some websites provide a button to report a problem. In that workflow, I could see how a prompt to gather more information might be feasible. It doesn't seem to meet the bar so far though.

pes10k · 2020-02-19T18:38:35Z

I just wanted to thank you all for tackling this seriously, even if it seems like solutions are still being worked out. I'm happy to phase out for a little bit while you all work out a solution for getting these issues addressed, to avoid adding noise, but would also be glad to be involved if there is anything I can do to help. Please just let me know how I can be most helpful.

alvestrand · 2020-02-26T15:29:06Z

For the hardware fingerprinting issue, it seems like this should be part of an overarching issue of "is the page permitted to know what hardware the user is running", and gated on a permission that isn't WebRTC-specific. This touches on UA strings, GPU API, performance API and probably many others.

henbos · 2020-02-26T15:33:11Z

Action item on me to split this up into two issues and follow up on a) and b) separately

pes10k · 2020-02-27T00:52:29Z

@alvestrand I would welcome a proposal / spec for that, and would be happy to help push it along, but (i) you probably don't want to gate the progress of this spec on that hypothetical permission / spec, and (ii) it's still important to be as narrow as possible in most cases. A global "fingerprinting endpoints on" switch forces users into a no-win situation; I expect a minimal capabilities model will be better in almost all cases.

youennf · 2021-05-05T07:03:41Z

@henbos, are you still working on these issues?
It seems like decoderImplementation/encoderImplementation are the last remaining stats that have fingerprinting consequences and it would be good to have a resolution there.

henbos · 2021-05-05T07:34:22Z

I'm not working on this, sadly. Unassigning myself to reflect that.

henbos · 2022-09-12T19:18:10Z

We'd still like to expose power efficiency (#666) but blocked on this issue. I don't know how to move forward though, a user prompt seems too aggressive. What do we do in MediaCapabilities?

alvestrand · 2022-09-12T20:18:52Z

The fingerprinting mitigations outlined for MediaCapabilities are here: https://www.w3.org/TR/media-capabilities/#decoding-encoding-fingerprinting

Rate limiting isn't really possible; we can't tell a getStats call that looks at this item from one that doesn't.

youennf · 2022-09-12T22:10:34Z

@henbos proposed during the meeting the possibility to only expose this kind of fingerprinting past some user validation (for instance if getUserMedia/getDisplayMedia was called successfully on the document).

Maybe there is a way to phrase it in a generic way, something like:
As this field is a fingerprinting vector, it MUST only be exposed to contexts that the user interacted with in a deep manner, for instance if https://w3c.github.io/mediacapture-main/#context-capturing-state returns true.
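The rule proposed here can be sketched as a filter the UA applies before a report reaches the page. This is an illustration under assumptions, not proposed spec text: contextIsCapturing stands in for the mediacapture-main context-capturing-state check, filterStatsForContext is a hypothetical name, and the gated fields are the ones discussed in this thread (implementation strings and the power-efficiency flags from #666).

```javascript
// Hypothetical UA-side sketch: strip hardware-revealing fields unless the context is capturing.
// `contextIsCapturing` stands in for the mediacapture-main check; it is not a web API.
const HARDWARE_FIELDS = [
  'decoderImplementation', 'encoderImplementation',
  'powerEfficientDecoder', 'powerEfficientEncoder',
];

function filterStatsForContext(report, contextIsCapturing) {
  const out = new Map();
  report.forEach((stats, id) => {
    const copy = { ...stats }; // never mutate the UA's internal stats objects
    if (!contextIsCapturing) {
      for (const field of HARDWARE_FIELDS) delete copy[field];
    }
    out.set(id, copy);
  });
  return out;
}
```

All other counters pass through unchanged, so quality monitoring keeps working in contexts that never called getUserMedia/getDisplayMedia.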

henbos · 2022-09-13T07:49:31Z

Because a) and b) (from issue description) are a little different and likely require different mitigations (e.g. a mitigation to "leaking communication / plain text" is related to granularity of packet counters etc whereas "hardware fingerprinting" is about which context we should be allowed to expose HW states) I split this issue up into different issues.

This issue can continue to be about "leaking communication / plain text"
For HW fingerprinting I filed #675 and, because codec is exposed in multiple places, a separate issue for that so that we can sync with webrtc-pc: #674.

As this field is a fingerprinting vector, it MUST only be exposed to contexts that the user interacted with in a deep manner, for instance if https://w3c.github.io/mediacapture-main/#context-capturing-state returns true.

I like this idea, let's follow up in #675

henbos · 2022-09-27T14:00:45Z

Since this issue was split up into a bunch of different sub-issues, I figured it would make more sense to replace this by a stand-alone issue #699 to make it more concise. Referenced this issue for context, but I'm closing it in favor of that one.

pes10k mentioned this issue Feb 18, 2020

Stats API should require additional permission / user opt-in (w3c/webrtc-stats#550) w3cping/tracking-issues#42

Open

plehegar added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Feb 19, 2020

henbos self-assigned this Feb 26, 2020

henbos removed their assignment May 5, 2021

youennf mentioned this issue May 11, 2021

Fingerprinting section could be improved w3c/webcodecs#238

Open

jan-ivar mentioned this issue Jun 15, 2021

Is exposing https://w3c.github.io/webcodecs/#enumdef-hardwareacceleration a good idea w3c/webcodecs#239

Closed

youennf mentioned this issue Sep 11, 2022

powerEfficientEncoder/powerEfficientDecoder #666

Closed

uazo added this to @uazo's privacy notes Sep 13, 2022

This was referenced Sep 13, 2022

Codec stats reveal hardware information which could be used for fingerprinting #674

Open

The stats API allow hardware fingerprinting (encoder, powerEfficient) #675

Closed

youennf mentioned this issue Sep 21, 2022

Add powerEfficient[En/De]coder (#666) and fingerprint mitigations (#675). #670

Merged

henbos mentioned this issue Sep 27, 2022

Privacy concern: Leaking communication / plain text using patterns in packet size, frequency, etc. #699

Closed

henbos closed this as completed Sep 27, 2022

henbos moved this to Done in @uazo's privacy notes Sep 27, 2022

alvestrand mentioned this issue Oct 6, 2022

Isolated tracks may need stats API to hide some data w3c/webrtc-identity#39

Open
