Add application/dash+xml to opaque-safelisted MIME types. #23

anforowicz · 2021-02-25T00:08:37Z

Closes #20.

annevk

Oxford comma please. If this is only used for media, should we safelist it only there? I guess this is okay, as long as it's consistently used for all 206 responses.

anforowicz · 2021-02-26T23:08:51Z

Thanks @annevk! Can you PTAL again?

Oxford comma please.

Done. (I think. I use github infrequently so please shout if I am an following best practices or if I do something wrong :-) )

If this is only used for media, should we safelist it only there? I guess this is okay, as long as it's consistently used for all 206 responses.

I am not sure if I understand the question. By "safelist it only there" do you mean splitting off "image/svg+xml" and "application/dash+xml" into a separate list of "opaque-safelisted media MIME types" and adding a separate step that refers to that list of MIME types?

I don't understand what impact 206 responses have here. AFAIU there should be no range requests for "application/dash+xml" (which is just a manifest that stores URLs of real video files). Let me CC @dalecurtis in case there are some gotchas here that I might not be aware of.

I think that no other changes are needed / I think that is is okay to just augment the definition of "opaque-safelisted MIME type". In particular, it should be okay to just apply whatever approach we currently take for image/svg+xml (which is another multimedia type that is XML-based and not sniffable in steps 6 nor 7).

annevk

Yeah, I guess in that case the server would reply with a 200, despite the range request. (I was thinking of only allowing this for media element requests, but that doesn't make a lot of sense.)

Having said that, how do we avoid security issues with all the requests of the URLs listed in the manifest? Can they all be sniffed?

dalecurtis · 2021-03-01T17:07:25Z

An application could partially parse a segment and use range requests to get at only the data it needs from the mdat. I don't know how common that behavior is, I agree it sounds like it's of limited utility. I would ask someone from Apple who's more familiar with HLS implementation details.

annevk · 2021-03-01T17:17:04Z

What I'm wondering, is that say we have a manifest that points to a bunch of URLs. How do we determine that these are okay to bypass (C)ORB? Do they have to be same-origin with the manifest?

anforowicz · 2021-03-01T17:45:57Z

[...] we have a manifest that points to a bunch of URLs. How do we determine that these are okay to bypass (C)ORB?

I was assuming that we can just apply the ORB algorithm to the separate, future requests for these URLs. I was assuming that the a regular, no-range request will always be issued first for any of the segments / URLs listed in a DASH manifest.

If the assumption above holds, then subsequent requests will be covered by the existing ORB steps (ones that sniff the beginning of a response body, populate the opaque-safelisted requesters set, and then use the set for subsequent 206 responses). If the assumption doesn't hold (?) then I am not sure what options we have... I guess ORB could parse the DASH manifest, but this extra complexity in ORB seems a bit undesirable...

dalecurtis · 2021-03-01T17:49:15Z

On Android we only allow same origin manifests because the OS doesn't provide the information we'd need to taint when a redirect occurs, but this did break some clients -- so sites are using cross origin manifests. If we had a native HLS/DASH implementation I imagine we'd allow the redirects and just enforce tainting. I haven't had time to test what Apple does with this, hopefully the taint the frames, but I haven't checked.

annevk · 2021-03-02T06:03:49Z

Do those segments all start with something that can be sniffed as media? If it's just a video file split up, similar to ranges, you would only see if it's media in the first segment, no? But I don't know enough about HLS/DASH to say.

@youennf do you know?

annevk · 2021-03-02T06:10:04Z

Is https://tools.ietf.org/html/rfc8216 the HLS you are talking about? It doesn't mention XML at all.

youennf · 2021-03-02T10:38:29Z

DASH mpd is XML, HLS is not.

I don't know of any DASH native client so all loads triggered from reading the DASH mpd will be done through fetch/XHR.
Why safe-listing this mime type then?

From memory, HLS and DASH can use byte ranges to define media segments.
I am not sure sniffing media on these segments will be easy since you would need to know whether this is mp3, mp4, text tracks. Also for video, one often needs to download the init segment in addition to the media segment to properly play the content.
@jernoble might know more.

annevk · 2021-03-02T11:36:29Z

The MIME type is safelisted so that if you have <video src=https://someotherorigin/dash.xml> the process the video element is in gets to see the file (presumably this works today and doesn't require CORS, otherwise we would not be discussing this) and do something with it. From your response it sounds like the files listed in there cannot be sniffed as they represent ranges of a media track. So the question then becomes whether this dash file can point to files on other origins which would expose those files to the video element process (and thus the attacker).

anforowicz · 2021-03-02T15:24:59Z

So the question then becomes whether this dash file can point to files on other origins which would expose those files to the video element process (and thus the attacker).

Would it help (both for segments that are same-origin or cross-origin from the dash manifest) to have ORB allow 206 responses when 1) (old step) ORB has previously sniffed the response as multimedia and added it to the opaque-safelisted requesters set and 2) (new, hypothetical step somewhere) the 206 response has an audio/* or video/* MIME type? It seems to me that it might make ORB compatible with the current web, at the cost of a (small?) erosion of ORB security protections.

FWIW, I am not sure what we would gain from enforcing that dash segments and dash manifests are same-origin. Even if they were always same-origin to each other, they could still be cross-origin from the frame (i.e. from the html containing the <video> element). And therefore we would still be left with a question of how to make ORB allow the segment responses through, right?

jernoble · 2021-03-02T15:41:45Z

I haven't had time to test what Apple does with this, hopefully the taint the frames, but I haven't checked.

Yes, WebKit will taint the frames of a video element when that video’s resources would otherwise fail CORS, or for the case of HLS, when there are multiple origins involved.

An application could partially parse a segment and use range requests to get at only the data it needs from the mdat. I don't know how common that behavior is, I agree it sounds like it's of limited utility. I would ask someone from Apple who's more familiar with HLS implementation details.

Range requests for media data listed in a HLS manifest is very common. The manifest can and does list the specific byte ranges representing specific time ranges in the resource.

annevk · 2021-03-02T15:47:08Z

@jernoble we're not talking about tainting here. We're talking about whether the data should even arrive in the process that does the decoding as once it arrives an attacker will be able to read it through Spectre. See https://github.com/annevk/orb/blob/main/README.md.

@anforowicz reading https://tools.ietf.org/html/rfc8216#section-4 (which might not be applicable to DASH at all) I don't have much hope that we can rely on MIME types, but maybe?

If they are same-origin with the manifest I think we can make a reasonable inference that the origin it's all hosted on is okay with sharing that data. Teaching ORB about that can get involved, depending.

jernoble · 2021-03-02T20:09:58Z

@jernoble we're not talking about tainting here. We're talking about whether the data should even arrive in the process that does the decoding as once it arrives an attacker will be able to read it through Spectre.

I see. Does this assume that the decoding process is the same as the process in which the attacker is running JavaScript? If so, that assumption may not hold for all UAs, some of which could have separate processes for: Networking, JavaScript/DOM, and Decoding; and the data flow from the site to the Decoder could skip the Javascript/DOM process entirely.

anforowicz · 2021-03-02T20:25:08Z

@jernoble,

Does this assume that the decoding process is the same as the process in which the attacker is running JavaScript? If so, that assumption may not hold for all UAs, some of which could have separate processes for: Networking, JavaScript/DOM, and Decoding; and the data flow from the site to the Decoder could skip the Javascript/DOM process entirely.

This should be okay, because ORB does not want to block responses that contain decodable audio/images/videos. If a response sniffs as audio/image/video, then ORB should allow the response (i.e. passing the response body to whatever destination process needs to consume it).

I guess the above flows from the fact that CORB and ORB are technologies that help secure OOPIFs (out of process iframes) and AFAIU the current OOPIF implementations in Chromium and Firefox allow embedding of audio/images/videos/scripts/stylesheets/etc in an OOPIF that hosts HTML from another origin (in other words - there is no OOPIMG support). Therefore CORB and ORB mostly care about blocking things that OOPIFs do not allow to disclose cross-origin - for example blocking a cross-origin, no-cors fetch that attempts to get a cross-origin html/xml/json within reach of a Spectre-wielding attacker within the OOPIF-hosting process).

annevk · 2021-03-03T07:19:07Z

@jernoble even in an architecture where DASH is not handled by the same process that holds the video element (and is able to bypass it entirely for all the "sub" fetches it needs to do as you indicate) I think you might run into issues with service workers, that can intercept and cache responses to these requests. However, it does seem theoretically possible to keep the actual responses out-of-process and just pass a reference token around in the service worker.

If you did all that you would not be impacted by these media manifest questions I suppose, but still need (C)ORB for the script element (and potentially the various sources that want an image, depending). There might still be theoretical issues with data not immediately failing the decoder and the rendered artifacts revealing information about the data. And not having alignment on the fetching model does not seem great for interoperability.

anforowicz · 2021-03-04T18:40:29Z

/cc @acolwell who has kindly discussed application/dash+xml with me earlier today (confirming that the point of the manifest is to enable range requests into a middle of video files + the fragility of trying to handle the current multimedia manifest formats in ORB; the latter might be seen as an argument toward adopting an extra ORB step to allow 206 responses labeled as audio/* or video/* MIME type - this was brought up in #23 (comment) above).

acolwell · 2021-03-04T20:09:21Z

I'll start of by admitting I'm not completely confident I understand all of the issues here, but I am wondering if a way out could be to just force CORS usage for these manifests.

My main thinking is that there are several DASH/HLS players built with JS using fetch/XHR and Media Source Extensions. I believe most of them require CORS so that they can properly handle cross-origin requests in the JS environment. I'm wondering if it is reasonable to just place the same requirement on video elements that natively support DASH/HLS manifests via video.src. I suspect sites probably need to use CORS to get cross browser interop today anyways so maybe formalizing this requirement might not break many sites.

It feels like this would be compatible with Apple's native HLS implementation and Chrome's existing Android implementation that only allow same-origin requests.

annevk · 2021-03-05T15:08:44Z

@acolwell thank you for weighing in! For MSE you indeed need CORS for cross-origin resources.

If we can require CORS for cross-origin DASH/HSL linked from <audio src> and <video src> that would be excellent. I guess that means that instead of safelisting, we ought to blocklist application/dash+xml.

HSL also has MIME types but also some path "extension" matching(?!):

Each Playlist file MUST be identifiable either by the path component of its URI or by HTTP Content-Type. In the first case, the path MUST end with either .m3u8 or .m3u. In the second, the HTTP Content-Type MUST be "application/vnd.apple.mpegurl" or "audio/mpegurl". Clients SHOULD refuse to parse Playlists that are not so identified.

I'd be on board with blocklisting those MIME types as well, but I don't think we should standardize any kind of extension matching if we can avoid it. I'm actually surprised to see this in a relatively new RFC as I thought browsers only used to do that for plugins.

anforowicz · 2021-03-05T19:01:21Z

Yes - thank you for chiming in @acolwell - much appreciated.

Not sure what to do at this point with this pull request and with #20 - should we just close both of them? Or should this wait until we figure out how to spec the CORS requirement for DASH/HLS? (FWIW, I've found the ISO spec for DASH here, but I've had trouble getting and finding the details that might talk about CORS vs no-cors mode)

annevk · 2021-03-09T13:13:36Z

I filed whatwg/html#6468 on requiring CORS for these manifests. I don't think we have to block on that, but maybe we should have an issue for it here with regards to test coverage?

What do you think about my idea of adding these 3 MIME types to the blocklist? (We might also want to consider this for other MIME types we know have to be fetched with CORS, such as text/vtt.)

These MIME types cannot be fetched without CORS. Closes #20 and closes #23. Follow-up: whatwg/html#6468.

anforowicz · 2021-10-04T19:24:11Z

Let's abandond this pull request and treat application/dash+xml as never-sniffed - this is covered by: #26

These MIME types cannot be fetched without CORS. Closes #20 and closes #23. Follow-up: whatwg/html#6468.

Add application/dash+xml to opaque-safelisted MIME types.

6f3d29a

Closes annevk#20.

anforowicz requested a review from annevk February 25, 2021 00:09

annevk reviewed Feb 25, 2021

View reviewed changes

Adding an Oxford comma.

08cd717

annevk reviewed Feb 27, 2021

View reviewed changes

anforowicz mentioned this pull request Mar 4, 2021

CORB should not block application/dash+xml videos whatwg/fetch#886

Closed

annevk mentioned this pull request Mar 9, 2021

Require CORS for HLS and DASH media formats whatwg/html#6468

Open

annevk added a commit that referenced this pull request Oct 4, 2021

Block manifest-based media and WebVTT

34cf17e

These MIME types cannot be fetched without CORS. Closes #20 and closes #23. Follow-up: whatwg/html#6468.

annevk mentioned this pull request Oct 4, 2021

Block manifest-based media and WebVTT #26

Merged

anforowicz closed this Oct 4, 2021

annevk added a commit that referenced this pull request Oct 5, 2021

Block manifest-based media and WebVTT

9c4bd7f

These MIME types cannot be fetched without CORS. Closes #20 and closes #23. Follow-up: whatwg/html#6468.

anforowicz deleted the application-dash-xml branch November 12, 2021 20:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add application/dash+xml to opaque-safelisted MIME types. #23

Add application/dash+xml to opaque-safelisted MIME types. #23

anforowicz commented Feb 25, 2021

annevk left a comment

anforowicz commented Feb 26, 2021

annevk left a comment

dalecurtis commented Mar 1, 2021

annevk commented Mar 1, 2021

anforowicz commented Mar 1, 2021

dalecurtis commented Mar 1, 2021

annevk commented Mar 2, 2021

annevk commented Mar 2, 2021

youennf commented Mar 2, 2021

annevk commented Mar 2, 2021

anforowicz commented Mar 2, 2021 •

edited

Loading

jernoble commented Mar 2, 2021

annevk commented Mar 2, 2021

jernoble commented Mar 2, 2021

anforowicz commented Mar 2, 2021

annevk commented Mar 3, 2021

anforowicz commented Mar 4, 2021

acolwell commented Mar 4, 2021

annevk commented Mar 5, 2021

anforowicz commented Mar 5, 2021

annevk commented Mar 9, 2021

anforowicz commented Oct 4, 2021

Add application/dash+xml to opaque-safelisted MIME types. #23

Add application/dash+xml to opaque-safelisted MIME types. #23

Conversation

anforowicz commented Feb 25, 2021

annevk left a comment

Choose a reason for hiding this comment

anforowicz commented Feb 26, 2021

annevk left a comment

Choose a reason for hiding this comment

dalecurtis commented Mar 1, 2021

annevk commented Mar 1, 2021

anforowicz commented Mar 1, 2021

dalecurtis commented Mar 1, 2021

annevk commented Mar 2, 2021

annevk commented Mar 2, 2021

youennf commented Mar 2, 2021

annevk commented Mar 2, 2021

anforowicz commented Mar 2, 2021 • edited Loading

jernoble commented Mar 2, 2021

annevk commented Mar 2, 2021

jernoble commented Mar 2, 2021

anforowicz commented Mar 2, 2021

annevk commented Mar 3, 2021

anforowicz commented Mar 4, 2021

acolwell commented Mar 4, 2021

annevk commented Mar 5, 2021

anforowicz commented Mar 5, 2021

annevk commented Mar 9, 2021

anforowicz commented Oct 4, 2021

anforowicz commented Mar 2, 2021 •

edited

Loading