Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove restriction on streams being limited to only one PC #201

Closed
wants to merge 1 commit into from

Conversation

palak8669
Copy link
Contributor

@palak8669 palak8669 commented Sep 5, 2023

@dontcallmedom-bot
Copy link

This issue was mentioned in WEBRTCWG-2023-09-12 (Page 48)

@alvestrand alvestrand requested a review from aboba September 26, 2023 11:46
@youennf
Copy link
Collaborator

youennf commented Sep 29, 2023

As discussed during yesterday's editor's call, here is some feedback for this issue.

This 1 pc restriction is similar to the restriction to preserve the ordering of chunks. It prevents potential footguns. It removes edge cases. It helps browser interoperability. It is consistent with the WebRTC media pipeline and encoded transform model. As it is currently, WebRTC encoded transform is nothing but an encoder post-processor or a decoder pre-processor. No encoder/decoder -> no processor.

For instance, if a website tries to write too much data, the UA will drop frames before encoder to enforce bandwidth allocation. The transform will not receive any new chunk until the sender gets back some allocated bandwidth slot. Another example is that the web app cannot send data through the sender transform of a recvonly transceiver. Ditto on receive side where track may be muted. These are good properties.

Removing this restriction would change the model of WebRTC encoded transform, and more generally of the WebRTC media pipeline. We should favour proposals that can implement the use cases by staying consistent with the existing model we agreed on.

AIUI, the use cases can be implemented by new APIs, located elsewhere than WebRTC encoded transform. Two examples come to mind:

  • A low level API that would allow building an SFU in a browser. This could be the RTCRtpTransport API presented at TPAC. My understanding from TPAC meeting is that there is interest in moving forward with this proposal.
  • A high level API that would allow forwarding received media (without reencoding whenever possible) via another peer connection, as mentioned by @jan-ivar. Say for instance sender.enableMediaForwarding({ ... }) or sender.replaceTrack(track, { forwarding: true, ... });. That API could be complemented by progressively adding knobs to control the exact forwarding behaviour, as we discover more needs from applications. This is inline with the current RTCPeerConnection model we all like.

Above proposals can lead to good interop and good user experience (minimum latency, minimum CPU overhead...).

@alvestrand
Copy link
Contributor

On the other hand, those restrictions prevent solving the "late fanout" usecase, and other usecases that can't be solved under those restrictions using this API.
I believe (based on implementation experience) that a frame-level API is easier to use and harder to make mistakes in than a packet-level API for many of those use cases, and that making such an API available to users is a benefit for the Web platform.

One possibility that may satisfy both our constituencies is to redefine the API; if people are willing to live with the restriction of "one input, one output", they can use the ScriptTransform API; if people desire to go outside those restrictions, they use a different means of instantiating the input and output streams of frames that are the essential parts of this API for those purposes.

@youennf
Copy link
Collaborator

youennf commented Sep 29, 2023

if people desire to go outside those restrictions, they use a different means of instantiating the input and output streams of frames that are the essential parts of this API for those purposes.

I thought about this approach a while ago. It is cleaner in the sense that it clearly states to the UA that (focusing on sender side) the JS application is responsible to implement the source+encoder part (which late fanout is clearly about). As an example, it will help UA provide meaningful WebRTC stats. Setting a track on such a sender could throw...

Exposing such API requires us to refine the WebRTC media pipeline model as we would expose things that are fully internal right now. This might not be unrelated to the TPAC Media/WebRTC WG discussion meeting I missed.

Related to this approach, my first questions would be:

  • The frame level vs. packet level question. It would be great to see whether we can get consensus within the WG. For instance, if we have a packet level API, do we also need a frame level API? Do we need both?
  • @jan-ivar's question: should fanout be done in JS or done by UA with web app tuning it via knobs?

To refine the WebRTC media pipeline model, there are a number of questions that might be useful to tackle, for instance:

  • the bandwidth allocation model: WebRTC engine is providing a bitrate that the encoder needs to respect. What if it is undershooting or overshooting it?
  • the timing model: WebRTC engine is processing frames in a given time-based order. What if JS encoder is sending frames differently?
  • Get consensus on Receiver transform depacketizer requirements #109.
  • the feedback model: WebRTC engine is providing instructions on the encoder (I need a key frame) based on RTCP feedback. Should we expose this?
  • the error model: WebRTC engine can starve the encoder with input. What if JS is sending frames when UA does not expect it?
  • the overuse model: WebRTC engine is preventing excessive CPU use by decreasing frame rate and/or resolution. Should we expose this? Probably not but worth digging.
  • the threading model: WebRTC engine is probably currently doing this processing in background threads. Should we put the same restriction on the web?

Hope this helps moving forward.

@guidou
Copy link
Collaborator

guidou commented Oct 23, 2023

I thought about this approach a while ago. It is cleaner in the sense that it clearly states to the UA that (focusing on sender side) the JS application is responsible to implement the source+encoder part (which late fanout is clearly about). As an example, it will help UA provide meaningful WebRTC stats. Setting a track on such a sender could throw...

Can this be addressed by passing an extra optional parameter to the RTCRtpScriptTransform constructor?

Exposing such API requires us to refine the WebRTC media pipeline model as we would expose things that are fully internal right now. This might not be unrelated to the TPAC Media/WebRTC WG discussion meeting I missed.

Harald's congestion control proposal is in line with this. #207
Also, depending on the use case this might or might not be an issue. There is a lot of value in optionally lifting the restriction to support valid use cases.

Related to this approach, my first questions would be:

The frame level vs. packet level question. It would be great to see whether we can get consensus within the WG. For instance, if we have a packet level API, do we also need a frame level API? Do we need both?
@jan-ivar's question: should fanout be done in JS or done by UA with web app tuning it via knobs?

Some advantages of the frame-level approach:

  1. The changes required to solve the problem (including glitch-free failover for multiple input PCs) at the frame level are minimal: add a setMetadata() method to RTCEncodedVideo/AudioFrame, and allow removing the 1PC restriction. We have experimented with this in Chrome (which implements the pre-standard version that doesn't have the 1PC restriction) and the results are very good in some real-world scenarios.
  2. The programming model is a small extension to the well-understood model of encoded transform.
  3. It makes it easier to support other transport protocols if necessary.
  4. The concerns about exposing internal aspects of the WebRTC pipeline do not apply to all use cases and can be addressed with specific APIs for that purpose.

A packet-level API, depending on how it's made, can have the advantage of allowing forwarding to start without waiting for the whole frame, and is able to deal with packet loss directly, but has other disadvantages for this use case, such as:

  1. It is a new model, and it is likely to be less straightforward to support glitch-free failover.
  2. It is possible to use a frame-to-packetizer model for sending (as in [1]), in which case the forwarding advantage does not exist.
  3. It is a major API intended to support multiple use cases, which would require plenty of time to design, build consensus, implement and deploy. It's not clear that in the end the solution for this use case will necessarily be better than the frame-based solution. It will for sure be less timely.
  4. It is tied to RTP, which is OK in many cases, but some users might prefer a solution not tied to the transport protocol as they might consider migrating in the future to alternative protocols (e.g., Quic).

WDYT?

[1] https://docs.google.com/presentation/d/1vqlz0vbF1JFmKxVfTrhpBwomwu1aiSJz5p-Q0PXNDKA/edit#slide=id.g27b84bfbaa3_13_374)

@alvestrand
Copy link
Contributor

Status: This PR has been languishing for 3 months now.
The addition of a pure "sendonly" API is under consideration, but seems to require more time (#211), and the discussion so far has not shown a compelling difference from the interface that is effectively created if you don't attach a sending track to a transceiver.

The use case for forwarding has been accepted.
No use case where the restriction serves a positive purpose (enabling an use case) has been described; the only argument put forward is that it gives the opportunity to fire an error when people try unexpected things.

We should consider again whether we should remove the restriction from the spec.

@youennf
Copy link
Collaborator

youennf commented Dec 22, 2023

The addition of a pure "sendonly" API is under consideration

It seems this API got some 'room consensus' at last WG meeting.
I'll be happy to help moving this forward.

No use case where the restriction serves a positive purpose (enabling an use case) has been described

A number of reasons for this restriction have been mentioned in this thread.

We should consider again whether we should remove the restriction from the spec.

I am not sure discussing this again will be as fruitful as making progress on the pure "send only" API.

@guidou guidou marked this pull request as draft March 14, 2024 09:59
@guidou
Copy link
Collaborator

guidou commented Jul 18, 2024

No longer pursuing.
We will focus on the RTCRtpEncodedSource proposal.

@guidou guidou closed this Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants