Add current audio latency metric #124
Conversation
Where the discussion left off earlier: #117 (comment)
This is my understanding too.
My understanding is that if "current audio latency" has a meaningful definition, then so does the total audio latency, because it is the same metric except that you increment it for each audio frame rather than only exposing the latest value. It is up to the UA to estimate the latency of an audio frame; the app does not need to care as long as it is accurate.
Do you mean the first set of 1024 frames have a latency of 10 ms, the second set of 1024 frames have a latency of 500 ms, and the third set of 1024 frames have a latency of 36 ms? If so I would expect: deliveredFramesLatency = 1024 × 10 + 1024 × 500 + 1024 × 36 = 559104, meaning the average latency per frame is 559104 / 3072 = 182 ms. (And if buffering frames in order to send them in a batch adds latency, then that would be part of the latency measurement.)
I tried to clarify that latency includes buffering. @o1ka, who is the Chrome WebRTC audio expert, does this PR make sense to you?
The definition makes sense to me. The question re: average / rolling average is not quite clear. "Current audio latency", i.e. the latency of the last delivered frame, has a specific value; it's not a statistic. The aggregated "delivered audio frames latency" allows clients to compute whatever statistics they need: the overall average, a histogram of "average over polling interval", etc.
I forgot if I linked this or not, but the "total measure + number of measurements" approach (in this case total latency and number of delivered frames) is based on the same principle as is described in the bottom half of WebRTC's Guidelines for design of stats objects, and I am also strongly in favor of also exposing the most recent values (e.g. the current audio latency).
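To make the "total + count" pattern concrete, here is a minimal polling sketch. It assumes hypothetical attribute names taken from this thread (deliveredFramesLatency as the running sum of per-frame latencies in ms, deliveredFrames as the frame counter); the final spec names may differ.

```js
// `track` is an audio MediaStreamTrack, e.g. from getUserMedia().
// `deliveredFramesLatency` and `deliveredFrames` are hypothetical names
// taken from this thread, not necessarily the spec's final attributes.
let prevLatency = 0;
let prevFrames = 0;
setInterval(() => {
  const { deliveredFramesLatency, deliveredFrames } = track.stats;
  const deltaLatency = deliveredFramesLatency - prevLatency;
  const deltaFrames = deliveredFrames - prevFrames;
  prevLatency = deliveredFramesLatency;
  prevFrames = deliveredFrames;
  if (deltaFrames > 0) {
    // Average latency per delivered frame over this polling interval.
    console.log(`avg latency: ${(deltaLatency / deltaFrames).toFixed(1)} ms`);
  }
}, 1000);
```

This is why exposing running totals is enough: overall averages, per-interval averages, and histograms can all be derived client-side.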
PTAL @padenot and @jan-ivar. As discussed in the meeting yesterday, I have updated this PR to:
Assuming this PR can merge this week, I'll create the follow-up (min/max/avg) PR after this one has merged, since I apparently don't know how to create PR chains on GitHub.
I don't understand why this contains things that have not been agreed upon, such as dropped frames. |
index.html
Outdated
@@ -545,14 +547,26 @@ <h4>The MediaStreamTrackAudioStats interface</h4>
such as if the track is muted or disabled, then the counters do
not increase.</p>
</div>
<li>
<p>The <dfn data-lt="current audio latency">current audio
This needs a rewrite. When defining something, it's best to proceed like so:
- Define what a certain concept is. In our case, it's the time, in milliseconds, between the point in time an audio input device has acquired a signal, and the time it is able to provide it to an output device. Similarly, we could spec the second part as the point in time the audio data is available to script. It's also important to say that this must be the latest available figure. "Current" isn't very precise for something that changes 300 times a second.
- Then say that it can be unavailable. Do we really want a nullable here, forcing authors to use an if statement (or something like that)?
- Don't use the word "note" in normative statements; this is confusing. Or use an explicit non-normative section, in which case you can use the word "may".
- In specifications, say what needs to be done, not what doesn't need to be done: saying that this shouldn't include the output latency is bad; it should naturally follow from other definitions. If there's room for confusion, add an informative note.
Define what a certain concept is [...] Don't use the word "note" [...] say what needs to be done, not what doesn't need to be done
Reworded. If this is not to your liking, please suggest an edit that is.
I now have a definition of "input latency" which can be referenced by both the "latest input latency" (this PR) and min/max/avg latency (future PR being discussed in #128).
In our case, it's the time, in milliseconds, between the point in time an audio input device has acquired a signal, and the time it is able to provide it to an output device.
The sink may not be an output device, but I reworded it to use your language, except I say "to delivery of any of its sinks". Does that make sense to you?
Similarly, we could spec the second part as the point in time the audio data is available to script.
Not sure how to address this comment. Isn't "the script" just another sink?
Then say that it can be unavailable. Do we really want a nullable here, forcing authors to use an if statement (or something like that)?
I agree it would be nice to avoid "if not null", since it would only really be null before the first frames have been delivered. I guess it could be up to the user agent to make a sensible initial guess? I don't know. I removed nullable for API ergonomics reasons, but it's a bit under-specified what should happen if you read track.stats.latency inside the getUserMedia promise's resolve(), in which case I don't know whether a measurement has necessarily been made yet, depending on the implementation.
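For concreteness, this is the scenario I mean (a sketch only; latency here is the attribute proposed in this PR, not a shipping API):

```js
// Sketch of the under-specified case discussed above.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();
// The promise has just resolved; depending on the implementation, no audio
// frame may have been delivered yet, so it is unclear what `latency`
// should report at this point.
console.log(track.stats.latency);
```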
@@ -578,9 +592,11 @@ <h4>The MediaStreamTrackAudioStats interface</h4>
set {{MediaStreamTrackAudioStats/[[DeliveredFramesDuration]]}}
to [= delivered audio frames duration =],
set {{MediaStreamTrackAudioStats/[[DroppedFrames]]}} to
[= dropped audio frames =] and
[= dropped audio frames =],
I don't know why this is here.
See #124 (comment) and #129
This PR does not add dropped frames. Dropped frames were merged in a previous PR (#117). At the time I didn't understand that you had a problem with them, so I did what Jan-Ivar proposed and split up the PR in order to make progress (using the "editors can integrate" label). Since we haven't resolved this disagreement, I just filed #129. As for this PR, it only adds input latency. Do you have any concerns with the definitions of input latency?
Please take another look and suggest edits.
<div class="note">
<p>A sink that consumes audio may add additional processing
latency not included in this measurement, such as playout delay
or encode time.</p>
I do not think the intent of this value is for realtime processing.
Adding a note about the intent (the goal is to allow monitoring, not realtime processing) would be nice.
If we want very precise latency info for, say, Web Audio apps, it might be best to expose this value in Web Audio, measuring microphone to Web Audio sink.
I added a note about monitoring. WDYT @youennf?
This value is useful for Web Audio applications as is. Web Audio applications that record an audio input device need to know how far in the past audio recordings should be shifted to be in phase with other elements. In a well-behaved system, there is little expectation for this value to change, but it should be possible to see it change if need be (a common cause will be drift compensation when crossing time domains).
Examples: https://ampedstudio.com/manual/studio-settings/ (end of page, workaround by using a loop-back mic), https://www.w3.org/2021/03/media-production-workshop/talks/ulf-hammarqvist-audio-latency.html
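To illustrate the alignment use case, a hedged sketch: track.stats.latency is the attribute proposed in this PR (in ms), and backingStartTime / recordedBuffer are made-up names for the app's own bookkeeping.

```js
// `track` is the microphone MediaStreamTrack that was recorded.
// `backingStartTime` and `recordedBuffer` are hypothetical app state.
const ctx = new AudioContext();
// The recording reached the app `inputLatencySec` after the sound occurred,
// so schedule it that much earlier to bring it back into phase with the
// backing track.
const inputLatencySec = track.stats.latency / 1000;
const source = new AudioBufferSourceNode(ctx, { buffer: recordedBuffer });
source.connect(ctx.destination);
source.start(backingStartTime - inputLatencySec);
```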
If that value is useful, why not add it to Web Audio directly? This should be cheap enough to do and would give more precise latency info, given that it would compute the actual latency between the microphone and the Web Audio graph.
The Web Audio API does not deal with input, only output. It's problematic for various reasons, but it's how it was done. The object that is used to refer to the microphone is the MediaStreamTrack, and so this is why we're adding it there.
The latency between the microphone and the Web Audio graph is precisely what is being added here, in the sense that the Web Audio graph is a sink, because it runs off of a system audio output callback.
But there can be other sinks, such as an HTMLMediaElement, or using the microphone in some other way, directly feeding it to a MediaRecorder, etc.
The Web Audio API does not deal with input
The object that is used to refer to the microphone is the MediaStreamTrack

I do not understand this. MediaStreamAudioSourceNode is the Web Audio sink of MediaStreamTrack. Can you clarify why microphone-to-Web-Audio latency should not be put there instead? At least in Safari, there is a small buffer at MediaStreamAudioSourceNode which could impact the Web Audio input latency.

there can be other sinks, such as an HTMLMediaElement

Exactly, with potentially different latencies. It makes more sense to put values there for those applications. For instance, at least for Safari's implementation, droppedVideoFrames makes sense in https://w3c.github.io/media-playback-quality/#dom-htmlvideoelement-getvideoplaybackquality but does not make sense in MediaStreamTrack.
Re dropped video frames: in Chrome it is possible for frames to be dropped prior to sink delivery (measured by track.stats) or by the sink, for example the RTCPeerConnection (measured by pc.getStats). It would clearly be wrong to expose encoder frame drops inside track.stats. We could expose track frame drops inside the media-source stats object that is returned by getStats, but this would mean that every sink has to report something that is outside of its own scope. And let me remind you that the reason we're talking about track.stats rather than getStats is that when we added these and other metrics to webrtc-stats, we got pushback for adding stuff that is out of scope. (Which I think is justified, as there are non-WebRTC use cases that care about track stats.)
So likewise, if MediaStreamAudioSourceNode is adding latency that is not accounted for by track.stats, then that seems like a good place to measure it. But I still see value in each part of the pipeline measuring its own thing, rather than making components responsible for neighboring components and making assumptions about how many steps in the pipeline are of interest to the application.
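That composability is the point: each stage reports its own contribution, and the app sums the stages it cares about. A sketch, assuming track.stats.latency is this PR's input latency in ms (AudioContext.baseLatency and outputLatency are existing Web Audio attributes, in seconds):

```js
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();
const ctx = new AudioContext();
ctx.createMediaStreamSource(stream).connect(ctx.destination);

const inputSec = track.stats.latency / 1000; // capture -> track sinks (this PR)
const graphSec = ctx.baseLatency;            // graph processing (Web Audio)
const outputSec = ctx.outputLatency;         // graph -> output device (Web Audio)
console.log(`mic-to-speaker estimate: ${(inputSec + graphSec + outputSec).toFixed(3)} s`);
```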
I agree with @henbos. If MediaStreamAudioSourceNode (or its Track sibling) adds latency, then we can add something there. More likely than not we won't, because if it adds latency there is a design problem in the implementation, and we'll argue in the exact same way we're arguing about audio frame drops: it's better for the Web that we fix our implementation than to add a way to measure avoidable latency overhead.
If we agree, does that mean we can merge this PR?
LGTM
Co-authored-by: youennf <[email protected]>
Fixes #119.
Part of #96. See also follow-up #128.