Create speech-recognition-context.md #140
Conversation
A very welcome addition, thanks for the initial design. Let's iterate a bit on the API shape, but I'm excited about this.
### 2. **Enhanced Relevance**
By incorporating contextual information, the speech recognition models can produce transcriptions that are more meaningful and aligned with the user's intent. This leads to better understanding of the spoken content and more accurate execution of voice commands.
s/intent/expectations/. Or maybe something more correct, I don't know how best to phrase this. But the user doesn't have an intent here; they receive a result.
I rewrote it to say *better align with the user's expectations of the output*.
### 1. **SpeechRecognitionPhrase**
This interface holds fundamental information about a biasing phrase, such as a text string and a boost value indicating how likely the phrase will appear.
A WebIDL strawman usually helps. Is this just an alias to `DOMString`?
I've added IDL definitions for these interfaces; please take a look again.
### 2. **SpeechRecognitionPhraseList**
This interface holds a list of `SpeechRecognitionPhrase` and supports adding more `SpeechRecognitionPhrase` to the list.
Is this just `sequence<SpeechRecognitionPhrase>`?
I experimented with using sequence a bit. I think it is feasible to put `sequence<SpeechRecognitionPhrase>` inside `SpeechRecognitionContext` and get rid of `SpeechRecognitionPhraseList`, but that means we would need to move other methods, like the definitions of `length` (which is somehow required by blink/v8 if it detects an array) and `addItem`, from `SpeechRecognitionPhraseList` into `SpeechRecognitionContext` too. In that case, if we add more types of data to `SpeechRecognitionContext` in the future, it might become confusing, and we won't be able to support another array inside `SpeechRecognitionContext` because the definition of `length` would need to be duplicated.

From the IDL examples I can find, it seems to be common practice to create a new `ObjectList` interface to support a new `Object` interface (e.g. `DataTransferItemList`), while sequences are used more often in dictionaries. I'm not sure if we should make `SpeechRecognitionPhrase` and `SpeechRecognitionContext` dictionaries instead, so that their relationship would be as simple as `SpeechRecognitionContext` containing a sequence of `SpeechRecognitionPhrase`. We want to perform data validation on each `SpeechRecognitionPhrase`, so using a dictionary may oversimplify things? Let me know what you think!
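For illustration, the two shapes under discussion might look roughly like this in WebIDL. This is only a sketch based on the examples in this thread, not the actual proposal; member names and defaults are assumptions:

```webidl
// Option A: dedicated list interface (sketch, names assumed)
[Exposed=Window]
interface SpeechRecognitionPhrase {
  constructor(DOMString text, optional float boost = 1.0);
  readonly attribute DOMString text;
  readonly attribute float boost;
};

[Exposed=Window]
interface SpeechRecognitionPhraseList {
  constructor();
  readonly attribute unsigned long length;
  SpeechRecognitionPhrase item(unsigned long index);
  undefined addItem(SpeechRecognitionPhrase item);
};

// Option B: dictionaries plus a sequence (sketch, names assumed)
dictionary SpeechRecognitionPhraseInit {
  required DOMString text;
  float boost = 1.0;
};

dictionary SpeechRecognitionContextInit {
  sequence<SpeechRecognitionPhraseInit> phrases;
};
```

Option A allows per-item validation at construction time; Option B keeps the API surface smaller but moves any validation to the point where the dictionary is consumed.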
```js
const recognition = new SpeechRecognition();
recognition.start();
var list = new SpeechRecognitionPhraseList();
list.addItem(new SpeechRecognitionPhrase("updated text", 2.0));
var context = new SpeechRecognitionContext(list);
recognition.updateContext(context);
```
What do we lose by simply doing:
```js
const recognition = new SpeechRecognition();
recognition.start();
var list = [{text: "update text", weight: 2.0}];
recognition.updateContext(list);
```
The same interface simplification goes for the other example.
If you still want to simplify things like this, can you tell me what the IDL would look like in that case? We also want to perform data validation on each `SpeechRecognitionPhrase`, so using `new SpeechRecognitionPhrase()` helps us throw an error as we validate there; otherwise we would need to hold off validation until `updateContext()` is called?
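To illustrate the trade-off, here is a minimal sketch of eager constructor validation. The class body and its validation rules (non-empty text, finite positive boost) are assumptions for illustration, not the proposal's actual behavior:

```javascript
// Hypothetical SpeechRecognitionPhrase with eager validation (sketch).
class SpeechRecognitionPhrase {
  constructor(text, boost = 1.0) {
    // Validate at construction time, so errors surface where the
    // bad value is created rather than later in updateContext().
    if (typeof text !== "string" || text.length === 0) {
      throw new TypeError("text must be a non-empty string");
    }
    if (!Number.isFinite(boost) || boost <= 0) {
      throw new RangeError("boost must be a finite positive number");
    }
    this.text = text;
    this.boost = boost;
  }
}

// Invalid input throws immediately:
try {
  new SpeechRecognitionPhrase("", 2.0);
} catch (e) {
  console.log(e.name); // prints "TypeError"
}
```

With a plain dictionary like `{text: "...", weight: 2.0}`, no such check can run until `updateContext()` inspects the object, which is the timing concern raised above.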
```js
  if (event.error == "recognition-context-not-supported") {
    console.log("Recognition context is not supported: ", event);
  }
};
```
Is this example leaving out the actual use of the context that would lead to an error? I'm not sure I follow.
I rewrote it and added more sample code to hopefully make it clear. Please take a look again and let me know if additional explanation is needed!
Add an explainer for the new speech recognition context feature
Explainer for speech recognition context is added in WebAudio#140