Create speech-recognition-context.md #140

Open · wants to merge 1 commit into main

Conversation

yrw-google commented:

Add an explainer for the new speech recognition context feature

@evanbliu requested a review from padenot · February 19, 2025 23:36
@padenot (Member) left a comment:

A very welcome addition, thanks for the initial design. Let's iterate a bit on the API shape, but I'm excited about this.



### 2. **Enhanced Relevance**
By incorporating contextual information, the speech recognition models can produce transcriptions that are more meaningful and aligned with the user's intent. This leads to better understanding of the spoken content and more accurate execution of voice commands.
padenot (Member):

s/intent/expectations/. Or maybe something more correct; I don't know how best to phrase this. But the user doesn't have an intent here, they receive a result.

yrw-google (Author):

I rewrote it to say "better align with the user's expectations of the output".



### 1. **SpeechRecognitionPhrase**
This interface holds fundamental information about a biasing phrase, such as a text string and a boost value indicating how likely the phrase is to appear.
padenot (Member):

A WebIDL strawman usually helps. Is this just an alias for DOMString?

yrw-google (Author):

I've added IDL definitions for these interfaces; please take a look again.
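The IDL itself isn't reproduced in this thread; as a rough usage sketch based only on the constructor shape that appears in the examples below (the `phrase` and `boost` attribute names are illustrative assumptions, not the final API):

```js
// Sketch only: the constructor takes a phrase string and a numeric
// boost, matching the examples later in this thread.
const phrase = new SpeechRecognitionPhrase("contextual biasing", 2.0);

// Attribute names below are assumptions for illustration.
console.log(phrase.phrase); // "contextual biasing"
console.log(phrase.boost);  // 2.0
```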



### 2. **SpeechRecognitionPhraseList**
This interface holds a list of `SpeechRecognitionPhrase` objects and supports adding more phrases to the list.
padenot (Member):

Is this just `sequence<SpeechRecognitionPhrase>`?

@yrw-google (Author) commented on Feb 21, 2025:

I experimented with using sequence a bit. I think it is feasible to put `sequence<SpeechRecognitionPhrase>` inside `SpeechRecognitionContext` and get rid of `SpeechRecognitionPhraseList`, but that means we would need to move methods like `addItem` and the definition of `length` (which blink/v8 requires when it detects an array-like) from `SpeechRecognitionPhraseList` into `SpeechRecognitionContext` too. In that case, if we add more types of data to `SpeechRecognitionContext` in the future, it might become confusing, and we wouldn't be able to support another array inside `SpeechRecognitionContext` because the `length` definition would have to be duplicated.

From the IDL examples I can find, it seems to be common practice to create a new `ObjectList` interface to support a new `Object` interface (e.g. `DataTransferItemList`), while `sequence` is used more often in a dictionary. I'm not sure whether we should make `SpeechRecognitionPhrase` and `SpeechRecognitionContext` dictionaries instead, so that their relationship would be as simple as `SpeechRecognitionContext` containing a sequence of `SpeechRecognitionPhrase`. We want to perform data validation on each `SpeechRecognitionPhrase`, so using a dictionary may oversimplify things? Let me know what you think!
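To make the two candidate shapes concrete, here is a side-by-side sketch; both are proposals from this thread rather than a settled API, and the dictionary keys follow the suggestion further below:

```js
const recognition = new SpeechRecognition();

// Shape 1: dedicated interfaces, as in the current explainer draft.
// Each SpeechRecognitionPhrase can validate its own data eagerly.
const list = new SpeechRecognitionPhraseList();
list.addItem(new SpeechRecognitionPhrase("updated text", 2.0));
recognition.updateContext(new SpeechRecognitionContext(list));

// Shape 2: a sequence of plain dictionaries, as suggested below.
// Less ceremony, but validation can only happen once updateContext()
// receives the values.
recognition.updateContext([{ text: "updated text", weight: 2.0 }]);
```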

Comment on lines 70 to 108
```js
const recognition = new SpeechRecognition();
recognition.start();
var list = new SpeechRecognitionPhraseList();
list.addItem(new SpeechRecognitionPhrase("updated text", 2.0));
var context = new SpeechRecognitionContext(list);
recognition.updateContext(context);
```
padenot (Member):

What do we lose by simply doing:

Suggested change:

```diff
 const recognition = new SpeechRecognition();
 recognition.start();
-var list = new SpeechRecognitionPhraseList();
-list.addItem(new SpeechRecognitionPhrase("updated text", 2.0));
-var context = new SpeechRecognitionContext(list);
-recognition.updateContext(context);
+var list = [{text: "update text", weight: 2.0}];
+recognition.updateContext(list);
```

padenot (Member):

The same interface simplification goes for the other example.

yrw-google (Author):

If you still want to simplify things like this, can you tell me how the IDL would look in this case? We also want to perform data validation on each `SpeechRecognitionPhrase`, so using `new SpeechRecognitionPhrase()` helps us throw an error as we validate there; otherwise we would need to hold off validation until `updateContext()` is called?
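A sketch of the timing difference being described here, assuming purely for illustration that a negative boost is invalid and rejected:

```js
// With the interface shape, invalid data can throw immediately:
try {
  // Hypothetical: a negative boost rejected by the constructor.
  const bad = new SpeechRecognitionPhrase("hello world", -1.0);
} catch (e) {
  console.log("Rejected at construction time:", e);
}

// With plain dictionaries, the same mistake can only surface later:
const recognition = new SpeechRecognition();
try {
  recognition.updateContext([{ text: "hello world", weight: -1.0 }]);
} catch (e) {
  console.log("Rejected only when updateContext() runs:", e);
}
```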

```js
// (Handler wrapper reconstructed; the quoted excerpt showed only its body.)
recognition.onerror = (event) => {
  if (event.error == "recognition-context-not-supported") {
    console.log("Recognition context is not supported: ", event);
  }
};
```
padenot (Member):

Is this example leaving out the actual use of the context that would lead to the error? I'm not sure I follow.

yrw-google (Author):

I rewrote it and added more sample code to hopefully make it clear. Please take another look and let me know if additional explanation is needed!
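The rewritten sample code isn't reproduced in this thread; as an illustration of the flow that could surface the error above, here is a sketch assuming the interface shape from the explainer (the actual example in the PR may differ):

```js
const recognition = new SpeechRecognition();

// Register the error handler before supplying a context.
recognition.onerror = (event) => {
  if (event.error == "recognition-context-not-supported") {
    console.log("Recognition context is not supported: ", event);
  }
};

// Supplying a context is what can trigger the error: if the platform's
// recognizer cannot apply contextual biasing, the handler above fires.
const list = new SpeechRecognitionPhraseList();
list.addItem(new SpeechRecognitionPhrase("updated text", 2.0));
recognition.updateContext(new SpeechRecognitionContext(list));
recognition.start();
```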

Add an explainer for the new speech recognition context feature
yrw-google added a commit to yrw-google/web-speech-api that referenced this pull request Feb 24, 2025
Explainer for speech recognition context is added in WebAudio#140