Commit

Merge branch 'release/v1.2.0' into 'master'
Release/v1.2.0

See merge request speech-recognition-framework/esp-sr!24
sun-xiangyu committed Mar 8, 2023
2 parents e2b9e0f + 8982693 commit 018ed41
Showing 5 changed files with 24 additions and 87 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -1,13 +1,15 @@
# Change log for esp-sr

## Unreleased


## 1.2.0
- ESP-DSP dependency is now installed from the component registry
- Add an English MultiNet6 model which is trained by RNNT and CTC
- Add a Chinese MultiNet6 model which is trained by RNNT and CTC
- Fixed CMake errors when esp-sr was installed from component registry
- Fixed the list of supported chips displayed in the component registry


## 1.1.0
- Support esp32c3 for Chinese TTS
- Update document of ESP-SR
63 changes: 14 additions & 49 deletions docs/en/speech_command_recognition/README.rst
@@ -47,14 +47,8 @@ Format of Speech Commands

Different MultiNet versions support different formats:

- Chinese

MultiNet5 and MultiNet6 use Pinyin for Chinese speech commands. Please use :project_file:`tool/multinet_pinyin.py` to get the Pinyin of Chinese text.

- English

MultiNet5 uses phonemes for English speech commands. For simplicity, we use characters to denote different phonemes. Please use :project_file:`tool/multinet_g2p.py` to do the conversion.
MultiNet6 uses graphemes for English speech commands. You do not need any conversion.
- MultiNet5 uses phonemes for English speech commands. For simplicity, we use characters to denote different phonemes. Please use :project_file:`tool/multinet_g2p.py` to do the conversion.
- MultiNet6 uses graphemes for English speech commands. You do not need any conversion.

Suggestions on Customizing Speech Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -63,45 +57,28 @@ When customizing speech command words, please pay attention to the following sug

.. list::

- The recommended length of Chinese speech commands is generally 4-6 Chinese characters. Commands that are too short lead to a high false recognition rate, and commands that are too long are hard for users to remember
:esp32s3: - The recommended length of English speech commands is generally 4-6 words
- Mixed Chinese and English is not supported in command words
- The command word cannot contain Arabic numerals or special characters
- Avoid common command words like "hello"
- The greater the pronunciation difference of each Chinese character / word in the command words, the better the performance
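
The mechanically checkable suggestions above can be sketched as a small pre-flight script. This is a minimal illustration and not part of esp-sr: ``check_command`` and its warning strings are hypothetical names, and the 4-6 length range is treated as a warning because the list only recommends it. The suggestions about common words and pronunciation difference are not checked.

```python
import re

# Hypothetical helper, not part of esp-sr: flags candidate command words
# that conflict with the suggestions listed above.
CJK = re.compile(r"[\u4e00-\u9fff]")   # common Chinese characters
LATIN = re.compile(r"[A-Za-z]")


def check_command(text: str) -> list[str]:
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    has_cjk = bool(CJK.search(text))
    has_latin = bool(LATIN.search(text))
    if has_cjk and has_latin:
        warnings.append("mixed Chinese and English")
    if any(ch.isdigit() for ch in text):
        warnings.append("contains numerals")
    if any(not (ch.isalnum() or ch.isspace()) for ch in text):
        warnings.append("contains special characters")
    # Count Chinese characters for Chinese input, words otherwise.
    units = len(CJK.findall(text)) if has_cjk else len(text.split())
    if not 4 <= units <= 6:
        warnings.append(f"length {units} outside the recommended 4-6")
    return warnings


if __name__ == "__main__":
    for candidate in ["turn on the light", "打开空调", "turn on light 2"]:
        print(candidate, "->", check_command(candidate))
```

Returning warnings instead of raising keeps the check advisory, which matches the wording "suggestions" in this section.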

Speech Commands Customization Methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MultiNet6 customize speech commands:

- For English, words are used as units. Please modify the text file :project_file:`model/multinet_model/fst/commands_en.txt` using the following format:
MultiNet6 customize speech commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Words are used as units. Please modify the text file :project_file:`model/multinet_model/fst/commands_en.txt` using the following format:

::

# command_id command_sentence
1 TELL ME A JOKE
2 MAKE A COFFEE

- For Chinese, Pinyin is used as the unit. Please modify the text file :project_file:`model/multinet_model/fst/commands_cn.txt` using the following format. :project_file:`tool/multinet_pinyin.py` helps to get the Pinyin of Chinese text.

::
MultiNet5 customize speech commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# command_id command_sentence
1 da kai kong tiao
2 guan bi kong tiao

MultiNet5 supports flexible methods to customize speech commands. Users can do so either online or offline, and can also add, delete, or modify speech commands dynamically.

.. only:: latex

.. figure:: ../../_static/QR_multinet_g2p.png
:alt: menuconfig_add_speech_commands

Customize Speech Commands Offline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two methods for users to customize speech commands offline:
There are two methods to customize speech commands offline:

- Via ``menuconfig``

@@ -114,7 +91,7 @@ There are two methods for users to customize speech commands offline:

Please note that a single ``Command ID`` can correspond to more than one command. For example, "da kai kong tiao" and "kai kong tiao" have the same meaning. Therefore, users can assign the same ``Command ID`` to these two commands and separate them with "," (no space is required before or after the comma).

1. Call the following API:
2. Call the following API:

::

@@ -135,19 +112,12 @@ There are two methods for users to customize speech commands offline:

- Via modifying code

Users directly customize the speech commands in the code and pass these commands to the MultiNet. In the actual user scenarios, users can pass these commands via various interfaces including network / UART / SPI. For details, see the example described in ESP-Skainet.

Customize speech commands online
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

MultiNet allows users to add/delete/modify speech commands dynamically at run time, without changing models or modifying parameters. For details, see the example described in ESP-Skainet.

For detailed description of APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` .
Users directly customize the speech commands in the code and pass these commands to MultiNet. In real use cases, users can pass these commands via various interfaces including network / UART / SPI. For a detailed description of the APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` and the examples described in ESP-Skainet.

Use MultiNet
------------

MultiNet speech commands recognition must be used together with audio front-end (AFE) in ESP-SR (What's more, AFE must be used together with WakeNet). For details, see Section :doc:`AFE Introduction and Use <../audio_front_end/README>` .
We suggest using MultiNet together with the audio front-end (AFE) in ESP-SR. For details, see Section :doc:`AFE Introduction and Use <../audio_front_end/README>` .

After configuring AFE, users can follow the steps below to configure and run MultiNet.

@@ -187,11 +157,6 @@ Users can start MultiNet after enabling AFE and WakeNet, but must pay attention
MultiNet Output
~~~~~~~~~~~~~~~

Speech commands recognition supports two basic modes:

* Single recognition
* Continuous recognition

Speech command recognition must be used with WakeNet. After wake-up, MultiNet detection can start.

After running, MultiNet returns the recognition state of the current frame, ``mn_state``, in real time. It is currently one of the following states:
@@ -228,13 +193,13 @@ After running, MultiNet returns the recognition output of the current frame in re

Users can use ``phrase_id[0]`` and ``prob[0]`` to get the recognition result with the highest probability.

- ESP_MN_STATE_TIMEOUT
- ESP_MN_STATE_TIMEOUT

Indicates that no speech command has been detected for a long time. MultiNet exits automatically and waits to be woken up again.

Therefore:
Single recognition mode and Continuous recognition mode:
* Single recognition mode: exit the speech recognition when the return status is ``ESP_MN_STATE_DETECTED``
* Continuous recognition: exit the speech recognition when the return status is ``ESP_MN_STATE_TIMEOUT``
* Continuous recognition mode: exit the speech recognition when the return status is ``ESP_MN_STATE_TIMEOUT``
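
The difference between the two modes comes down to which state ends the loop. The sketch below simulates this in plain Python under stated assumptions: ``MnState`` is a hypothetical stand-in for the ``mn_state`` values named above, ``run_session`` is an illustrative name, and the input is a pre-recorded list of per-frame states rather than output from a real model.

```python
from enum import Enum, auto
from typing import Iterable


class MnState(Enum):
    """Hypothetical stand-in for the mn_state values described above."""
    DETECTING = auto()
    DETECTED = auto()
    TIMEOUT = auto()


def run_session(states: Iterable[MnState], continuous: bool) -> tuple[int, int]:
    """Consume per-frame states until the exit condition of the chosen mode.

    Returns (number of detections, number of frames consumed).
    """
    detections = 0
    frames = 0
    for state in states:
        frames += 1
        if state is MnState.DETECTED:
            detections += 1
            if not continuous:
                break  # single recognition mode: exit on the first detection
        elif state is MnState.TIMEOUT:
            break  # a timeout ends the session in either mode
    return detections, frames


if __name__ == "__main__":
    frames = [MnState.DETECTING, MnState.DETECTED, MnState.DETECTING,
              MnState.DETECTED, MnState.TIMEOUT]
    print("single:", run_session(frames, continuous=False))
    print("continuous:", run_session(frames, continuous=True))
```

In an application, the list of states would be replaced by the per-frame result of the detection call, and the ``break`` points would be where control returns to waiting for the wake word.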

Resource Occupancy
------------------
40 changes: 5 additions & 35 deletions docs/zh_CN/speech_command_recognition/README.rst
@@ -47,16 +47,8 @@ MultiNet input is audio processed by the audio front-end (AFE) algorithm (format

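Different versions of MultiNet use different speech command formats. Speech commands must follow a specific format, as described below:

- Chinese

MultiNet5 and MultiNet6 use Pinyin as the basic recognition unit, with the Pinyin of each character separated by a space. For example, “打开空调” (turn on the air conditioner) should be written as "da kai kong tiao". Please use the following tool to convert Chinese characters to Pinyin: :project_file:`tool/multinet_pinyin.py` .

- English

MultiNet5: uses phonemes as the basic recognition unit. For simplicity, each phoneme is mapped to a single letter. For example, "turn on the light" must be written as "TkN nN jc LiT". Please use the tool we provide for the conversion. For details, see :project_file:`tool/multinet_g2p.py` .
MultiNet6: uses subwords as the recognition unit, so users can enter the desired phrase directly. For example, "turn on the light" is written simply as "turn on the light".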


Customization Requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -96,17 +88,7 @@ Methods for setting MultiNet6 speech commands offline:
1 da kai kong tiao
2 guan bi kong tiao

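- For English, modify :project_file:`model/multinet_model/fst/commands_en.txt`

The format is as follows: the first number is the command ID, followed by the English phrase of the command. The two are separated by a space, and words are also separated by spaces.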

::

# command_id command_sentence
1 TELL ME A JOKE
2 MAKE A COFFEE

MultiNet5 supports two methods for setting speech commands offline:
Methods for setting MultiNet5 speech commands offline:

- Via ``menuconfig``

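@@ -119,7 +101,7 @@ MultiNet5 supports two methods for setting speech commands offline:

Note that a single Command ID can cover multiple phrases. For example, “打开空调” (turn on the air conditioner) and “开空调” (air conditioner on) have the same meaning, so they can be written in the entry for the same Command ID, with adjacent phrases separated by an ASCII comma "," (no spaces are needed before or after the comma).

1. Call the following API in the code:
2. Call the following API in the code: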

::

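@@ -140,19 +122,12 @@ MultiNet5 supports two methods for setting speech commands offline:

- Via modifying code

In this method, users write the speech commands directly in the code and pass them to MultiNet. In actual product development and use, users can pass the required speech commands through various interfaces such as network/UART/SPI and replace them at any time. For details, see the example in ESP-Skainet.

Setting Speech Commands Online
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

MultiNet also supports setting speech commands dynamically online (add/delete/modify) at run time, without changing the model or adjusting parameters. For details, see the example in ESP-Skainet.

For a detailed description of the APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` .
In this method, users write the speech commands directly in the code and pass them to MultiNet. In actual product development and use, users can pass the required speech commands through various interfaces such as network/UART/SPI and replace them at any time. For a detailed description of the APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` and the example in ESP-Skainet.

Use MultiNet
------------

MultiNet speech command recognition must run together with the AFE acoustic algorithm module in ESP-SR (in addition, AFE requires WakeNet to be enabled; for details, see :doc:`AFE Introduction and Use <../audio_front_end/README>` ).
We recommend running MultiNet speech command recognition together with the AFE acoustic algorithm module in ESP-SR. For details, see :doc:`AFE Introduction and Use <../audio_front_end/README>` .

After configuring AFE, follow the steps below to configure and run MultiNet.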

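@@ -192,11 +167,6 @@ Running MultiNet
MultiNet Recognition Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MultiNet speech command recognition supports two basic modes:

* Single recognition
* Continuous recognition

Speech command recognition must be used together with wake-up. After wake-up, speech command detection can run.

While running, the speech command model returns the recognition state of the current frame, ``mn_state``, in real time. It is currently one of the following states: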
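@@ -237,7 +207,7 @@ MultiNet speech command recognition supports two basic modes:

This state indicates that no speech command has been detected for a long time, so recognition exits automatically and waits for the next wake-up.

Therefore
Single recognition mode and continuous recognition mode
Exiting speech command recognition when the returned state is ``ESP_MN_STATE_DETECTED`` gives single recognition mode;
Exiting speech command recognition when the returned state is ``ESP_MN_STATE_TIMEOUT`` gives continuous recognition mode.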

2 changes: 1 addition & 1 deletion idf_component.yml
@@ -1,4 +1,4 @@
version: "1.1.0"
version: "1.2.0"
description: esp_sr provides basic algorithms for Speech Recognition applications
url: https://github.com/espressif/esp-sr
dependencies:
2 changes: 1 addition & 1 deletion tool/README.md
@@ -10,7 +10,7 @@ For English, words are used as units. Please prepare a list of commands written
2 MAKE A COFFEE
```
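
The `command_id command_sentence` layout shown above could be read with a few lines of Python. This is a hedged sketch, not a tool shipped in this directory: `parse_commands` is a hypothetical name, and grouping repeated ids into one list is an assumption made for illustration rather than documented behavior.

```python
def parse_commands(text: str) -> dict[int, list[str]]:
    """Map each command_id to the sentences listed under it.

    Blank lines and lines starting with '#' are skipped.
    """
    commands: dict[int, list[str]] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            raise ValueError(
                f"line {line_no}: expected '<id> <sentence>', got {raw!r}"
            )
        commands.setdefault(int(parts[0]), []).append(parts[1])
    return commands


if __name__ == "__main__":
    sample = (
        "# command_id command_sentence\n"
        "1 TELL ME A JOKE\n"
        "2 MAKE A COFFEE\n"
    )
    print(parse_commands(sample))
```

Failing loudly on a malformed line makes mistakes in the command list visible before the file is used anywhere else.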

For Chinese, pinyin are used as units. [multinet_pinyin.py](./multinet_pinyin.py) help tp get Pinyin of Chinese. Please prepare a list of commands written in a text file `commands_cn.txt` of the following format:
For Chinese, pinyin are used as units. [multinet_pinyin.py](./multinet_pinyin.py) help to get Pinyin of Chinese. Please prepare a list of commands written in a text file `commands_cn.txt` of the following format:
```
# command_id command_sentence
1 da kai kong tiao