From 3610431054f7c866ca981737c84b0ba1785b9c5e Mon Sep 17 00:00:00 2001
From: sxy
Date: Mon, 6 Mar 2023 14:08:36 +0800
Subject: [PATCH 1/4] Release esp-sr v1.2.0

---
 CHANGELOG.md      | 4 +++-
 idf_component.yml | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index db91e23..209b445 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,13 +1,15 @@
 # Change log for esp-sr
 ## Unreleased
+
+
+## 1.2.0
 - ESP-DSP dependency is now installed from the component registry
 - Add an English MultiNet6 model which is trained by RNNT and CTC
 - Add a Chinese MultiNet6 model which is trained by RNNT and CTC
 - Fixed CMake errors when esp-sr was installed from component registry
 - Fixed the list of supported chips displayed in the component registry
-
 ## 1.1.0
 - Support esp32c3 for Chinese TTS
 - Update document of ESP-SR
diff --git a/idf_component.yml b/idf_component.yml
index bf9b559..f8181bf 100644
--- a/idf_component.yml
+++ b/idf_component.yml
@@ -1,4 +1,4 @@
-version: "1.1.0"
+version: "1.2.0"
 description: esp_sr provides basic algorithms for Speech Recognition applications
 url: https://github.com/espressif/esp-sr
 dependencies:

From 33760ee57216b3792ca0e71248799a97c1ea6358 Mon Sep 17 00:00:00 2001
From: sxy
Date: Tue, 7 Mar 2023 14:23:11 +0800
Subject: [PATCH 2/4] Fix some typos

---
 docs/en/speech_command_recognition/README.rst | 10 +++++-----
 tool/README.md                                | 2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/en/speech_command_recognition/README.rst b/docs/en/speech_command_recognition/README.rst
index b815aaf..c132f7b 100644
--- a/docs/en/speech_command_recognition/README.rst
+++ b/docs/en/speech_command_recognition/README.rst
@@ -49,12 +49,12 @@ Different MultiNets support different format:
 - Chinese
- MultiNet5 and MultiNet6 sse Pinyin for Chinese speech commands. Please use :project_file:`tool/multinet_pinyin.py` to get pinyin of Chinese.
+ MultiNet5 and MultiNet6 use Pinyin for Chinese speech commands.
Please use :project_file:`tool/multinet_pinyin.py` to get pinyin of Chinese.
 - English
- MultiNet5 use phonemes for English speech commands. Simplicity, we use chats to denote different phoneme.Please use :project_file:`tool/multinet_g2p.py` to do the convention.
- MultiNet6 use grapheme for English speech commands. You do not need any convention.
+ MultiNet5 use phonemes for English speech commands. For simplicity, we use characters to denote different phonemes. Please use :project_file:`tool/multinet_g2p.py` to do the convention.
+ MultiNet6 use grapheme for English speech commands. You do not need any conversion.
 Suggestions on Customizing Speech Commands
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -83,7 +83,7 @@ MultiNet6 customize speech commands:
 1 TELL ME A JOKE
 2 MAKE A COFFEE
-- For Chinese, pinyin are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_cn.txt` by the following format. :project_file:`tool/multinet_pinyin.py` help tp get Pinyin of Chinese.
+- For Chinese, pinyin are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_cn.txt` by the following format. :project_file:`tool/multinet_pinyin.py` help to get Pinyin of Chinese.
 ::
@@ -91,7 +91,7 @@ MultiNet6 customize speech commands:
 1 da kai kong tiao
 2 guan bi kong tiao
-Multinet5 supports flexible methods to customize speech commands. Users can do it either online or offline and can also add/delete/modify speech commands dynamically.
+Multinet5 supports flexible methods to customize speech commands. You can do it either online or offline and can also add/delete/modify speech commands dynamically.
 .. only:: latex
diff --git a/tool/README.md b/tool/README.md
index 46a8b3d..5f271c4 100644
--- a/tool/README.md
+++ b/tool/README.md
@@ -10,7 +10,7 @@ For English, words are used as units. Please prepare a list of commands written
 2 MAKE A COFFEE
 ```
-For Chinese, pinyin are used as units.
[multinet_pinyin.py](./multinet_pinyin.py) help tp get Pinyin of Chinese. Please prepare a list of commands written in a text file `commands_cn.txt` of the following format:
+For Chinese, pinyin are used as units. [multinet_pinyin.py](./multinet_pinyin.py) help to get Pinyin of Chinese. Please prepare a list of commands written in a text file `commands_cn.txt` of the following format:
 ```
 # command_id command_sentence
 1 da kai kong tiao

From 73ce64f6746dbd1042d505ecaab09f9fd0d6f7d0 Mon Sep 17 00:00:00 2001
From: sxy
Date: Tue, 7 Mar 2023 15:14:15 +0800
Subject: [PATCH 3/4] Each language version only describes the corresponding language

---
 docs/en/speech_command_recognition/README.rst | 61 ++++---------------
 .../speech_command_recognition/README.rst     | 40 ++----------
 2 files changed, 18 insertions(+), 83 deletions(-)

diff --git a/docs/en/speech_command_recognition/README.rst b/docs/en/speech_command_recognition/README.rst
index c132f7b..1581f71 100644
--- a/docs/en/speech_command_recognition/README.rst
+++ b/docs/en/speech_command_recognition/README.rst
@@ -47,14 +47,8 @@ Format of Speech Commands
 Different MultiNets support different format:
-- Chinese
-
- MultiNet5 and MultiNet6 use Pinyin for Chinese speech commands. Please use :project_file:`tool/multinet_pinyin.py` to get pinyin of Chinese.
-
-- English
-
- MultiNet5 use phonemes for English speech commands. For simplicity, we use characters to denote different phonemes. Please use :project_file:`tool/multinet_g2p.py` to do the convention.
- MultiNet6 use grapheme for English speech commands. You do not need any conversion.
+ - MultiNet5 use phonemes for English speech commands. For simplicity, we use characters to denote different phonemes. Please use :project_file:`tool/multinet_g2p.py` to do the convention.
+ - MultiNet6 use grapheme for English speech commands. You do not need any conversion.
 Suggestions on Customizing Speech Commands
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -63,19 +57,16 @@ When customizing speech command words, please pay attention to the following sug
 .. list::
- - The recommended length of Chinese speech commands is generally 4-6 Chinese characters. Too short leads to high false recognition rate and too long is inconvenient for users to remember
 :esp32s3: - The recommended length of English speech commands is generally 4-6 words
 - Mixed Chinese and English is not supported in command words
 - The command word cannot contain Arabic numerals and special characters
- - Avoid common command words like "hello"
- - The greater the pronunciation difference of each Chinese character / word in the command words, the better the performance
 Speech Commands Customization Methods
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-MultiNet6 customize speech commands:
-
-- For English, words are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_en.txt` by the following format:
+MultiNet6 customize speech commands
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+- Words are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_en.txt` by the following format:
 ::
@@ -83,25 +74,11 @@ MultiNet6 customize speech commands:
 1 TELL ME A JOKE
 2 MAKE A COFFEE
-- For Chinese, pinyin are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_cn.txt` by the following format. :project_file:`tool/multinet_pinyin.py` help to get Pinyin of Chinese.
-
- ::
-
- # command_id command_sentence
- 1 da kai kong tiao
- 2 guan bi kong tiao
-
-Multinet5 supports flexible methods to customize speech commands. You can do it either online or offline and can also add/delete/modify speech commands dynamically.
-.. only:: latex
-
- ..
figure:: ../../_static/QR_multinet_g2p.png
- :alt: menuconfig_add_speech_commands
-
-Customize Speech Commands Offline
+MultiNet5 customize speech commands
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-There are two methods for users to customize speech commands offline:
+There are two methods to customize speech commands offline:
 - Via ``menuconfig``
@@ -114,7 +91,7 @@ There are two methods for users to customize speech commands offline:
 Please note that a single ``Command ID`` can correspond to more than one commands. For example, "da kai kong tiao" and "kai kong tiao" have the same meaning. Therefore, users can assign the same command id to these two commands and separate them with "," (no space required before and after).
- 1. Call the following API:
+ 2. Call the following API:
 ::
@@ -135,19 +112,12 @@ There are two methods for users to customize speech commands offline:
 - Via modifying code
- Users directly customize the speech commands in the code and pass these commands to the MultiNet. In the actual user scenarios, users can pass these commands via various interfaces including network / UART / SPI. For details, see the example described in ESP-Skainet.
-
-Customize speech commands online
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-MultiNet allows users to add/delete/modify speech commands dynamically during the operation, without the need to change models or modifying parameters. For details, see the example described in ESP-Skainet.
-
-For detailed description of APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` .
+ Users directly customize the speech commands in the code and pass these commands to the MultiNet. In the actual user scenarios, users can pass these commands via various interfaces including network / UART / SPI. For detailed description of APIs. Please refer to :project_file:`src/esp_mn_speech_commands.c` and examples described in ESP-Skainet.
 Use MultiNet
 ------------
-MultiNet speech commands recognition must be used together with audio front-end (AFE) in ESP-SR (What's more, AFE must be used together with WakeNet). For details, see Section :doc:`AFE Introduction and Use <../audio_front_end/README>` .
+We suggest to use MultiNet together with audio front-end (AFE) in ESP-SR. For details, see Section :doc:`AFE Introduction and Use <../audio_front_end/README>` .
 After configuring AFE, users can follow the steps below to configure and run MultiNet.
@@ -187,11 +157,6 @@ Users can start MultiNet after enabling AFE and WakeNet, but must pay attention
 MultiNet Output
 ~~~~~~~~~~~~~~~
-Speech commands recognition supports two basic modes:
-
- * Single recognition
- * Continuous recognition
-
 Speech command recognition must be used with WakeNet. After wake-up, MultiNet detection can start.
 Afer running, MultiNet returns the recognition output of the current frame in real time ``mn_state``, which is currently divided into the following identification states:
@@ -228,13 +193,13 @@ Afer running, MultiNet returns the recognition output of the current frame in re
 Users can use ``phrase_id[0]`` and ``prob[0]`` get the recognition result with the highest probability.
- - ESP_MN_STATE_TIMEOUT
+- ESP_MN_STATE_TIMEOUT
 Indicates the speech commands has not been detected for a long time and will exit automatically and wait to be waked up again.
-Therefore:
+Single recognition mode and Continuous recognition mode:
 * Single recognition mode: exit the speech recognition when the return status is ``ESP_MN_STATE_DETECTED``
-* Continuous recognition: exit the speech recognition when the return status is ``ESP_MN_STATE_TIMEOUT``
+* Continuous recognition mode: exit the speech recognition when the return status is ``ESP_MN_STATE_TIMEOUT``
 Resource Occupancy
 ------------------
diff --git a/docs/zh_CN/speech_command_recognition/README.rst b/docs/zh_CN/speech_command_recognition/README.rst
index 1e07a71..d28a8df 100644
--- a/docs/zh_CN/speech_command_recognition/README.rst
+++ b/docs/zh_CN/speech_command_recognition/README.rst
@@ -47,16 +47,8 @@ MultiNet 输入为经过前端语音算法(AFE)处理过的音频(格式
 不同版本的MultiNet命令词格式不同。命令词需要满足特定的格式,具体如下:
-- 中文
- MultiNet5和MultiNet6使用汉语拼音作为基本识别单元,并且每个字的拼音拼写间隔一个空格。比如“打开空调”,应该写成 “da kai kong tiao”,请使用以下工具将汉字转为拼音: :project_file:`tool/multinet_pinyin.py` 。
-- 英文
-
- MultiNet5: 使用音标作为基本识别单元。为简单起见,将每个音标映射为单个字母表示,比如“turn on the light”,需要写成“TkN nN jc LiT”。请使用我们提供的工具进行转换,详细可见: :project_file:`tool/multinet_g2p.py` 。
- MultiNet6: 使用subwords作为识别单元,用户可以直接输入所需短语。比如“turn on the light”,直接写为“turn on the light”即可。
-
-
 自定义要求
 ~~~~~~~~~~~
@@ -96,17 +88,7 @@ MultiNet6 离线设置命令词的方法:
 1 da kai kong tiao
 2 guan bi kong tiao
-- 英语通过修改 :project_file:`model/multinet_model/fst/commands_en.txt`
-
- 格式如下,第一个数字代表command id, 后面为指令的英语短语,两者由空格隔开,单词间也由空格隔开
-
- ::
-
- # command_id command_sentence
- 1 TELL ME A JOKE
- 2 MAKE A COFFEE
-
-MultiNet5 支持两种离线设置命令词的方法:
+MultiNet5 离线设置命令词的方法:
 - 通过 ``menuconfig``
@@ -119,7 +101,7 @@ MultiNet5 支持两种离线设置命令词的方法:
 注意,单个 Command ID 可以支持多个短语,比如“打开空调”和“开空调”表示的意义相同,则可以将其写在同一个 Command ID 对应的词条中,用英文字符“,”隔开相邻词条(“,”前后无需空格)。
- 1. 在代码里调用以下 API:
+ 2.
在代码里调用以下 API:
 ::
@@ -140,19 +122,12 @@ MultiNet5 支持两种离线设置命令词的方法:
 - 通过修改代码
- 该方法中,用户直接在代码中编写命令词,并传给 MultiNet。在实际产品开发和使用中,用户可以通过网络/UART/SPI 等多种接口,传递所需的命令词并随时更换命令词。详情可参考 ESP-Skainet 中的 example。
-
-在线设置命令词
-^^^^^^^^^^^^^^
-
-MultiNet 还支持在运行过程中,在线动态设置命令词(添加/删除/修改),且整个过程无须更换模型或调整参数。详情可参考 ESP-Skainet 中 example。
-
-具体 API 说明请参考 :project_file:`src/esp_mn_speech_commands.c` 。
+ 该方法中,用户直接在代码中编写命令词,并传给 MultiNet。在实际产品开发和使用中,用户可以通过网络/UART/SPI 等多种接口,传递所需的命令词并随时更换命令词。具体 API 说明请参考 :project_file:`src/esp_mn_speech_commands.c` 和 ESP-Skainet 中的 example。
 MultiNet 的使用
 ----------------
-MultiNet 命令词识别需要和 ESP-SR 中的 AFE 声学算法模块一起运行(此外,AFE 运行还需要使能 WakeNet 功能,具体请参考 :doc:`AFE 介绍及使用 <../audio_front_end/README>` )。
+MultiNet 命令词识别建议和 ESP-SR 中的 AFE 声学算法模块一起运行,具体请参考 :doc:`AFE 介绍及使用 <../audio_front_end/README>` )。
 当用户配置完成 AFE 后,请按照以下步骤配置和运行 MultiNet。
@@ -192,11 +167,6 @@ MultiNet 运行
 MultiNet 识别结果
 ~~~~~~~~~~~~~~~~~
-MultiNet 命令词识别支持两种基本模式:
-
-* 单次识别
-* 连续识别
-
 命令词识别必须和唤醒搭配使用,当唤醒后可以运行命令词的检测。
 命令词模型在运行时,会实时返回当前帧的识别状态 ``mn_state`` ,目前分为以下几种识别状态:
@@ -237,7 +207,7 @@ MultiNet 命令词识别支持两种基本模式:
 该状态表示长时间未检测到命令词,自动退出。等待下次唤醒。
-因此:
+单次识别模式和连续识别模式:
 当命令词识别返回状态为 ``ESP_MN_STATE_DETECTED`` 时退出命令词识别,则为单次识别模式;
 当命令词识别返回状态为 ``ESP_MN_STATE_TIMEOUT`` 时退出命令词识别,则为连续识别模式;

From e786bcbe562360c429742fd976aa95cf5cac974b Mon Sep 17 00:00:00 2001
From: sxy
Date: Tue, 7 Mar 2023 20:10:30 +0800
Subject: [PATCH 4/4] fix the warning:title underline too short

---
 docs/en/speech_command_recognition/README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/speech_command_recognition/README.rst b/docs/en/speech_command_recognition/README.rst
index 1581f71..071432b 100644
--- a/docs/en/speech_command_recognition/README.rst
+++ b/docs/en/speech_command_recognition/README.rst
@@ -76,7 +76,7 @@ MultiNet6 customize speech commands
 MultiNet5 customize speech commands
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 There are two methods to customize speech commands offline: