SpeechX Custom ASR


Hui Zhang edited this page Oct 9, 2022 · 8 revisions

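In some scenarios, a recognition system needs to recognize rare words with high accuracy, for example place names in navigation software. Customized recognition can meet this need.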

Related demo: https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/custom_streaming_asr

Related scripts: https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx/examples/custom_asr

The script code serves as a detailed tutorial, and users can customize it to their own needs.

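This demo recognizes speech for taxi expense reports, which requires recognizing some rare place names. This can be achieved with the following steps.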

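G with slot: 打车到 "address_slot" (take a taxi to "address_slot").
The address slot WFST, to which the place names that need to be recognized can be added.
A replace operation, G = fstreplace(G_with_slot, address_slot), then yields the customized decoding graph.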
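
The effect of the replace step can be illustrated with a toy sketch in plain Python. This is not the OpenFst `fstreplace` tool and not the code used by the scripts: grammar paths are modeled as simple token lists, `replace_slot` is a hypothetical helper, and the two place names are made-up examples.

```python
# Toy illustration only: each token list stands in for one path through a WFST.
SLOT = "address_slot"


def replace_slot(paths_with_slot, slot_entries, slot=SLOT):
    """Expand every occurrence of `slot` with each entry of the slot grammar."""
    expanded = []
    for path in paths_with_slot:
        results = [[]]
        for token in path:
            if token == slot:
                # Branch once per slot entry, like splicing the slot FST into G.
                results = [r + [entry] for r in results for entry in slot_entries]
            else:
                results = [r + [token] for r in results]
        expanded.extend(results)
    return expanded


# G with slot: 打车到 "address_slot"
g_with_slot = [["打车到", SLOT]]
# Hypothetical place names standing in for the address slot WFST.
address_slot = ["首都机场", "西二旗"]

g = replace_slot(g_with_slot, address_slot)
for path in g:
    print("".join(path))
```

Adding a new place name to `address_slot` makes a new sentence available without touching `g_with_slot`, which is the reason the slot is kept as a separate grammar.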

Results

Example output of the demo:

0513 10:58:13.827821 41768 recognizer_test_main.cc:56] wav len (sample): 70208
I0513 10:58:13.884493 41768 feature_cache.h:52] set finished
I0513 10:58:24.247171 41768 paddle_nnet.h:76] Tensor neml: 10240
I0513 10:58:24.247249 41768 paddle_nnet.h:76] Tensor neml: 10240
LOG ([5.5.544~2-f21d7]:main():decoder/recognizer_test_main.cc:90)  the result of case_10 is 五月十二日二十二点三十六分加班打车回家四十一元

CER of the demo script:

Overall -> 1.23 % N=1134 C=1126 S=6 D=2 I=6
Mandarin -> 1.24 % N=1132 C=1124 S=6 D=2 I=6
English -> 0.00 % N=2 C=2 S=0 D=0 I=0
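
The percentages above are consistent with the usual character error rate definition, CER = (S + D + I) / N, where N is the number of reference characters and S, D, I are substitutions, deletions, and insertions. A minimal sketch that recomputes them from the reported counts (the helper name `cer_percent` is an assumption for illustration, not part of the scoring script):

```python
def cer_percent(n, s, d, i):
    """Character error rate in percent: (S + D + I) / N * 100."""
    return 100.0 * (s + d + i) / n


# (N, S, D, I) counts copied from the report above.
rows = {
    "Overall": (1134, 6, 2, 6),
    "Mandarin": (1132, 6, 2, 6),
    "English": (2, 0, 0, 0),
}

for name, (n, s, d, i) in rows.items():
    print(f"{name} -> {cer_percent(n, s, d, i):.2f} %")
```

In each row the correct count also satisfies C = N - S - D, for example 1134 - 6 - 2 = 1126 for the overall result.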