Releases: PaddlePaddle/PaddleSpeech
PaddleSpeech r1.4.2
S2T
- Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech. #3242 by @jiamingkong
- Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech. #3088 by @Zth9730
- Add Squeezeformer model. #2755 by @yeyupiaoling
- Add AMP for U2 conformer. #3167 by @zxcd
- Move dataset into paddlespeech.dataset. #3183 #3189 by @zh794390558
- Fix example/aishell local/train.sh if condition bug. #3146 by @lemondy
- Fix cli args to config. #3194 by @zh794390558
- Fix scaler save, load, unscale_ blow, grad_clip. by @zxcd
T2S
- Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including DiffSinger, PWGAN (#3031 by @lym0302) and HiFiGAN (#3038 by @lym0302); quality is still being optimized.
- Add SVS frontend. #3062 by @lym0302
- Add TTS iSTFTNet (#3006 by @longRookie), TTS JETS (#3109 by @ljhzxc)
- Starganv2: by @yt605155624
- Support for LITE: by @yt605155624
- Add XPU support for SpeedySpeech and FastSpeech2. #3502 #3514 by @USTCKAY
- Fix some preprocess bugs. #3155 by @yt605155624
- Fix bug of merge_yi function. #3786 by @mattheliu
Server
- Add code-switch conformer_talcs support. #3230 by @Gsonovb
- Add subtitle file (.srt format) generation example. #3123 by @twoDogy
- Fix: add file read encoding. #3606 by @Coloryr
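The subtitle example listed above (#3123) produces .srt files. As a rough illustration of the file format only, and not the code from that PR, the sketch below renders timed text segments as SubRip cues. The function names and the `(start, end, text)` segment layout are assumptions made for this note.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as the body of an .srt file.

    Cues are numbered from 1 and separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

In practice the segments would come from an ASR result with timestamps; writing the returned string to a file with `encoding="utf-8"` keeps non-ASCII text intact.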
Install & Benchmark
- Update paddle2onnx to the newest install version. #3084 by @yt605155624
- Update to py3.8; pin librosa==0.8.1 and numpy==1.23.5 for paddleaudio. by @zh794390558
- Fix transformation import error. #3779 by @kk-2000
- Adapt view behavior change, fix KeyError. #3794 by @zxcd
- Fix profiler, fix gpu_mem unit, add max_mem_reserved for benchmark. #3323 #3634 #3604 by @mmglove
Docs
- Fix some typos. #3178 by @Yulv-git
- Update svs_music_score.md. #3085 #3070 by @lym0302
- Update quick_start.md. #3175 #3176 by @46319943
- Add cli test readme. #3784 by @zxcd
- Update bug-report-tts.md. #3120 by @yt605155624
Others
- Fix 0-d tensor handling: with the upgrade to paddlepaddle==2.5, the problem of modifying 0-d tensors has been solved. #3214 by @zxcd #3334 by @zh794390558
- Add dtype param for arange API. #3302 by @zxcd
- Fix develop bug: change function view to reshape. #3633 by @luyao-cv
- Fix progress bar unit. #3177 by @46319943
- Remove unused dependency. #3097 by @lym0302
Acknowledgements
Special thanks to @jiamingkong @Zth9730 @yeyupiaoling @zxcd @zh794390558 @lemondy @lym0302 @longRookie @ljhzxc @yt605155624 @USTCKAY @mattheliu @Gsonovb @twoDogy @Coloryr @kk-2000 @mmglove @Yulv-git @46319943 @luyao-cv
New Contributors
- @jiamingkong made their first contribution in #3242
- @yeyupiaoling made their first contribution in #2755
- @lemondy made their first contribution in #3146
- @longRookie made their first contribution in #3006
- @ljhzxc made their first contribution in #3109
- @USTCKAY made their first contribution in #3502
- @mattheliu made their first contribution in #3786
- @Gsonovb made their first contribution in #3230
- @twoDogy made their first contribution in #3123
- @Coloryr made their first contribution in #3606
- @kk-2000 made their first contribution in #3779
- @Yulv-git made their first contribution in #3178
- @46319943 made their first contribution in #3175
- @luyao-cv made their first contribution in #3633
Full Changelog: r1.4.1...r1.4.2
PaddleSpeech r1.4.1
Others
- Fix typeguard version. #3056 by @yt605155624
PaddleSpeech r1.4.0
S2T
- Add wav2vec2-zh finetune pipeline. #3012 #2916 by @zxcd
- Fix some bugs in Whisper. #2900 #2825 by @zxcd
- Add code-switch asr tal_cs recipe. #2816 #2796 by @zxcd
T2S
- Add dygraph to static, PaddleInference, Paddle2ONNX and ONNXRuntime Infer for Cantonese TTS. #2990 by @JiehangXie
- Add Cantonese test examples. #2937 by @JiehangXie
- Add VITS inference pipeline. #3002 #2972 #2883 by @yt605155624
- Rearrange encoder_infer param's order. #2983 by @443127316
- Add male speaker and Chinese-English mix ONNXRuntime infer in CLI. #2945 by @lym0302
- Add Cantonese TTS example. #2950 #2927 #2924 #2907 #2899 by @WongLaw
- Fix PWGAN TIPC. #2882 by @yt605155624
- Add a case in not_erhua. #2863 by @QuanZ9
- Fix data prepare for PaddleSlim PTQ of TTS. #2862 by @yt605155624
- Avoid using variable "attn_loss" before assignment. #2860 by @hopingZ
- Add soft link for shell in example; add skip_copy_wave in norm stage of GANVocoders to save disk. #2851 by @yt605155624
- Optimize the training of VITS. #2843 #2809 #2791 #2770 by @WongLaw
- Add StarGANv2-VC model scripts and synthesize scripts. #2842 by @yt605155624
- Add diffusion module for training diffsinger. #2868 #2832 by @HighCWu
- Fix some Text Frontend bugs. #2831 by @yt605155624
- For mixed Chinese and English speech synthesis, add SSML support for Chinese. #2830 by @jindongyi011039
- Add mkldnn and trt config for TTS Inference. #2748 by @yt605155624
- Fix dygraph to static for tacotron2. #2426 by @yt605155624
Server
Engine
- Add wfst decoder. #2886 by @SmileGoat
- Add batch recognizer decode. #2866 by @SmileGoat
- Add nnet prob cache && make 2 thread decode work. #2769 by @SmileGoat
- Engine directory refactor. #2746 by @SmileGoat
- Fix openfst download error. #2742 by @SmileGoat
Audio
- Replace kaldi fbank with kaldi-native-fbank in paddleaudio. #2799 by @SmileGoat
- Fix load paddleaudio fail. #2815 by @SmileGoat
- Update paddleaudio readme. #2801 by @SmileGoat
Demos
- Add TTS ARM Linux C++ Demo. #2991 by @SwimmingTiger
- Add Cantonese TTS in CLI. #2977 by @WongLaw
- Add ONNXRuntime infer for Cantonese TTS in CLI. #2990 by @JiehangXie
Docs
- Add u2pp_wenetspeech_static_quant to released_model.md. #2973 by @zxcd
- Remove redundant dependencies and fix some bugs in setup.py. #2970 #2871 #2867 #2853 #2771 #2767 #2764 by @yt605155624
Others
- Remove fluid API in ASR. #2944 #2859 #2852 by @zxcd
- Add python simple adadelta optimizer. #2925 by @zxcd
- Add encoding=utf-8 for text. #2896 by @zxcd #2865 by @yt605155624
- Fix Tensor.numpy()[0] to float(Tensor) to adapt 0D. #2884 by @zhouwei25
- Fix libsndfile.so not found in ubuntu18-cpu/Dockerfile. #2763 by @linkec
- Fix AttributeError "module 'distutils' has no attribute 'ccompiler'" in setup.py in ctc_decoders. #2745 by @GreatV
New Contributors
- @GreatV made their first contribution in #2745
- @linkec made their first contribution in #2763
- @cxumol made their first contribution in #2828
- @jindongyi011039 made their first contribution in #2830
- @QuanZ9 made their first contribution in #2863
- @hopingZ made their first contribution in #2860
- @zhouwei25 made their first contribution in #2884
- @EscaticZheng made their first contribution in #2915
- @chinobing made their first contribution in #2922
- @lance6716 made their first contribution in #2924
- @443127316 made their first contribution in #2983
Full Changelog: r1.3.0...r1.4.0
PaddleSpeech r1.3.0
Highlight
S2T
- Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558
- Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730
- Add Whisper CLI and Demos, support multi language recognition and translation. @zxcd
- Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730
- Add whisper. #2640 #2704 by @zxcd
- Fix gpu training hang. #2478 by @Zth9730
- Support u2++ based cli and server. #2489 #2510 by @Zth9730
- Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730
- Add wav2vec2-zh cli. #2697 by @Zth9730
T2S
- Add seek for BytesIO. #2484 by @ZapBird
- Add mix finetune. #2525 #2647 by @lym0302
- Add streaming TTS fastdeploy serving. #2528 by @HexToString
- Add SSML for Chinese Text Frontend. #2531 by @david-95
- Add end-to-end Prosody Prediction pipeline (including using prosody labels in Acoustic Model). #2548 #2615 #2693 by @WongLaw
- Add Adversarial Loss for Chinese English mixed TTS. #2588 by @lym0302
- Fix frontend bugs. #2539 #2606 by @yt605155624
- Add TN for English unit. #2629 by @WongLaw
- Add male voice for TTS. #2660 by @lym0302
- Add double byte char for zh normalization. #2661 by @david-95
- Add TTS Paddle-Lite x86 inference. #2636 #2667 by @yt605155624
- Add greek char and fix #2571. #2683 by @david-95
- Add Slim for TTS. #2729 by @yt605155624
Audio
- Move paddlespeech/audio to paddleaudio. #2706 by @SmileGoat
Demo
- Add TTSAndroid demo. #2703 by @yt605155624
New Contributors
- @ZapBird made their first contribution in #2484
- @HexToString made their first contribution in #2528
- @dahu1 made their first contribution in #2554
- @kFoodie made their first contribution in #2664
- @zxcd made their first contribution in #2640
- @michael-skynorth made their first contribution in #2666
- @heyudage made their first contribution in #2688
Full Changelog: r1.2.0...r1.3.0
PaddleSpeech r1.2.0
S2T
- Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730
- Fix deepspeech2 decode_wav. #2351 by @Zth9730
- Support BiTransformer decoder. #2415 by @Zth9730
T2S
- Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu
- Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624
- Specify the input data type of G2PW. #2288 by @kslz
- Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302
- Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302
- Add words into polyphonic.yaml for g2pW. #2300 by @david-95
- Update the quantifier unit in Text Normalization. #2308 by @pengzhendong
- Fix Chinese frontend bugs. #2312 #2323 by @david-95
- Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624
- Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw
- Add tools to compare two test results of G2P to show differences. #2367 by @david-95
- Revise must_neural_tone_words. #2370 by @WongLaw
- Add type-hint for g2pW. #2390 by @yt605155624
- Replaced fixed path with path variable in MFA. #2416 by @WongLaw
- Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015
Text
Demo
Server
- Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
- Removed useless spk_id in speech_server and streaming_tts_server, support Chinese English mixed TTS server engine. #2380 by @WongLaw
Doc
- Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw
- Update API docs. #2406 by @yt605155624
- Add finetune demos in readthedocs. #2411 by @yt605155624
Test
- Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
- Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo
Other
- Format paddlespeech with pre-commit. #2331 by @yt605155624
Acknowledgements
Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat
New Contributors
- @HighCWu made their first contribution in #2268
- @pengzhendong made their first contribution in #2308
- @Zth9730 made their first contribution in #2327
- @WongLaw made their first contribution in #2357
- @yuehuayingxueluo made their first contribution in #2376
- @zhoupc2015 made their first contribution in #2422
Full Changelog: r1.1.0...r1.2.0
PaddleSpeech r1.1.0
S2T
- Add wer tools. #1709
- Optimize the attention cache; use 0-dim tensor for model export. #2124
- Fix cnn cache dy2st shape. #2168
TTS
- Fix random speaker embedding bug in voice clone. #1828 by @jerryuhoo
- Add VITS model. #1855 #1957 #2040
- Add kunlun support for speedyspeech. #1879 by @QingshuChen
- Normalize wav max value to 1 in preprocess. #1887 by @jerryuhoo
- Remove fluid dependence in TTS. #1940
- Add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5. #2068
- Add TTS static/onnx models in pretrained_models.py. #2074
- Add Ernie SAT model. #2052 #2117
- Add Chinese English mixed TTS frontend. #2143
- Add Chinese English mixed TTS example. #2234
- Fix English text frontend bug. #2235 by @david-95
- Add g2pW to Chinese frontend. #2230 by @BarryKCL
- Fix text frontend bugs. #1912 #2250 #2254 #2255 #2272
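One of the TTS items above normalizes the maximum wav value to 1 during preprocessing (#1887). The underlying idea is plain peak normalization, sketched below on a list of float samples. The function name and the pure-Python sample list are assumptions for illustration; the actual preprocessing code may differ.

```python
def peak_normalize(samples):
    """Scale samples so that the largest absolute value becomes 1.0.

    An all-zero (or empty) signal is returned unchanged to avoid
    dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]
```

For example, `peak_normalize([0.5, -0.25])` scales both samples by the same factor, so relative amplitudes are preserved while the peak reaches full scale.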
Speechx
- Add custom asr script. #1946
- Refactor frontend. #2003
- Deepspeech2 to onnx. #2034
- Refactor audio/data/feature cache. #1638
- Frontend refactor. #1640
- Fix nnet itf header. #1641
- Refactor speech egs. #1707
- Refactor egs and more egs for TLG wfst graph build. #1715
- Speedup ngram building. #1729
- Update speechx install doc. #1736
- Fix nnet input and output name. #1740
- Update wfst graph. #1742
- Fix model params path name. #1750
- Remove fluid tools for onnx export. #2116
Audio
Server
- Remove extra logs. #2111 #2113
- Change streaming tts servers' fs from 24k to models' fs. #2121
- Fix bug in engine_warmup. #2171 by @Betterman-qs
- Replace default vocoder in server to mb_melgan. #2214
- Fix bug in streaming_asr_server with punctuation restoration. #2244
- Rename time_s and time_ns to time_b and time_nb. #2133
- More accurate decoding. #2128
CLI
- Add paddlespeech.resource module. #1917
- Dynamic cli commands registration. #1959
- Fix unnecessary download. #2103
- Remove extra logs. #2084 #2085 #2107
- Add Chinese English mixed TTS CLI. #2249
- Add onnxruntime infer for CLI. #2222
Demo
- Add speech web demo. #2039 #2080
- Add kws cli and demo. #2063
- Use paddle web for streaming asr. #2105
- Add custom asr script. #1946
- More cli for speech demos. #2138
Doc
Others
- Fix CPU Dockerfile. #2172 by @BrightXiaoHan
- Add PaddleSpeech Dockerfile for hard mode of installation. #2127 by @buchongyu2
Acknowledgements
Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624
New Contributors
- @QingshuChen made their first contribution in #1879
- @Zhangjingyu06 made their first contribution in #1951
- @ryanrussell made their first contribution in #1976
- @freeliuzc made their first contribution in #2044
- @vpegasus made their first contribution in #2043
- @dependabot made their first contribution in #2061
- @raycool made their first contribution in #2109
- @YDX-2147483647 made their first contribution in #2125
- @chenkui164 made their first contribution in #2130
- @0x45f made their first contribution in #2162
- @Doubledongli made their first contribution in #2167
- @Betterman-qs made their first contribution in #2171
- @BrightXiaoHan made their first contribution in #2172
- @THUzyt21 made their first contribution in #2202
- @david-95 made their first contribution in #2235
- @BarryKCL made their first contribution in #2230
Full Changelog: r1.0.0...r1.1.0
PaddleSpeech r1.0.0
Highlight
- Release PP-ASR: Streaming ASR with timestamp and punctuation restoration, uses WenetSpeech Streaming Conformer and DeepSpeech2 ASR model.
- Release PP-TTS: Streaming TTS system for industrial application.
- Release PP-VPR: Industrial Voiceprint Recognition system and ECAPA-TDNN model.
- Custom ASR: a demo for applying for transportation reimbursement.
- Support MDTC KWS model
More
ASR
- DeepSpeech2 streaming model aishell cer 6.66%
- DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
- Conformer aishell cer 4.64%
- Conformer streaming model aishell cer 5.44%
- Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)
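The figures above are character error rates (CER). As a reminder of how the metric is defined, here is a minimal sketch based on Levenshtein distance. It is not the project's scoring tool, the function names are made up for this note, and real evaluation scripts usually apply text normalization first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Passing lists of words instead of strings to `edit_distance` gives the numerator of a word error rate in the same way.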
Speechx
- [SpeechX] DeepSpeech2 streaming with WFST in streaming asr example
- [SpeechX] Add websocket example
- [SpeechX] custom asr, apply reimbursement for transportation demo
KWS
- [KWS] Add kws example on HeySnips dataset. by @KPatr1ck in #1558
- [KWS] Update KWS example. by @KPatr1ck in #1783
Audio
- [Audio] rename paddleaudio to audio, since conflict with pkg name by @zh794390558 in #1758
- [Audio] Fix mcd issue. by @KPatr1ck in #1658
- [Audio] Remove mcd. by @KPatr1ck in #1659
- [Audio] Add VoxCeleb dataset for speaker recognition.
- [Audio] Add HeySnips dataset for keyword spotting.
What's Changed
- [R1.0][asr][server]add vector server by @honei in #1845
- [R1.0][asr][server]join streaming asr and punc server by @honei in #1846
- [R1.0]asr streaming server add time stamp by @honei in #1850
- [R1.0][tts][server] update readme by @lym0302 in #1852
- [R1.0] update cli by @Jackwaterveg in #1854
- [r1.0] update version to r1.0.0 by @zh794390558 in #1857
- [R1.0] Add doc for wenetspeech model (ds2 online, conformer online) by @Jackwaterveg in #1862
- [R1.0][server] improve server code by @lym0302 in #1866
- [R1.0][asr][server]update the streaming asr readme by @honei in #1871
- [R1.0] Update released model info (Wenetspeech ds2 online, conformer online) by @Jackwaterveg in #1869
- [R1.0]fix server doc and decode_method by @Jackwaterveg in #1889
- [speechx] add custom_streaming_asr @SmileGoat #1891
- [speechx] speedup ngram building @zh794390558 #1729
- [speechx] refactor egs and more egs for TLG wfst graph build @zh794390558 #1715
- [speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn @SmileGoat #1676
- [speechx] Add websocket & make it work @SmileGoat #1720
- [speechx] Frontend refactor @SmileGoat #1640
- [Speechx] add tlg decoder @SmileGoat #1599
Full Changelog: r1.0.0a...r1.0.0
PaddleSpeech r1.0.0a
Highlight
- Release Streaming ASR and Streaming TTS system for industrial application.
- Support KWS model
- Deepspeech2 streaming model aishell cer 6.66%
- Conformer aishell cer 4.64%
- Conformer streaming model aishell cer 5.44%
- SpeechX Deepspeech2 streaming with WFST
What's Changed
- [speechx] refactor audio/data/feature cache by @zh794390558 in #1638
- [speechx] Frontend refactor by @zh794390558 in #1640
- [speechx] fix nnet itf header by @zh794390558 in #1641
- [TTS]add license and reference for some models by @yt605155624 in #1642
- [Doc] supplement note by @Jackwaterveg in #1643
- [vec][search] update search demo README by @qingen in #1644
- [speechx]refactor linear feature:unify vector & remove redundant function & add remained_wav cache shift wav by @SmileGoat in #1649
- [Audio] Fix mcd issue. by @KPatr1ck in #1658
- [Audio] Remove mcd. by @KPatr1ck in #1659
- [vec]update the speaker verification model by @honei in #1663
- [ASR] update ds2 online model by @Jackwaterveg in #1668
- [TTS]fix preprocess bug, test=tts by @yt605155624 in #1660
- update README, test=doc by @iftaken in #1672
- [Punc] Update RESULTS.md. by @KPatr1ck in #1675
- [CLI] update ds2 online model in cli by @Jackwaterveg in #1674
- [CLI] ASR: Add duration limitation for asr by @Jackwaterveg in #1666
- [vec]add speaker verification score method by @honei in #1646
- [TTS]add onnx inference for fastspeech2 + hifigan/mb_melgan by @yt605155624 in #1665
- [doc]update readme by @yt605155624 in #1680
- [WebSocket] fixed online model md5 error , test=doc by @WilliamZhang06 in #1682
- [speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn by @SmileGoat in #1676
- [server] add stream tts server by @lym0302 in #1652
- [speechx]remove mutable in audio_cache by @SmileGoat in #1687
- [Doc] update readme for aishell/asr0 by @Jackwaterveg in #1677
- [vec] add speaker diarization pipeline by @ccrrong in #1651
- [vec]voxceleb convert dataset format to paddlespeech by @honei in #1630
- [Speechx] add tlg decoder by @SmileGoat in #1599
- [vec]add vector necessary note, test=doc by @honei in #1690
- Revert "[WebSocket] fixed online model md5 error , test=doc" by @zh794390558 in #1691
- [WebSocket] added online web client, test=doc by @WilliamZhang06 in #1692
- Fix the misspelling of the word speech in the example/aishell directory by @buchongyu2 in #1694
- Fix the misspelling of the word hack by @buchongyu2 in #1697
- [TTS]change NLC to NCL in speedyspeech, test=tts by @yt605155624 in #1693
- [doc]fix typo, test=doc by @yt605155624 in #1698
- [doc]add pwgan onnx model, test=doc by @yt605155624 in #1700
- [WebSocket] added online asr doc and online asr command line, test=doc by @WilliamZhang06 in #1701
- [vec][server] vpr demo support by @qingen in #1696
- [speechx] refactor speech egs by @zh794390558 in #1707
- [asr]add wer tools by @zh794390558 in #1709
- [asr][websocket]fix the ws send bug, cache buffer, text=doc by @honei in #1710
- [TTS]add fastspeech2 cnndecoder onnx model by @yt605155624 in #1712
- [speechx] refactor egs and more egs for TLG wfst graph build by @zh794390558 in #1715
- [vec][score] add plda model by @qingen in #1681
- [CLI]update cli, test=doc by @yt605155624 in #1716
- [server] add streaming am infer by @lym0302 in #1713
- [speechx] Add websocket & make it work by @SmileGoat in #1720
- [asr][websocket] add asr conformer websocket server by @honei in #1704
- [vec][loss] add NCE Loss from RNNLM by @qingen in #1719
- [vec][loss] add FocalLoss to deal with class imbalances by @qingen in #1722
- [TTS]restructure syn_utils.py, test=tts by @yt605155624 in #1723
- [TTS]add paddle device set for ort and inference by @yt605155624 in #1727
- [vec] add GRL to domain adaptation by @qingen in #1725
- [speechx] speedup ngram building by @zh794390558 in #1729
- [asr] Add new cer tools by @Jackwaterveg in #1673
- [speechx]add websocket lib by @SmileGoat in #1732
- [speechx]update speechx install doc by @zh794390558 in #1736
- [Doc] perfect the packing scripts by @Jackwaterveg in #1735
- [Doc]renew the released mode by @Jackwaterveg in #1739
- [asr][websocket]add streaming asr demo by @honei in #1737
- [speechx] fix nnet input and output name by @zh794390558 in #1740
- [ASR] remove redundant log by @Jackwaterveg in #1741
- [speechx] update wfst graph by @zh794390558 in #1742
- [speechx] Add recognizer_test_main script by @SmileGoat in #1743
- [vec][doc]update the voxceleb readme.md, test=doc by @honei in #1744
- [ASR] fix CER tools by @Jackwaterveg in #1747
- [Doc] Fix release_model info by @Jackwaterveg in #1746
- [Doc] Update released model info by @Jackwaterveg in #1748
- Update released model info by @Jackwaterveg in #1749
- [speechx] fix model params path name by @zh794390558 in #1750
- [speechx] fix linear-spectrogram-wo-db-norm-ol read feature issue by @SmileGoat in #1751
- [TTS]fix wavernn white noise bug for paddle develop(2.3) by @yt605155624 in #1752
- [server] add onnx tts engine by @lym0302 in #1733
- [TTS]Update paddle2onnx by @yt605155624 in #1754
- [Setup] to r1.0.0a by @Jackwaterveg in #1759
- [audio] rename paddleaudio to audio, since conflict with pkg name by @zh794390558 in #1758
- [speechx] to_float32, fix shell script by @zh794390558 in #1757
- [vec] bug fix to adapt VUE by @qingen in #1760
- [asr][websocket]fix the streaming asr server bug, server client by @honei in #1761
- [speechx] fbank and mfcc by @zh794390558 in #1765
- format code by @zh794390558 in #1764
- [CLI] Add conformer_aishell, conformer_online_aishell by @Jackwaterveg in #1767
- [speechx]make cmvn global in run.sh by @SmileGoat in #1768
- [ASR] ds2: add log_interval and fix lr problem when resume training by @Jackwaterveg in #1766
- [speechx] set nnet param by flags by @zh794390558 in #1769
- [server] add streaming tts demos by @lym0302...
PaddleSpeech r0.2.0
S2T
- Replace kaldi_fbank with paddleaudio. #1612
- Support CTC decoder online #821 #1626
- Improve accuracy of Conformer. Support using Kaiming Uniform as default initialization. #1577
TTS
- Add SpeedySpeech multi-speaker support for synthesize_e2e.py. #1370 by @jerryuhoo
- Add WaveRNN for CSMSC dataset. #1379
- Add Tacotron2 for CSMSC / LJSpeech datasets. #1314 / #1416
- Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. #1419
- Update text frontend. #1506
- Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. #1549 / #1581 / #1587
- Add NPU support for TransformerTTS. #1593 by @windstamp
- Add CNN Decoder for Streaming Fastspeech2. #1634
Audio
- Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
- Unittest and benchmark for audio feature APIs. #1548
- [audio] refactor audio arch #1494 by @zh794390558
- [audio] dtw metric #1493 by @zh794390558
- [audio] fix compliance bug #1597 by @zh794390558
Deployment
- [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
- [Speechx]fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
- [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
- [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558
Server
- [websocket] added online asr engine #1627 by @WilliamZhang06
- [server] added engine type and asr inference #1475 by @WilliamZhang06
- [Server] added asr engine #1413 by @WilliamZhang06
- [Server] added engine factory and config #1399 by @WilliamZhang06
- [server] added engine framework #1383 by @WilliamZhang06
- [server] update readme #1604 by @lym0302
- [server] add server cls #1554 by @lym0302
- [server] add paddlespeech_server stats #1510 by @lym0302
- [server] add cli #1466 by @lym0302
- [server] add tts postprocess #1411 by @lym0302
- [server] tts server #1386 by @lym0302
Vector
CLI
- Batch input supported. #1460
- TTS: Add WaveRNN for CSMSC dataset.
- TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
- Vector: add speaker verification demo and doc #1605 by @honei
Demo
- [vec][search] update client image url #1628 by @qingen
- [server] add server demo #1480 by @lym0302
- [vec][search] add audio similarity search #1609 by @qingen
Acknowledgements
Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen
PaddleSpeech r0.1.2
Bug Fix:
- Fixed the version of librosa==0.8.1 to solve the compatibility issue caused by the librosa upgrade. #1426