
Error at runtime after setting up the environment with conda #25

Open
EricHeyYa opened this issue Dec 19, 2024 · 10 comments

@EricHeyYa

(chatttsplus) PS E:\chatttsplus> python webui.py --cfg configs/infer/chattts_plus.yaml
INFO:ChatTTSPlusPipeline:device: cpu
INFO:ChatTTSPlusPipeline:dtype: torch.float32
INFO:ChatTTSPlusPipeline:DVAE coef: 榧澓趀漷嵂偡竼绣砏篱揼跆病懻炿脾曟彳豨蚺嶵莂仿琣像袄嵘譻稓燷澉脿禭瓳豎师巬坰滻詣萏琍斀謾巿懫垳蔾樑勣蒁嘃差琜諸纊罏帄璄讴訿几箩栿芰緓虻補嶿瑓盼蚯艏穫崀賚垄至嶙蘿薞促豆戤巁漉滻牮扏荢貼費晗凸柵嬿裗蓣跩慄巑呧盾亗膏澣猠趠幝臞紵嬿磖徣誠畠巤媥苽曳垏玌榬賯芏臽炅砾稄嵓蘳菥嶙傛勸些喏屓忸赜封凸份堾礒譳虷幐巕嬠盼耐歏豷垀贬褄懪垣栾碵悃豺匒巆區竿廇瞏矈耬賖炮燪聬茾稑諳虬买巬紙諸嫤憏澅榈讀種凛賤萿媟董谇噐巼孥拺皛勏狋毄貒揬凮瓽猿姼亣豂艷巘塯勲抁粏疤滨跅幞燸仓蜽粺喳貔砸巀㴁
INFO:ChatTTSPlusPipeline:loading model: tokenizer >>>>
INFO:ChatTTSPlusPipeline:{'name': 'Tokenizer', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/tokenizer.pt'}}
INFO:Tokenizer:loading Tokenizer pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/tokenizer.pt
E:\chatttsplus\chattts_plus\models\tokenizer.py:27: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
tokenizer: BertTokenizerFast = torch.load(
INFO:ChatTTSPlusPipeline:loading model: dvae_encode >>>>
INFO:ChatTTSPlusPipeline:{'name': 'DVAE', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/DVAE_full.pt', 'dim': 512, 'decoder_config': {'idim': 512, 'odim': 512, 'hidden': 256, 'n_layer': 12, 'bn_dim': 128}, 'encoder_config': {'idim': 512, 'odim': 1024, 'hidden': 256, 'n_layer': 12, 'bn_dim': 128}, 'vq_config': {'dim': 1024, 'levels': [5, 5, 5, 5], 'G': 2, 'R': 2}, 'coef': '榧澓趀漷嵂偡竼绣砏篱揼跆病懻炿脾曟彳豨蚺嶵莂仿琣像袄嵘譻稓燷澉脿禭瓳豎师巬坰滻詣萏琍斀謾巿懫垳蔾樑勣蒁嘃差琜諸纊罏帄璄讴訿几箩栿芰緓虻補嶿瑓盼蚯艏穫崀賚垄至嶙蘿薞促豆戤巁漉滻牮扏荢貼費晗凸柵嬿裗蓣跩慄巑呧盾亗膏澣猠趠幝臞紵嬿磖徣誠畠巤媥苽曳垏玌榬賯芏臽炅砾稄嵓蘳菥嶙傛勸些喏屓忸赜封凸份堾礒譳虷幐巕嬠盼耐歏豷垀贬褄懪垣栾碵悃豺匒巆區竿廇瞏矈耬賖炮燪聬茾稑諳虬买巬紙諸嫤憏澅榈讀種凛賤萿媟董谇噐巼孥拺皛勏狋毄貒揬凮瓽猿姼亣豂艷巘塯勲抁粏疤滨跅幞燸仓蜽粺喳貔砸巀㴁'}}
INFO:DVAE:loading DVAE pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/DVAE_full.pt
INFO:ChatTTSPlusPipeline:loading model: dvae_decode >>>>
INFO:ChatTTSPlusPipeline:{'name': 'DVAE', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/Decoder.pt', 'dim': 384, 'decoder_config': {'idim': 384, 'odim': 384, 'hidden': 512, 'n_layer': 12, 'bn_dim': 128}, 'coef': '榧澓趀漷嵂偡竼绣砏篱揼跆病懻炿脾曟彳豨蚺嶵莂仿琣像袄嵘譻稓燷澉脿禭瓳豎师巬坰滻詣萏琍斀謾巿懫垳蔾樑勣蒁嘃差琜諸纊罏帄璄讴訿几箩栿芰緓虻補嶿瑓盼蚯艏穫崀賚垄至嶙蘿薞促豆戤巁漉滻牮扏荢貼費晗凸柵嬿裗蓣跩慄巑呧盾亗膏澣猠趠幝臞紵嬿磖徣誠畠巤媥苽曳垏玌榬賯芏臽炅砾稄嵓蘳菥嶙傛勸些喏屓忸赜封凸份堾礒譳虷幐巕嬠盼耐歏豷垀贬褄懪垣栾碵悃豺匒巆區竿廇瞏矈耬賖炮燪聬茾稑諳虬买巬紙諸嫤憏澅榈讀種凛賤萿媟董谇噐巼孥拺皛勏狋毄貒揬凮瓽猿姼亣豂艷巘塯勲抁粏疤滨跅幞燸仓蜽粺喳貔砸巀㴁'}}
INFO:DVAE:loading DVAE pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/Decoder.pt
INFO:ChatTTSPlusPipeline:loading model: vocos >>>>
INFO:ChatTTSPlusPipeline:{'name': 'Vocos', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/Vocos.pt', 'feature_extractor_config': {'sample_rate': 24000, 'n_fft': 1024, 'hop_length': 256, 'n_mels': 100, 'padding': 'center'}, 'backbone_config': {'input_channels': 100, 'dim': 512, 'intermediate_dim': 1536, 'num_layers': 8}, 'head_config': {'dim': 512, 'n_fft': 1024, 'hop_length': 256, 'padding': 'center'}}}
INFO:ChatTTSPlusPipeline:loading model: gpt >>>>
INFO:ChatTTSPlusPipeline:{'name': 'GPT', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/GPT.pt', 'gpt_config': {'hidden_size': 768, 'intermediate_size': 3072, 'num_attention_heads': 12, 'num_hidden_layers': 20, 'use_cache': False, 'max_position_embeddings': 4096, 'spk_emb_dim': 192, 'spk_KL': False, 'num_audio_tokens': 626, 'num_vq': 4}}}
INFO:GPT:loading GPT pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/GPT.pt
INFO:ChatTTSPlusPipeline:loading speaker stat: E:\chatttsplus\chattts_plus\checkpoints\asset/spk_stat.pt
INFO:ChatTTSPlusPipeline:loading normalizer: E:\chatttsplus\chattts_plus\checkpoints\homophones_map.json
INFO: Could not find files for the given pattern(s).

To create a public link, set share=True in launch().
INFO:ChatTTSPlusPipeline:speaker_emb is None, random select a speaker!
INFO:ChatTTSPlusPipeline:speaker embedding is : 蘁淰敥欀櫌凖絘螜瞑掉孖捰槗琜蓻患瑈窲妧柔唣誧螺蚆莫娻簠丅瓞冥戗豔浹跪嬁昜维挖仐弄弶螓乺羠笷啣筸垎艚捰瞅礲挒识碚袀熹噜膗祒擂淥虀葌舮磹嚙佃泲缭塕呸蚳経娞虁皙匄蠜峹碿紻肅塱蕻噑俌悍蔩宛墌巣綫泞榞縬堫掝舞呼是蝲睗庫淺劦澓砷幠场睂蒅悞从扂咪兾敘媢利菩剻譵笇唦茞蓨晤罥敃瓿毐榄作検壏蓇抁檓硶泞裲悑缷牸藯螚繊帽芝搢塈壿抶塾脳虎旙崓腢変犧某莯賠杸寳趩悕勶入啷艛刈煾菑谝典斔繼蘱嘊堮焂蝍胗簀尻桭詋璵栈潳渥议潃笐茞瞪浍伵稽崮滁茠忘羔諐沤掐沌谱瀙蓆梠曌襺稤幞厸滫拇嫣噼赺喡烈肙栅姆不宄疁蓡褵粑垀栺值泤盏砌筥誘晷喢盤眵寮凵榲埫憕缏觶旀欐观襳訚纤湵橚簅篞励臓廸脛夔佥蒆砍眿橫硖蜀彏斵肬瘽懈暧盔滞賃姒氛窯僩蠷徜侮棡惻箠噱篂蕝仂捼膗怑蠉跘貽佫觱塡卡御羻凹峘誙撕縞聡溰夽垿柯種经射熥坾蕽舖嵵晦獅萓旷糳衖莅汇桷柁燬曆荠伒碮憌褦淧梟穼彈溃榉欁凜傑畺彊肤伨觍礂甚潈涑类芗栢碙硊泰諠藃悴入幠堍朒燐定汏燁薇苁咼帊撋赜烗硡梧倂廠螗吚羛藞剨啣侑堛獣娡埦敲纂蝽責灟傗縀昭昲綪覩媡塧匽虑縰厒礿苉乪唒盫味疐垡紑琀趡帆涍皋徏曈謷措窩堠淢皌擐洘船徍絨叔矽畵嘱媃燤撘缽硶栖簵彧蓥卮否罿箍狐敤氊旼珏珮绔塁藀竑尌兠搩狂褤暞塻渏矟禎幑璋篱漈薘烫笌珅濷削紾衅弔懟絢忨孼曋赦楛扻嶰粬僆抛泿匤洓刿廭佐璾怠劑訰攟簠琡瓹裡惕心暣乌嫐诈慯抬严痢座戥檸焄习垗掿恺埯姦枮搓蚿覚覥珰皳殮汪年疏葈率曾峧論滳媔槚奤燅炉毈樉濬挓俩獑敵編蓸嫸宍糪用衑觢偁斔煲愒棻嶜泳掝訫蠎愔由着贫峑橈焂剭模桪綉貄莋燎粗匭覷幯柇蜏困疒粬滪抈蝗噹敪胔灓虨襃芓灦俻砃崒丼眺噽蝳抁及焉賹嫕合撽检席棼喛灖澞笾袽曗腠濹蓵谊帢榘垶埜夞獌劄藚涁譄臕园漱蜇国籱加荠睔紲缚爋湯腠蚵彳吘蒄湭艆瀃塅舎挴梼規嫏姰柴竂牭苀熺襹跐謝豾砡娣蜅苘莺儺艢絲贫塌擕痷壉欅愯繖謉貘垲刧愄蘼猸熡褰葳挢觉崘繼啺琾僩幹耾民糐緆瞞暵跔棥值裩债言溟谹蚖檣蝧瀥讱畸讔涵啊裸嘆稅朩洗趵匊冔嶈煔僐烦磊础犡垍傐嚆譁挂胯稘焮歹虈敛姪幡哵姌喽脴揀蚻蚮刵氡澈紴綾弆蓖袬毥瑝幑腍灒権蛊板瀢彯溴燷茵耴哮潇褧喋楱筨咂蓇牴磨礟褁獱亥懣硏倒笘纐嬓击系粏潤丑柲編枨琌蕶臂瞚嶄夐俲垬啼缠燛哐睰塯仱尭湊崀
INFO:ChatTTSPlusPipeline:saving speaker emb at: E:\chatttsplus\chattts_plus\pipelines....\results/speakers/1734577058.2154176.pt
INFO:ChatTTSPlusPipeline:Params refine text:
INFO:ChatTTSPlusPipeline:{'prompt': '[oral_2][laugh_0][break_4]', 'top_P': 0.7, 'top_K': 20, 'temperature': 0.3, 'repetition_penalty': 1.0, 'max_new_token': 384, 'min_new_token': 0, 'show_tqdm': True, 'ensure_non_empty': True}
INFO:ChatTTSPlusPipeline:Params infer code:
INFO:ChatTTSPlusPipeline:{'prompt': '[speed_5]', 'top_P': 0.7, 'top_K': 20, 'temperature': 0.3, 'repetition_penalty': 1.05, 'max_new_token': 2048, 'min_new_token': 0, 'show_tqdm': True, 'ensure_non_empty': True, 'spk_emb': '蘁淰敥欀櫌凖絘螜瞑掉孖捰槗琜蓻患瑈窲妧柔唣誧螺蚆莫娻簠丅瓞冥戗豔浹跪嬁昜维挖仐弄弶螓乺羠笷啣筸垎艚捰瞅礲挒识碚袀熹噜膗祒擂淥虀葌舮磹嚙佃泲缭塕呸蚳経娞虁皙匄蠜峹碿紻肅塱蕻噑俌悍蔩宛墌巣綫泞榞縬堫掝舞呼是蝲睗庫淺劦澓砷幠场睂蒅悞从扂咪兾敘媢利菩剻譵笇唦茞蓨晤罥敃瓿毐榄作検壏蓇抁檓硶泞裲悑缷牸藯螚繊帽芝搢塈壿抶塾脳虎旙崓腢変犧某莯賠杸寳趩悕勶入啷艛刈煾菑谝典斔繼蘱嘊堮焂蝍胗簀尻桭詋璵栈潳渥议潃笐茞瞪浍伵稽崮滁茠忘羔諐沤掐沌谱瀙蓆梠曌襺稤幞厸滫拇嫣噼赺喡烈肙栅姆不宄疁蓡褵粑垀栺值泤盏砌筥誘晷喢盤眵寮凵榲埫憕缏觶旀欐观襳訚纤湵橚簅篞励臓廸脛夔佥蒆砍眿橫硖蜀彏斵肬瘽懈暧盔滞賃姒氛窯僩蠷徜侮棡惻箠噱篂蕝仂捼膗怑蠉跘貽佫觱塡卡御羻凹峘誙撕縞聡溰夽垿柯種经射熥坾蕽舖嵵晦獅萓旷糳衖莅汇桷柁燬曆荠伒碮憌褦淧梟穼彈溃榉欁凜傑畺彊肤伨觍礂甚潈涑类芗栢碙硊泰諠藃悴入幠堍朒燐定汏燁薇苁咼帊撋赜烗硡梧倂廠螗吚羛藞剨啣侑堛獣娡埦敲纂蝽責灟傗縀昭昲綪覩媡塧匽虑縰厒礿苉乪唒盫味疐垡紑琀趡帆涍皋徏曈謷措窩堠淢皌擐洘船徍絨叔矽畵嘱媃燤撘缽硶栖簵彧蓥卮否罿箍狐敤氊旼珏珮绔塁藀竑尌兠搩狂褤暞塻渏矟禎幑璋篱漈薘烫笌珅濷削紾衅弔懟絢忨孼曋赦楛扻嶰粬僆抛泿匤洓刿廭佐璾怠劑訰攟簠琡瓹裡惕心暣乌嫐诈慯抬严痢座戥檸焄习垗掿恺埯姦枮搓蚿覚覥珰皳殮汪年疏葈率曾峧論滳媔槚奤燅炉毈樉濬挓俩獑敵編蓸嫸宍糪用衑觢偁斔煲愒棻嶜泳掝訫蠎愔由着贫峑橈焂剭模桪綉貄莋燎粗匭覷幯柇蜏困疒粬滪抈蝗噹敪胔灓虨襃芓灦俻砃崒丼眺噽蝳抁及焉賹嫕合撽检席棼喛灖澞笾袽曗腠濹蓵谊帢榘垶埜夞獌劄藚涁譄臕园漱蜇国籱加荠睔紲缚爋湯腠蚵彳吘蒄湭艆瀃塅舎挴梼規嫏姰柴竂牭苀熺襹跐謝豾砡娣蜅苘莺儺艢絲贫塌擕痷壉欅愯繖謉貘垲刧愄蘼猸熡褰葳挢觉崘繼啺琾僩幹耾民糐緆瞞暵跔棥值裩债言溟谹蚖檣蝧瀥讱畸讔涵啊裸嘆稅朩洗趵匊冔嶈煔僐烦磊础犡垍傐嚆譁挂胯稘焮歹虈敛姪幡哵姌喽脴揀蚻蚮刵氡澈紴綾弆蓖袬毥瑝幑腍灒権蛊板瀢彯溴燷茵耴哮潇褧喋楱筨咂蓇牴磨礟褁獱亥懣硏倒笘纐嬓击系粏潤丑柲編枨琌蕶臂瞚嶄夐俲垬啼缠燛哐睰塯仱尭湊崀', 'spk_smp': None, 'txt_smp': None, 'stream_batch': 24, 'stream_speed': 12000, 'pass_first_n_batches': 2}
INFO:ChatTTSPlusPipeline:Optimization on text, such as split, merge and so on
INFO:ChatTTSPlusPipeline:Finish text optimization:
INFO:ChatTTSPlusPipeline:['坐高铁到杭州站下车,跟离酒店三公里左右,约十分钟车程 [uv_break] ']
INFO:ChatTTSPlusPipeline:Finish text normalization:
INFO:ChatTTSPlusPipeline:['坐高铁到杭州站下车,跟离酒店三公里左右,约十分钟车程 [uv_break] ']
0%| | 0/1 [00:00<?, ?it/s]INFO:ChatTTSPlusPipeline:Process Text Refinement >>>
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\blocks.py", line 2047, in process_api
result = await self.call_function(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\blocks.py", line 1594, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\_backends\_asyncio.py", line 2505, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\_backends\_asyncio.py", line 1005, in run
result = context.run(func, *args)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 869, in wrapper
response = f(*args, **kwargs)
File "E:\chatttsplus\webui.py", line 116, in refine_text
for text_ in text_gen:
File "E:\chatttsplus\chattts_plus\pipelines\chattts_plus_pipeline.py", line 400, in _infer
refined = self._refine_text(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\chatttsplus\chattts_plus\pipelines\chattts_plus_pipeline.py", line 245, in _refine_text
input_ids, attention_mask, text_mask = self.models_dict["tokenizer"].encode(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\chatttsplus\chattts_plus\models\tokenizer.py", line 62, in encode
x = self._tokenizer.encode_plus(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\transformers\tokenization_utils_base.py", line 3037, in encode_plus
padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\transformers\tokenization_utils_base.py", line 2761, in _get_padding_truncation_strategies
if padding_strategy != PaddingStrategy.DO_NOT_PAD and (self.pad_token is None or self.pad_token_id < 0):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\transformers\tokenization_utils_base.py", line 1104, in __getattr__
raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: BertTokenizerFast has no attribute pad_token. Did you mean: '_pad_token'?

System: Windows 10
CUDA 11.8
Environment created with conda.

Steps to reproduce:
conda create -n chatttsplus python=3.10
conda activate chatttsplus
git clone ...
pip install -r requirements.txt

After completing the steps above, running python webui.py --cfg configs/infer/chattts_plus.yaml reported errors, so I manually installed tensorrt and polygraphy.
Once those were installed, the app starts and opens at http://127.0.0.1:7890, but generating speech produces the error above.

Also:
The downloadable bundled package runs without problems. How can this error be fixed?

@warmshao
Owner

Try installing transformers==4.42.4.
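Before retrying, it can help to confirm that the conda env actually resolved to the pinned release (mixed pip/conda installs sometimes shadow each other). A minimal stdlib check; the helper name is hypothetical, and the package/version strings are just the ones suggested above:

```python
# Minimal sketch: verify an installed package version matches a pin,
# using only the standard library (importlib.metadata).
from importlib.metadata import version, PackageNotFoundError

def check_pin(package: str, wanted: str) -> bool:
    """Return True only if `package` is installed at exactly version `wanted`."""
    try:
        return version(package) == wanted
    except PackageNotFoundError:
        # Package not installed at all.
        return False

# Intended usage after `pip install transformers==4.42.4`:
# print(check_pin("transformers", "4.42.4"))
```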

@warmshao
Owner

Also, I see you're running on CPU. No GPU?

@EricHeyYa
Author

Also, I see you're running on CPU. No GPU?

Indeed. After I installed CUDA 11.8, Speaker Embedding generation works, but Speaker Audio (ZeroShot) now fails. The error is:

INFO:ChatTTSPlusPipeline:Use zero shot >>>
INFO:ChatTTSPlusPipeline:speaker_audio_path is C:\Users\Administrator\AppData\Local\Temp\gradio\61b98e905cff4dc8dde907b27547d207f1a84ac5abe45e2712738f811200a035\test.MP3
INFO:ChatTTSPlusPipeline:speaker_audio_text is 这是一段测试文本
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\blocks.py", line 2047, in process_api
result = await self.call_function(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\blocks.py", line 1606, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 714, in async_iteration
return await anext(iterator)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 708, in __anext__
return await anyio.to_thread.run_sync(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\_backends\_asyncio.py", line 2505, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\_backends\_asyncio.py", line 1005, in run
result = context.run(func, *args)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 691, in run_sync_iterator_async
return next(iterator)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 852, in gen_wrapper
response = next(iterator)
File "E:\chatttsplus\webui.py", line 149, in generate_audio
wav_gen = pipe.infer(
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\chatttsplus\chattts_plus\pipelines\chattts_plus_pipeline.py", line 496, in infer
spk_smp = self.sample_audio_speaker(audio_wav)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\chatttsplus\chattts_plus\pipelines\chattts_plus_pipeline.py", line 283, in sample_audio_speaker
self.models_dict["dvae_encode"](wav[None], "encode").squeeze_(0))
File "E:\chatttsplus\chattts_plus\models\dvae.py", line 257, in __call__
return super().__call__(inp, mode)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\chatttsplus\chattts_plus\models\dvae.py", line 269, in forward
ind = self.vq_layer(x)
File "E:\chatttsplus\chattts_plus\models\dvae.py", line 97, in __call__
return super().__call__(x)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "E:\chatttsplus\chattts_plus\models\dvae.py", line 103, in forward
with torch.autocast(device_type=str(x.device), dtype=torch.float32):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\amp\autocast_mode.py", line 241, in __init__
raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'cuda:0'

CUDA info: (screenshot)

transformers is already at the version you mentioned, but this error still occurs. How should I handle it?

Also:
If I don't want to keep tinkering, can the Windows bundled package do voice cloning?

@EricHeyYa
Author

EricHeyYa commented Dec 19, 2024


Changing the forward method of the GFSQ class in dvae.py to:

device_type = 'cuda' if x.device.type == 'cuda' else 'cpu'
with torch.autocast(device_type=device_type, dtype=torch.float32):
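For context on why the one-line change above works: torch.autocast accepts only a bare device type such as 'cuda' or 'cpu', while str(x.device) includes the device index ('cuda:0'), which autocast rejects. The same normalization can be sketched without torch (a hypothetical helper for illustration, not code from the repo):

```python
# Hypothetical helper mirroring the fix above: torch.autocast wants a device
# *type* ('cuda'/'cpu'), but str(tensor.device) yields 'cuda:0' (type + index),
# which triggers "unsupported autocast device_type 'cuda:0'".
def autocast_device_type(device_str: str) -> str:
    """Strip the device index, keeping only the device type."""
    return device_str.split(":", 1)[0]

print(autocast_device_type("cuda:0"))  # cuda
print(autocast_device_type("cpu"))     # cpu
```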

solved the cuda:0 problem, and zero_shot now works normally. However, when using files produced with sovits training, I get the following error:

Error message:

(chatttsplus) PS E:\chatttsplus> accelerate launch train_lora.py --config configs/train/train_voice_clone_lora.yaml
[20241219 16:52:10] [INFO] Lora Training | train_lora | {'MODELS': {'tokenizer': {'name': 'Tokenizer', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/tokenizer.pt'}}, 'dvae_encode': {'name': 'DVAE', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/DVAE_full.pt', 'coef': '', 'dim': 512, 'decoder_config': {'idim': 512, 'odim': 512, 'hidden': 256, 'n_layer': 12, 'bn_dim': 128}, 'encoder_config': {'idim': 512, 'odim': 1024, 'hidden': 256, 'n_layer': 12, 'bn_dim': 128}, 'vq_config': {'dim': 1024, 'levels': [5, 5, 5, 5], 'G': 2, 'R': 2}}}, 'dvae_decode': {'name': 'DVAE', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/Decoder.pt', 'coef': '', 'dim': 384, 'decoder_config': {'idim': 384, 'odim': 384, 'hidden': 512, 'n_layer': 12, 'bn_dim': 128}}}, 'gpt': {'name': 'GPT', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/GPT.pt', 'gpt_config': {'hidden_size': 768, 'intermediate_size': 3072, 'num_attention_heads': 12, 'num_hidden_layers': 20, 'use_cache': False, 'max_position_embeddings': 4096, 'spk_emb_dim': 192, 'spk_KL': False, 'num_audio_tokens': 626, 'num_vq': 4}}}}, 'DATA': {'train_bs': 4, 'meta_infos': ['E:\\ChatTTSPlus\\data\\asr_opt\\slicer_opt.list'], 'sample_rate': 24000, 'num_vq': 4}, 'LORA': {'lora_r': 8, 'lora_alpha': 16, 'lora_dropout': 0.01, 'lora_target_modules': ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']}, 'solver': {'gradient_accumulation_steps': 1, 'mixed_precision': 'fp16', 'gradient_checkpointing': False, 'max_train_steps': 2200, 'max_grad_norm': 1.0, 'learning_rate': 5e-05, 'min_learning_rate': 1e-05, 'scale_lr': False, 'lr_warmup_steps': 10, 'lr_scheduler': 'constant', 'use_8bit_adam': False, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_weight_decay': 0.001}, 'weight_dtype': 'fp16', 'output_dir': './outputs', 'exp_name': 'qinqin', 'lora_model_path': '', 'checkpointing_steps': 100, 'use_empty_speaker': True}
[20241219 16:52:11] [INFO] Lora Training | train_lora | weight_dtype: torch.float16
[20241219 16:52:11] [INFO] Lora Training | train_lora | loading tokenizer >>>
[20241219 16:52:11] [INFO] Tokenizer | tokenizer | loading Tokenizer pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/tokenizer.pt
[20241219 16:52:11] [INFO] Lora Training | train_lora | loading DVAE encode >>>
[20241219 16:52:11] [INFO] Lora Training | train_lora | Set DAVE Encode Coef: 岭笓蚢疉嶓葴狽詯剏珅嬘負抪凌潸谾渙樃筌敇巅匰嫼挙娏肖垔贌嚟臸硊琾豟澳谨喀嶧媞任籑斏蓂覀貊獄懵侼萾栾剃蟳爞嶙嚺嫺瘣侏珯柌赀朇臞罐眿貏丳跑矞嶯媚盹櫚蒏渃貀譇彏燽僔儾岘垃見猅州堋勸繭嵏蘟囜趬樅懥烂放扪库趘篺嶨尖廻跃杏谢乄趤嶡凤昊显搼脃蠀栅巫埠諼繻襏窂肬豬褑凥殔爿柛苃訃慧嶹茓勽幊丏祊谌豚烒臸膅椿觞徣贃膾嶜嗧拼堉誏濧完贩抝凹夈匿瘵昃苾聾巂胤勾芏蟏瀥蚄赘譻懞埮蠾薍繳赵慆己姌盹暊総槻箔贀寘燄夙砿噈滳誀赦嶓堅绾虛菏式牨诜幒懮謕圿圔荣贂秜嵿莤仿熧岏脏蛴贚卑臰侴帼當倣赃綂嶀㴁
[20241219 16:52:11] [INFO] DVAE | dvae | loading DVAE pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/DVAE_full.pt
[20241219 16:52:11] [INFO] Lora Training | train_lora | loading GPT model >>>
[20241219 16:52:13] [INFO] GPT | gpt | loading GPT pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/GPT.pt
[20241219 16:52:13] [INFO] Lora Training | train_lora | loading speaker stat: E:\chatttsplus\chattts_plus\checkpoints\asset/spk_stat.pt
[20241219 16:52:13] [INFO] Lora Training | train_lora | loading normalizer: E:\chatttsplus\chattts_plus\checkpoints\homophones_map.json
[20241219 16:52:13] [INFO] Lora Training | train_lora | Setting Lora model >>>
[20241219 16:52:13] [INFO] Lora Training | train_lora | Total trainable params 280
[20241219 16:52:13] [INFO] Lora Training | train_lora | ***** Running training *****
[20241219 16:52:13] [INFO] Lora Training | train_lora | Num examples = 548
[20241219 16:52:13] [INFO] Lora Training | train_lora | Num Epochs = 17
[20241219 16:52:13] [INFO] Lora Training | train_lora | Instantaneous batch size per device = 4
[20241219 16:52:13] [INFO] Lora Training | train_lora | Total train batch size (w. parallel, distributed & accumulation) = 4
[20241219 16:52:13] [INFO] Lora Training | train_lora | Gradient Accumulation steps = 1
[20241219 16:52:13] [INFO] Lora Training | train_lora | Total optimization steps = 2200
Steps: 0%| | 0/2200 [00:00<?, ?it/s]
Traceback (most recent call last):
File "E:\chatttsplus\train_lora.py", line 542, in <module>
main(config)
File "E:\chatttsplus\train_lora.py", line 347, in main
for step, batch in enumerate(train_dataloader):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\data_loader.py", line 552, in __iter__
current_batch = next(dataloader_iter)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
data = self._next_data()
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
data.reraise()
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\_utils.py", line 705, in reraise
raise exception
soundfile.LibsndfileError: <exception str() failed>
Steps: 0%| | 0/2200 [00:10<?, ?it/s]
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\ProgramData\miniconda3\envs\chatttsplus\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\commands\launch.py", line 1168, in launch_command
simple_launcher(args)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\commands\launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\ProgramData\\miniconda3\\envs\\chatttsplus\\python.exe', 'train_lora.py', '--config', 'configs/train/train_voice_clone_lora.yaml']' returned non-zero exit status 1.
(chatttsplus) PS E:\chatttsplus> accelerate launch train_lora.py --config configs/train/train_voice_clone_lora.yaml
[20241219 16:52:48] [INFO] Lora Training | train_lora | {'MODELS': {'tokenizer': {'name': 'Tokenizer', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/tokenizer.pt'}}, 'dvae_encode': {'name': 'DVAE', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/DVAE_full.pt', 'coef': '', 'dim': 512, 'decoder_config': {'idim': 512, 'odim': 512, 'hidden': 256, 'n_layer': 12, 'bn_dim': 128}, 'encoder_config': {'idim': 512, 'odim': 1024, 'hidden': 256, 'n_layer': 12, 'bn_dim': 128}, 'vq_config': {'dim': 1024, 'levels': [5, 5, 5, 5], 'G': 2, 'R': 2}}}, 'dvae_decode': {'name': 'DVAE', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/Decoder.pt', 'coef': '', 'dim': 384, 'decoder_config': {'idim': 384, 'odim': 384, 'hidden': 512, 'n_layer': 12, 'bn_dim': 128}}}, 'gpt': {'name': 'GPT', 'infer_type': 'pytorch', 'kwargs': {'model_path': 'checkpoints/asset/GPT.pt', 'gpt_config': {'hidden_size': 768, 'intermediate_size': 3072, 'num_attention_heads': 12, 'num_hidden_layers': 20, 'use_cache': False, 'max_position_embeddings': 4096, 'spk_emb_dim': 192, 'spk_KL': False, 'num_audio_tokens': 626, 'num_vq': 4}}}}, 'DATA': {'train_bs': 4, 'meta_infos': ['E:\\ChatTTSPlus\\data\\asr_opt\\slicer_opt.list'], 'sample_rate': 32000, 'num_vq': 4}, 'LORA': {'lora_r': 8, 'lora_alpha': 16, 'lora_dropout': 0.01, 'lora_target_modules': ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']}, 'solver': {'gradient_accumulation_steps': 1, 'mixed_precision': 'fp16', 'gradient_checkpointing': False, 'max_train_steps': 2200, 'max_grad_norm': 1.0, 'learning_rate': 5e-05, 'min_learning_rate': 1e-05, 'scale_lr': False, 'lr_warmup_steps': 10, 'lr_scheduler': 'constant', 'use_8bit_adam': False, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_weight_decay': 0.001}, 'weight_dtype': 'fp16', 'output_dir': './outputs', 'exp_name': 'qinqin', 'lora_model_path': '', 'checkpointing_steps': 100, 'use_empty_speaker': True}
[20241219 16:52:49] [INFO] Lora Training | train_lora | weight_dtype: torch.float16
[20241219 16:52:49] [INFO] Lora Training | train_lora | loading tokenizer >>>
[20241219 16:52:49] [INFO] Tokenizer | tokenizer | loading Tokenizer pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/tokenizer.pt
[20241219 16:52:49] [INFO] Lora Training | train_lora | loading DVAE encode >>>
[20241219 16:52:49] [INFO] Lora Training | train_lora | Set DAVE Encode Coef: 訨畃謗潣己薅滽亍囏噭庐跁诂懲亹萿爓曣觐塷巂笅囿渱小怍洄貘捘懰搱儿甞考訁禒嶻娠狹乶笏掍眰赘紲燭宼倿褝堃褰狵嶨宠曾沬洏蓒廀跟屜燱偢栾倳涓薹礟巀汗滿犰涏葯朠趰槂燮燄圿欦揣蝐痞嶕匒仿穐媏蓩帘趒悃懳战房岔緓蜍聒巗笴櫽绱斏溆瓼貘滄燥慕蜾窊涳蟇啜巖臄份竈茏瀊獜譻猀凹亝訽羮貓览暴嶳憳蛼亝棎羃褔豠扊函蚨樿涬范谅潟左紩绸涊嚏譎峈賜弛凧淐焿丯紣稢疛嶠巍绵舞洏禄昸谐焯臐乡戾晆蒓諴稧州戳绿脳扏舦誈赜泒懦渽萿縢爳谁禶嵨嵞蛷厪埏羑玐赬晛凨倊瀻縴懃腒斉嶿匢嫻肁每苰殄贺坉燳候夾襲湣贉圴巀㴁
[20241219 16:52:49] [INFO] DVAE | dvae | loading DVAE pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/DVAE_full.pt
[20241219 16:52:49] [INFO] Lora Training | train_lora | loading GPT model >>>
[20241219 16:52:50] [INFO] GPT | gpt | loading GPT pretrained model: E:\chatttsplus\chattts_plus\checkpoints\asset/GPT.pt
[20241219 16:52:51] [INFO] Lora Training | train_lora | loading speaker stat: E:\chatttsplus\chattts_plus\checkpoints\asset/spk_stat.pt
[20241219 16:52:51] [INFO] Lora Training | train_lora | loading normalizer: E:\chatttsplus\chattts_plus\checkpoints\homophones_map.json
[20241219 16:52:51] [INFO] Lora Training | train_lora | Setting Lora model >>>
[20241219 16:52:51] [INFO] Lora Training | train_lora | Total trainable params 280
[20241219 16:52:51] [INFO] Lora Training | train_lora | ***** Running training *****
[20241219 16:52:51] [INFO] Lora Training | train_lora | Num examples = 548
[20241219 16:52:51] [INFO] Lora Training | train_lora | Num Epochs = 17
[20241219 16:52:51] [INFO] Lora Training | train_lora | Instantaneous batch size per device = 4
[20241219 16:52:51] [INFO] Lora Training | train_lora | Total train batch size (w. parallel, distributed & accumulation) = 4
[20241219 16:52:51] [INFO] Lora Training | train_lora | Gradient Accumulation steps = 1
[20241219 16:52:51] [INFO] Lora Training | train_lora | Total optimization steps = 2200
Steps: 0%| | 0/2200 [00:00<?, ?it/s]
Traceback (most recent call last):
File "E:\chatttsplus\train_lora.py", line 542, in <module>
main(config)
File "E:\chatttsplus\train_lora.py", line 347, in main
for step, batch in enumerate(train_dataloader):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\data_loader.py", line 552, in __iter__
current_batch = next(dataloader_iter)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
data = self._next_data()
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
data.reraise()
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\_utils.py", line 705, in reraise
raise exception
soundfile.LibsndfileError: <exception str() failed>
Steps: 0%| | 0/2200 [00:10<?, ?it/s]
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\ProgramData\miniconda3\envs\chatttsplus\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\commands\launch.py", line 1168, in launch_command
simple_launcher(args)
File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\accelerate\commands\launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\ProgramData\\miniconda3\\envs\\chatttsplus\\python.exe', 'train_lora.py', '--config', 'configs/train/train_voice_clone_lora.yaml']' returned non-zero exit status 1.

Which version of soundfile is the author using?
Or is some additional library required?

I tested a single file inside the conda environment, and it reads fine:
```python
import soundfile as sf

# note the raw string: the path contains sequences like "\001" that Python
# would otherwise interpret as escape characters
try:
    data, samplerate = sf.read(r'E:\ChatttsPlus\data\slicer_opt\qq.MP3_0016955200_0017043200.wav')
    print("Successfully read audio file")
except Exception as e:
    print(f"Error reading file: {e}")
```

Output of the single-file test:
```
(chatttsplus) PS E:\chatttsplus> python ../test/1.py
Successfully read audio file
```
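Since the dataloader error (`soundfile.LibsndfileError: <exception str() failed>`) names neither the file nor the clip, it can help to scan every file in the dataset folder rather than one known-good file. Below is a minimal sketch of such a scan. It uses the stdlib `wave` module purely for illustration (the project itself reads audio with `soundfile`), and the fixture files it creates are made up:

```python
import os
import struct
import tempfile
import wave

def find_unreadable(paths):
    """Return the subset of paths that cannot be decoded as WAV audio."""
    bad = []
    for p in paths:
        try:
            with wave.open(p, "rb") as wf:
                wf.readframes(wf.getnframes())  # force a full decode of the data chunk
        except Exception:
            bad.append(p)
    return bad

# build a tiny fixture: one valid clip, one file with a truncated header
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.wav")
with wave.open(good, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(struct.pack("<h", 0) * 160)  # 10 ms of silence

corrupt = os.path.join(tmp, "corrupt.wav")
with open(corrupt, "wb") as f:
    f.write(b"RIFF\x00\x00")  # not a complete WAV header

bad = find_unreadable([good, corrupt])
print(bad)  # only corrupt.wav should be listed
```

Pointing the same loop at `data/slicer_opt` (with `soundfile.read` in place of `wave.open`) would show whether one specific clip is the culprit or whether the failure is environment-wide.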

--- Update 12-20

After downgrading SoundFile to 10.0.1, LoRA training works. With the trained result, voice-clone generation now fails with the following error:

```
INFO:ChatTTSPlusPipeline:Finish text normalization:
INFO:ChatTTSPlusPipeline:['我是一句测试 [uv_break] 咚吧咚吧啦']
0%| | 0/1 [00:00<?, ?it/s]
INFO:ChatTTSPlusPipeline:load lora into gpt: E:\chatttsplus\chattts_plus\checkpoints\lora\53f6d9f1-e0b7-4053-9f7a-af50cf3bcf5b
INFO:ChatTTSPlusPipeline:Start inference audio code >>>>
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last): | 0/2048(max) [00:00, ?it/s]
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\blocks.py", line 2047, in process_api
    result = await self.call_function(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\blocks.py", line 1606, in call_function
    prediction = await utils.async_iteration(iterator)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 714, in async_iteration
    return await anext(iterator)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 708, in __anext__
    return await anyio.to_thread.run_sync(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\_backends\_asyncio.py", line 2505, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\anyio\_backends\_asyncio.py", line 1005, in run
    result = context.run(func, *args)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 691, in run_sync_iterator_async
    return next(iterator)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\gradio\utils.py", line 852, in gen_wrapper
    response = next(iterator)
  File "E:\chatttsplus\webui.py", line 169, in generate_audio
    for wavs_ in wav_gen:
  File "E:\chatttsplus\chattts_plus\pipelines\chattts_plus_pipeline.py", line 434, in _infer
    for result in self._infer_code(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "E:\chatttsplus\chattts_plus\models\gpt.py", line 410, in generate
    outputs: BaseModelOutputWithPast = self.gpt(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\chatttsplus\chattts_plus\models\llama.py", line 976, in forward
    layer_outputs = decoder_layer(
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\chatttsplus\chattts_plus\models\llama.py", line 713, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\chatttsplus\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\chatttsplus\chattts_plus\models\llama.py", line 86, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

I tried modifying llama.py, without success; I don't know how to fix this at the moment.

@warmshao
Owner

`soundfile==0.12.1`

@warmshao
Owner

There was probably a bug; it has been fixed, please pull the latest code.

@EricHeyYa
Author

> There was probably a bug; it has been fixed, please pull the latest code.

OK, I tested after updating and it runs normally now. Thumbs up!

Also:
When testing voice cloning, slightly longer text gets read garbled.
Can this kind of issue be solved by adjusting parameters?
What is the optimal duration for the generated audio?
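On the long-text garbling question: a common workaround (not necessarily what this repo does internally) is to split the input at sentence boundaries and synthesize the chunks one at a time. A rough sketch follows; the `max_len` threshold and the splitting regex are my own guesses and should be tuned to the data:

```python
import re

def split_text(text, max_len=60):
    """Greedily merge sentences into chunks of roughly max_len characters.
    A single sentence longer than max_len becomes its own chunk."""
    # split after Chinese or Western sentence-ending punctuation, keeping it
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]
    chunks, cur = [], ""
    for s in sentences:
        if cur and len(cur) + len(s) > max_len:
            chunks.append(cur)
            cur = s
        else:
            cur += s
    if cur:
        chunks.append(cur)
    return chunks

text = "这是第一句。这是第二句。This is the third sentence. 最后一句?"
chunks = split_text(text, max_len=20)
for c in chunks:
    print(c)
```

Each chunk could then be passed to the pipeline separately and the resulting waveforms concatenated, which keeps every individual generation short.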

@bronzeman2022

> There was probably a bug; it has been fixed, please pull the latest code.

> OK, I tested after updating and it runs normally now. Thumbs up!

> Also: when testing voice cloning, slightly longer text gets read garbled. Can this kind of issue be solved by adjusting parameters? What is the optimal duration for the generated audio?

Bro, does the loss converge when you train the cloning LoRA? Mine is really slow.

@warmshao
Owner

> There was probably a bug; it has been fixed, please pull the latest code.

> OK, I tested after updating and it runs normally now. Thumbs up!
> Also: when testing voice cloning, slightly longer text gets read garbled. Can this kind of issue be solved by adjusting parameters? What is the optimal duration for the generated audio?

> Bro, does the loss converge when you train the cloning LoRA? Mine is really slow.

Don't watch for the loss to converge. As I wrote in the README, training for a long time gets the loss very low but it overfits easily; after about 2000 steps you can already try out the results.

@bronzeman2022

> There was probably a bug; it has been fixed, please pull the latest code.

> OK, I tested after updating and it runs normally now. Thumbs up!
> Also: when testing voice cloning, slightly longer text gets read garbled. Can this kind of issue be solved by adjusting parameters? What is the optimal duration for the generated audio?

> Bro, does the loss converge when you train the cloning LoRA? Mine is really slow.

> Don't watch for the loss to converge. As I wrote in the README, training for a long time gets the loss very low but it overfits easily; after about 2000 steps you can already try out the results.

Got it, thanks!
