Information on the 24kHz model #2
Thank you for your quick answer @adelacvg. One last question, if you don't mind: for point number 3, is there a specific config (target layers, dimensions, etc.) for each stage, especially for flowvae? I see there are specific configs for gpt and diff. Thanks once again. I am planning to reproduce your code, but with multilingual data (English and Malay), so I need to train a BPE tokenizer first (see the sketch below).
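A minimal sketch of training such a multilingual BPE tokenizer with the HuggingFace `tokenizers` library; the corpus file names and special-token names here are placeholders, not part of detail_tts:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Train a shared English + Malay BPE vocab of size 512 (matching the GPT
# vocab size discussed below). Corpus paths are hypothetical.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=512, special_tokens=["[UNK]", "[START]", "[STOP]"])
tokenizer.train(files=["corpus_en.txt", "corpus_ms.txt"], trainer=trainer)
tokenizer.save("bpe_en_ms.json")
```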
For the vqvae- and flowvae-specific configs, you can check:
I just finished 50% of the flowvae steps (13M samples, 300k of 600k steps). For the next training step (vqvae), I need to load the flowvae model .pt and then continue training, right @adelacvg? Here is a sample from the flowvae:
Yes, just use the results from the previous step for the next step of the training. |
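A minimal sketch of loading the previous stage's checkpoint before starting the next one, assuming a standard PyTorch `.pt` file; the checkpoint key layout is a guess, so check how train.py actually saves it:

```python
import torch

def load_pretrained(model: torch.nn.Module, ckpt_path: str) -> torch.nn.Module:
    """Load weights from an earlier training stage into the next stage's model."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the weights under a key such as "model";
    # fall back to the raw dict otherwise (assumption, verify against train.py).
    state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
    missing, unexpected = model.load_state_dict(state, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
    return model
```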
Hmm, it seems my vqvae training loss is stuck; after 2 days it stays the same, and the samples are also unintelligible compared to the ground truth.
It's normal; VQ-VAE only needs to capture the semantics approximately. |
OK.
Ground truth: [audio attachment]
Sample: sample-1049.mp4
By the way, I changed my GPT vocab size to 512 because of the multilinguality. I just changed number_text_tokens and start_text_token in the config; that is correct, right? Thank you again.
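A sketch of the corresponding config edit; the nesting under a "gpt" section and the start-token convention are assumptions here, so verify them against the actual config_24k.json schema:

```python
import json

with open("vqvae/configs/config_24k.json") as f:
    cfg = json.load(f)

# Hypothetical nesting; match the real schema of config_24k.json.
cfg["gpt"]["number_text_tokens"] = 512  # new multilingual BPE vocab size
cfg["gpt"]["start_text_token"] = 511    # often vocab_size - 1, but verify the convention

with open("vqvae/configs/config_24k.json", "w") as f:
    json.dump(cfg, f, indent=2)
```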
In the GPT step, the inference results are close to those of VQ-VAE. You just need to ensure that the semantics are correct; after diffusion, they will become high quality.
Ensure that the reference mel is a short segment of audio, to avoid GPT overfitting to the speaker's conditioning.
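A sketch of trimming the reference audio to a short conditioning clip with torchaudio; the 3-second cap is an illustrative choice, not a value from the repo:

```python
import torchaudio

wav, sr = torchaudio.load("reference.wav")
max_len = 3 * sr                 # keep only ~3 seconds of conditioning audio (assumption)
ref = wav[:, :max_len]
torchaudio.save("reference_short.wav", ref, sr)
```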
@adelacvg, by the way, how can I infer the diffusion part? It seems api.py only provides vqvae and gpt (old commit). I just finished GPT training and am continuing with diff now.
The infer_diffusion function works the same way as the infer function.
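A purely illustrative sketch of that call pattern; the wrapper name `TTS` and the argument order are guesses, so check api.py for the real class and signatures:

```python
from api import TTS  # hypothetical wrapper; see api.py for the actual entry point

tts = TTS(config_path="vqvae/configs/config_24k.json", model_path="model.pt")
text = "Some text to synthesize."
wav_gpt = tts.infer("reference.wav", text)             # GPT/VQ-VAE path
wav_diff = tts.infer_diffusion("reference.wav", text)  # same call pattern, diffusion path
```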
@adelacvg, have you gotten good results? After training diff for 2 days, I get the same result as GPT (robotic sound, but the semantics are there).
After using the last commit, I finally got good results, thank you! Any tips on how to make inference faster @adelacvg? (Maybe Tortoise-style?)
For the GPT part, you can use acceleration frameworks similar to vLLM, which also support GPT-2. For the diffusion part, you can adopt faster sampling methods with fewer sampling steps. Alternatively, like XTTS, you can use a GAN instead of diffusion; although performance may decrease, it can be very fast for the timbres in the training dataset.
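The "fewer sampling steps" idea boils down to respacing the timesteps, as DDIM-style samplers do. A generic sketch of that respacing (not code from this repo):

```python
import numpy as np

def spaced_timesteps(train_steps: int = 1000, sample_steps: int = 50) -> np.ndarray:
    """Pick an evenly spaced subset of the training timesteps for faster sampling."""
    return np.linspace(0, train_steps - 1, sample_steps).round().astype(int)[::-1]

print(spaced_timesteps(1000, 10))  # e.g. 10 denoising steps instead of 1000
```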
Hey @adelacvg, thanks for sharing the code. After reading it, I want to ask a few questions about the new 24k model, if you don't mind:
1. What makes this model different from the previous one (https://huggingface.co/adelacvg/Detail/tree/main), besides the sample rate?
2. Did you not use a speech encoder in the 24k model? (I see there are speech encoders in utils, like HuBERT, Whisper, etc., but I think those are from the previous model.) Do you also still use ContentVec768L12.py?
3. I see train_target in https://github.com/adelacvg/detail_tts/blob/master/vqvae/configs/config_24k.json, so I assume there are multiple steps of training. If I want to train from scratch, do I need to change it, say to "gpt" first, then flowvae, then diff (is this correct)? (See the staging sketch after this comment.)
4. If I want to train from scratch, can I just remove this line? (detail_tts/train.py, line 461 at 7e24668)
Sorry if this is a lot of questions; thanks in advance!
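A sketch of how the staged training implied by train_target could be driven; the stage order here follows this thread (flowvae → vqvae → gpt → diff), and `launch_training` is a hypothetical entry point, not a function in train.py:

```python
import json

# Stage order as described in this thread; verify against train.py.
STAGES = ["flowvae", "vqvae", "gpt", "diff"]

with open("vqvae/configs/config_24k.json") as f:
    cfg = json.load(f)

for stage in STAGES:
    cfg["train_target"] = stage
    # launch_training(cfg)  # hypothetical; each stage resumes from the previous checkpoint
```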