
Reflow failure #17

Open
NathanWalt opened this issue Oct 19, 2024 · 4 comments

Comments

@NathanWalt

I've been trying to reproduce your work, especially the rectified flow part. However, the reflow procedure always results in poorer synthesis quality (even with few sampling steps). I'm wondering if you could provide some of the hyperparameters you used in the reflow procedure, such as the number of training epochs and the EMA decay rate?

@cantabile-kwok
Member

Hi, most of the hyperparameters are specified in configs/lj_16k_gt_dur_reflow.yaml. As for the EMA decay rate, it is hardcoded to 0.9999 and we did not change it. For the training epochs, as far as I remember, we trained for up to 400 epochs on LJSpeech and 100 epochs on LibriTTS. But I don't think that many are necessary, and I cannot say what the minimum number of epochs is to reach decent quality.
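For reference, an EMA with decay 0.9999 is just the usual shadow-weight update applied after every optimizer step. A minimal PyTorch sketch of that update (illustrative only, not the repository's actual implementation):

```python
import copy
import torch

class EMA:
    """Keeps a shadow copy of the model, updated as ema = d*ema + (1-d)*param."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # Called once after each optimizer step.
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```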

Can you specify which dataset you are using when the flow fails to rectify?

@NathanWalt
Author

NathanWalt commented Oct 22, 2024

I'm also working on the LJSpeech dataset, but I'm implementing the algorithm on top of Grad-TTS's framework and data-preprocessing code, using the original 22.05 kHz sampling rate. Unexpectedly, the model collapsed after about 50 epochs of rectification. Did you keep generating multiple noise-mel pairs for each utterance during rectification?

@cantabile-kwok
Member

This sounds weird to me as I never experienced such problems before.

Did you keep generating multiple noise-mel pairs for each utterance for rectification?

No, I just generated a new dataset of the same size, with one generated utterance per sentence. Those generated samples were kept fixed (as if they were an off-the-shelf dataset); no re-generation was performed.
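In other words, the reflow dataset is built once with the pretrained model and then frozen. A rough sketch of that procedure, assuming a generic `model.decode` ODE sampler and placeholder dataset fields (not this repository's real API):

```python
import torch

@torch.no_grad()
def build_reflow_pairs(model, dataset, n_steps: int = 100, device: str = "cuda"):
    """One fixed (noise, generated-mel) pair per utterance, produced once."""
    pairs = []
    for utt in dataset:  # single pass over the corpus, no re-generation afterwards
        cond = utt["text_condition"].to(device)            # e.g. encoder output / durations
        z0 = torch.randn(utt["mel_shape"], device=device)  # noise endpoint, kept with the pair
        mel = model.decode(z0, cond, n_timesteps=n_steps)  # ODE solve from noise to mel
        pairs.append({"utt_id": utt["id"], "z0": z0.cpu(), "mel": mel.cpu()})
    return pairs  # saved to disk and treated as an off-the-shelf dataset for reflow training
```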

@NathanWalt
Author

NathanWalt commented Nov 18, 2024

Several weeks ago I found a bug in the inference pipeline of my code. Many thanks for your patient answers!
After fixing it, training no longer collapses. However, the reflow model performs only slightly better than the original model when the number of inference steps is small (say 2 or 4), and notably worse when the number of inference steps is large. I'm now wondering whether this is due to a limitation of rectified flow.
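For context, few-step inference with a rectified flow typically means plain Euler integration of the learned velocity field from noise to mel, and a well-straightened flow should tolerate 2-4 uniform steps. An illustrative sketch, assuming a hypothetical `velocity_fn(x, t, cond)` rather than this repository's actual interface:

```python
import torch

@torch.no_grad()
def euler_sample(velocity_fn, z0: torch.Tensor, cond, n_steps: int = 4):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (mel) with uniform Euler steps."""
    x = z0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity_fn(x, t, cond)  # x_{t+dt} = x_t + dt * v(x_t, t)
    return x
```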
