
Reflow failure #17

Open
NathanWalt opened this issue Oct 19, 2024 · 4 comments

Comments

@NathanWalt

I've been trying to reproduce your work, especially the rectified flow part. However, the reflow procedure always results in poorer synthesis quality (even with few sampling steps). I'm wondering if you could provide some of the hyperparameters you used in the reflow procedure, such as the number of training epochs and the EMA decay rate?

@cantabile-kwok
Member

Hi, most of the hyperparameters are specified in configs/lj_16k_gt_dur_reflow.yaml. As for the EMA decay rate, it is hardcoded to 0.9999 and we did not change it. For the training epochs, as far as I remember, we trained for up to 400 epochs on LJSpeech and 100 epochs on LibriTTS. But I don't think that many are necessary, and I cannot say what the minimum number of epochs is to reach decent quality.
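For reference, an EMA with decay 0.9999 is just the usual shadow-weight update applied after every optimizer step. A minimal PyTorch sketch of that update (illustrative only, not the repository's actual implementation):

```python
import copy
import torch

class EMA:
    """Keeps a shadow copy of the model, updated as ema = d*ema + (1-d)*param."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # Called once after each optimizer step.
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```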

Can you specify which dataset you are using when the flow fails to rectify?

@NathanWalt
Author

NathanWalt commented Oct 22, 2024

I'm also working on the LJSpeech dataset, but I'm implementing the algorithm on top of Grad-TTS's framework and data-preprocessing code, using the original 22.05 kHz sampling rate. Unexpectedly, the model collapsed after about 50 epochs of rectification. Did you keep generating multiple noise-mel pairs for each utterance during rectification?

@cantabile-kwok
Member

This sounds weird to me as I never experienced such problems before.

Did you keep generating multiple noise-mel pairs for each utterance for rectification?

No, I just generated a new dataset of the same size, with one generated utterance per sentence. Those generated samples were kept fixed (as if they were an off-the-shelf dataset); no re-generation was performed.
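In other words, the reflow dataset is built once with the pretrained model and then frozen. A rough sketch of that procedure, assuming a generic `model.decode` ODE sampler and placeholder dataset fields (not this repository's real API):

```python
import torch

@torch.no_grad()
def build_reflow_pairs(model, dataset, n_steps: int = 100, device: str = "cuda"):
    """One fixed (noise, generated-mel) pair per utterance, produced once."""
    pairs = []
    for utt in dataset:  # single pass over the corpus, no re-generation afterwards
        cond = utt["text_condition"].to(device)            # e.g. encoder output / durations
        z0 = torch.randn(utt["mel_shape"], device=device)  # noise endpoint, kept with the pair
        mel = model.decode(z0, cond, n_timesteps=n_steps)  # ODE solve from noise to mel
        pairs.append({"utt_id": utt["id"], "z0": z0.cpu(), "mel": mel.cpu()})
    return pairs  # saved to disk and treated as an off-the-shelf dataset for reflow training
```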

@NathanWalt
Author

NathanWalt commented Nov 18, 2024

Several weeks ago I found a bug in the inference pipeline of my code. Many thanks for your patient answers!
After fixing it, training no longer collapses. However, the reflow model performs only slightly better than the original model when the number of inference steps is small (say 2 or 4), and notably worse when the number of inference steps is large. I'm now wondering whether this is due to a limitation of rectified flow.
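For context, few-step inference with a rectified flow typically means plain Euler integration of the learned velocity field from noise to mel, and a well-straightened flow should tolerate 2-4 uniform steps. An illustrative sketch, assuming a hypothetical `velocity_fn(x, t, cond)` rather than this repository's actual interface:

```python
import torch

@torch.no_grad()
def euler_sample(velocity_fn, z0: torch.Tensor, cond, n_steps: int = 4):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (mel) with uniform Euler steps."""
    x = z0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity_fn(x, t, cond)  # x_{t+dt} = x_t + dt * v(x_t, t)
    return x
```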
