lf0 question about convert phase #34

powei-C · 2022-07-12T03:51:41Z

Hi,
I wonder why you normalize the f0 series before feeding it to the f0 encoder in convert.py.
This kind of f0 normalization isn't applied in the preprocessing phase.

Wendison · 2022-07-13T05:13:32Z

Hi, normalizing f0 aims to remove speaker characteristics. f0 is not normalized during the preprocessing phase, but it is normalized during training and inference, as shown below:

VQMIVC/dataset.py

Line 53 in 851b4f5

lf0 = (lf0 - mean) / (std + 1e-8)

VQMIVC/convert_example.py

Line 57 in 851b4f5

lf0[nonzeros_indices] = (lf0[nonzeros_indices] - mean) / (std + 1e-8)
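
To illustrate why this kind of normalization removes speaker characteristics, here is a minimal sketch, not the repository's actual code. It assumes that unvoiced frames are stored as 0 in the log-f0 series and that the mean and standard deviation are computed per utterance over the voiced (nonzero) frames only. The function names `normalize_lf0` and `to_lf0` are hypothetical.

```python
import numpy as np


def to_lf0(f0):
    """Convert f0 in Hz to log-f0, keeping unvoiced frames (f0 == 0) at 0."""
    f0 = np.asarray(f0, dtype=np.float64)
    safe = np.where(f0 > 0, f0, 1.0)  # avoid log(0) on unvoiced frames
    return np.where(f0 > 0, np.log(safe), 0.0)


def normalize_lf0(lf0, eps=1e-8):
    """Z-score the voiced frames of a log-f0 series; unvoiced frames stay 0.

    Assumption: statistics are taken over the nonzero entries of this
    single utterance.
    """
    lf0 = np.asarray(lf0, dtype=np.float64).copy()
    voiced = np.nonzero(lf0)
    if voiced[0].size == 0:
        return lf0  # nothing to normalize
    mean = lf0[voiced].mean()
    std = lf0[voiced].std()
    lf0[voiced] = (lf0[voiced] - mean) / (std + eps)
    return lf0


# Two hypothetical speakers with the same contour shape, one an octave higher.
f0_low = np.array([0.0, 100.0, 110.0, 120.0, 0.0])
f0_high = 2.0 * f0_low

norm_low = normalize_lf0(to_lf0(f0_low))
norm_high = normalize_lf0(to_lf0(f0_high))
```

In this sketch the two normalized contours coincide, because a constant pitch offset in the log domain is absorbed by the mean. What remains is the relative intonation pattern rather than the speaker's pitch range.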

powei-C · 2022-07-13T15:20:42Z

Hi,
Thank you for your explanation!
I have another question, about perplexity when training the model on another dataset.
I found that the perplexity stopped increasing (the figure shows about 360 epochs of training). Is that expected?
Do you have any suggestions for diagnosing this issue?

Wendison · 2022-07-22T08:05:56Z

The perplexity should increase during training, as higher perplexity indicates that the vectors in the VQ codebook are distinguishable and can be used to represent different acoustic units. I also saw that your recon_loss is high. In my experience, recon_loss should be below 0.5 for you to obtain good converted samples.
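
For reference, a minimal sketch of how codebook perplexity is commonly computed in VQ models: the exponential of the entropy of the empirical code-usage distribution. It assumes the quantizer exposes the integer code index chosen for each frame. The function name `codebook_perplexity` and its arguments are hypothetical, and the training code in the repository may compute it differently.

```python
import numpy as np


def codebook_perplexity(indices, codebook_size, eps=1e-10):
    """Perplexity of code usage: exp(entropy of the empirical code distribution).

    indices: integer code indices selected by the quantizer (any shape).
    codebook_size: total number of codebook entries.
    """
    flat = np.asarray(indices).ravel()
    counts = np.bincount(flat, minlength=codebook_size)
    probs = counts / counts.sum()
    entropy = -np.sum(probs * np.log(probs + eps))
    return float(np.exp(entropy))


# Every frame mapped to the same code: perplexity is close to 1.
collapsed = codebook_perplexity([3, 3, 3, 3, 3, 3], codebook_size=8)

# Frames spread evenly over 4 of the 8 codes: perplexity is close to 4.
spread = codebook_perplexity([0, 1, 2, 3, 0, 1, 2, 3], codebook_size=8)
```

Under this definition, perplexity ranges from 1 (a single code in use) up to the codebook size (all codes used equally often). A value that plateaus far below the codebook size suggests that only part of the codebook is being used.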
