CLESSR-VC: Contrastive Learning Enhanced Self-Supervised Representations for One-shot Voice Conversion

ABSTRACT

One-shot voice conversion (VC) has attracted growing attention because of its broad practical applications. The task hinges on two things: the representation ability of the speech features and the generalization of the model. This paper proposes CLESSR-VC, a model that enhances pre-trained self-supervised learning (SSL) representations through contrastive learning for one-shot VC. First, to ensure the model's generalization, SSL features from the 23rd and 9th layers of the pre-trained WavLM are used to extract the content embedding and the SSL speaker embedding, respectively. Then, mel-spectrograms, a conventional acoustic feature, and contrastive learning are introduced to enhance the representation ability of the speech features. Specifically, contrastive learning combined with pitch-shift augmentation is applied to accurately disentangle content information from the SSL features, and mel-spectrograms are used to extract a mel speaker embedding. AM-Softmax and cross-architecture contrastive learning are applied between the SSL and mel speaker embeddings to obtain a fused speaker embedding that helps improve speech quality and speaker similarity. Both objective and subjective evaluations on the VCTK corpus confirm that the proposed VC model achieves strong performance with few trainable parameters.

The following is the overall model architecture.

Fig. 1: The overall architecture of the proposed model.
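
The abstract names two objectives applied to the speaker embeddings: AM-Softmax and cross-architecture contrastive learning between the SSL and mel speaker embeddings. This page does not give their exact formulation, so the following is only a minimal sketch of the generic versions of these losses: the standard additive-margin softmax, and an InfoNCE-style contrastive loss in which the SSL and mel embeddings of the same utterance form the positive pair. The function names, the `scale`, `margin`, and `temperature` values, and the choice of InfoNCE are assumptions made for illustration and are not taken from the CLESSR-VC implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(x - peak) for x in values))


def am_softmax_loss(embedding, class_weights, target, scale=30.0, margin=0.35):
    """Additive-margin softmax loss for one speaker embedding.

    The margin is subtracted from the cosine similarity of the target
    speaker only, then all similarities are scaled before cross-entropy.
    scale and margin are illustrative defaults (assumptions).
    """
    logits = []
    for index, weight in enumerate(class_weights):
        similarity = cosine(embedding, weight)
        if index == target:
            similarity -= margin
        logits.append(scale * similarity)
    return log_sum_exp(logits) - logits[target]


def info_nce_loss(ssl_embeddings, mel_embeddings, temperature=0.1):
    """InfoNCE-style loss across two encoders (an assumed formulation).

    Row i of ssl_embeddings and row i of mel_embeddings come from the
    same utterance (positive pair); all other rows act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(ssl_embeddings):
        logits = [cosine(anchor, mel) / temperature for mel in mel_embeddings]
        total += log_sum_exp(logits) - logits[i]
    return total / len(ssl_embeddings)
```

In this sketch the margin makes the classification target strictly harder than plain softmax, and the contrastive term is small when the two encoders agree on which utterances share a speaker identity and large when they do not.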

DEMO

Converted samples are available on the demo page.
