wavlm_ssl_sv

This repository contains the source code of the article Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models (INTERSPEECH 2024) [arXiv].

The proposed framework fine-tunes a pre-trained WavLM for Speaker Verification (SV) using pseudo-labels generated through Self-Supervised Learning (SSL). Initial pseudo-labels are derived from an SSL DINO-based model and are then iteratively refined by clustering the embeddings of the fine-tuned model.

Our method achieves 0.99% EER on VoxCeleb1-O, establishing a new state of the art (SOTA) for speaker verification with SSL.

Please refer to the article for more details on the implementation and a comparative study with other works.


Usage

Installation

  • Install dependencies with pip install -r requirements.txt.
  • Prepare the VoxCeleb, MUSAN, and RIR datasets following voxceleb_trainer.
  • Download the WavLM-Base+ model and place WavLM-Base+.pt in the repository root folder.

Training

Step 1: Extract DINO speaker embeddings

The code to train the DINO model is not currently provided. We recommend using sslsv or 3D-Speaker to extract initial speaker embeddings.

Alternatively, you can directly download the DINO embeddings we used for our system: dino_vox2_embeddings.pt.

Note: the embeddings file must be a Dict[str, torch.Tensor] covering all VoxCeleb2 samples, with keys following the format id00012/21Uxsk56VDQ/00001.wav.
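For reference, here is a minimal sketch of how such a file could be produced and saved with torch.save. The dataset path and extract_embedding are placeholders: replace the latter with the forward pass of whatever DINO model you use (e.g. from sslsv or 3D-Speaker).

```python
from pathlib import Path
import torch

# Hypothetical helper: returns a 1-D speaker embedding for one utterance.
# Replace with the actual forward pass of your DINO model.
def extract_embedding(wav_path: Path) -> torch.Tensor:
    raise NotImplementedError

voxceleb2_root = Path("data/voxceleb2")  # assumed dataset location
embeddings = {}
for wav_path in voxceleb2_root.rglob("*.wav"):
    # Keys must be relative paths such as "id00012/21Uxsk56VDQ/00001.wav".
    key = str(wav_path.relative_to(voxceleb2_root))
    embeddings[key] = extract_embedding(wav_path)

torch.save(embeddings, "dino_vox2_embeddings.pt")  # Dict[str, torch.Tensor]
```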

Step 2: Generate pseudo-labels

python pseudo_labeling.py PATH_TO_EMBEDDINGS_FILE PATH_TO_PL_FILE
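Conceptually, pseudo-labeling clusters the speaker embeddings and uses each utterance's cluster index as its training label. The sketch below illustrates this with scikit-learn k-means and a voxceleb_trainer-style train list ("label path" per line); the number of clusters and the exact output format of pseudo_labeling.py are assumptions, so treat this as illustrative rather than a drop-in replacement for the provided script.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

embeddings = torch.load("dino_vox2_embeddings.pt")   # Dict[str, torch.Tensor]
keys = sorted(embeddings.keys())
# Length-normalise embeddings before clustering (common practice, assumed here).
X = F.normalize(torch.stack([embeddings[k] for k in keys]), dim=1).numpy()

# Cluster the embeddings; 7500 clusters is an illustrative value only.
labels = KMeans(n_clusters=7500, n_init=3).fit_predict(X)

# Write a voxceleb_trainer-style train list: "<pseudo_label> <file>" per line.
with open("pseudo_labels.txt", "w") as f:
    for key, label in zip(keys, labels):
        f.write(f"{label} {key}\n")
```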

Step 3: Fine-tune WavLM MHFA

python trainSpeakerNet.py --config configs/wavlm_mhfa_dlg_lc.yaml --train_list PATH_TO_PL_FILE --distributed

Iterative process

  1. Extract embeddings from the WavLM MHFA model: python trainSpeakerNet_Eval.py --config configs/wavlm_mhfa_dlg_lc.yaml --generate_embeddings --embeddings_path PATH_TO_EMBEDDINGS_FILE.

  2. Repeat steps 2 and 3 (a sketch of the full loop is shown below). Make sure to change save_path in the config to avoid overwriting the existing model.
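Putting the commands above together, one iteration of the refinement loop can be scripted roughly as follows. The file paths and the number of iterations are placeholders, and the save_path still has to be updated in the config between iterations.

```python
import subprocess

N_ITERATIONS = 2  # illustrative; see the paper for the schedule actually used

for i in range(N_ITERATIONS):
    emb_file = f"exp/iter{i}_embeddings.pt"      # hypothetical paths
    pl_file = f"exp/iter{i}_pseudo_labels.txt"

    # Extract embeddings from the current WavLM MHFA model.
    subprocess.run(["python", "trainSpeakerNet_Eval.py",
                    "--config", "configs/wavlm_mhfa_dlg_lc.yaml",
                    "--generate_embeddings", "--embeddings_path", emb_file],
                   check=True)

    # Re-cluster the embeddings into refined pseudo-labels (step 2).
    subprocess.run(["python", "pseudo_labeling.py", emb_file, pl_file],
                   check=True)

    # Fine-tune again on the new pseudo-labels (step 3); remember to change
    # save_path in the config so the previous model is not overwritten.
    subprocess.run(["python", "trainSpeakerNet.py",
                    "--config", "configs/wavlm_mhfa_dlg_lc.yaml",
                    "--train_list", pl_file, "--distributed"],
                   check=True)
```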

Step 4: Large-Margin Fine-Tuning

  1. Copy the latest model checkpoint to exp/wavlm_mhfa_dlg_lc_lmft/model to resume training.

  2. Start training: python trainSpeakerNet.py --config configs/wavlm_mhfa_dlg_lc_lmft.yaml --train_list PATH_TO_PL_FILE --distributed.
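Large-Margin Fine-Tuning usually denotes a short additional training stage on longer crops with an increased margin in an additive angular margin (AAM) softmax loss. The repository's own loss implementation and margin values may differ; the snippet below is only a generic PyTorch sketch of such a loss to show where the margin enters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AAMSoftmax(nn.Module):
    """Generic AAM-softmax (ArcFace-style) loss. Large-margin fine-tuning
    typically just raises `margin` (e.g. 0.2 -> 0.5); values are illustrative."""
    def __init__(self, embed_dim, n_classes, margin=0.5, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(n_classes, embed_dim))
        nn.init.xavier_normal_(self.weight)
        self.margin = margin
        self.scale = scale

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target-class logits.
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)
```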

Evaluation

python trainSpeakerNet_Eval.py --config configs/wavlm_mhfa_dlg_lc_lmft.yaml --eval
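Evaluation reports the Equal Error Rate (EER), the operating point where false-acceptance and false-rejection rates are equal. As a reminder of how the metric is computed, here is a generic sketch (not the repository's scorer) that derives EER from trial scores and 0/1 target labels with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores: np.ndarray, labels: np.ndarray) -> float:
    """`labels` holds 1 for target (same-speaker) trials, 0 otherwise."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))        # closest crossing point
    return float((fpr[idx] + fnr[idx]) / 2.0)    # in [0, 1]; x100 for percent

# Toy example: higher score means more likely the same speaker.
print(compute_eer(np.array([0.9, 0.8, 0.3, 0.1]), np.array([1, 1, 0, 0])))
```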

Model weights

The checkpoint of our best model reaching 0.99% EER on VoxCeleb1-O is available for download: wavlm_mhfa_dlg_lc_lmft.


Acknowledgements

This repository contains third-party components and code adapted from other open-source projects, including: SLT22_MultiHead-Factorized-Attentive-Pooling and Loss-Gated-Learning.


Citation

If you use this project, please consider starring this repository on GitHub and citing the following paper.

@InProceedings{miara2024WavLMSSLSV,
  author    = {Miara, Victor and Lepage, Théo and Dehak, Réda},
  booktitle = {INTERSPEECH},
  title     = {Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models},
  year      = {2024},
  url       = {https://arxiv.org/abs/2406.02285},
}

License

This project is released under the MIT License.