# DART

This is a demo repository for DART, accepted at the Audio Imagination workshop of NeurIPS 2024.

The code used in the paper is available here.

Audio samples are on the associated webpage: https://amaai-lab.github.io/DART/

This code is based on https://github.com/keonlee9420/Comprehensive-Transformer-TTS

## Training

To train on L2-ARCTIC, you can call:

```bash
CUDA_VISIBLE_DEVICES=0 python train.py --dataset L2ARCTIC
```
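If training is interrupted, resuming from a saved checkpoint should be possible. The following is a hedged sketch that assumes train.py keeps the --restore_step flag from the base Comprehensive-Transformer-TTS codebase (the same flag appears in the synthesis example below):

```bash
# Hedged example: resume training from step 500000, assuming train.py
# inherits the --restore_step flag from Comprehensive-Transformer-TTS.
CUDA_VISIBLE_DEVICES=0 python train.py --dataset L2ARCTIC --restore_step 500000
```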

## Inference

For inference from a checkpoint, you can use the two included scripts, synthesize_converted.py and synthesize_stats_valset.py. The former synthesizes the sentences defined in the script (looping over speakers, accents, and sentences), while the latter synthesizes from a metadata .txt file. Note that before running either synthesis script, you must first run extract_stats.py on your current checkpoint to extract and save the MLVAE embeddings for speakers and accents.
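A hedged sketch of that preliminary step is below; it assumes extract_stats.py accepts the same --dataset and --restore_step arguments as the synthesis scripts, which may not match the actual interface:

```bash
# Hedged example: extract and cache the MLVAE speaker/accent embeddings from a
# checkpoint before running synthesis. Flag names are assumed to match the
# synthesis scripts and may need adjusting.
CUDA_VISIBLE_DEVICES=0 python extract_stats.py --dataset L2ARCTIC --restore_step 704000
```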

An example use of the synthesis scripts is:

```bash
CUDA_VISIBLE_DEVICES=0 python synthesize_converted.py --dataset L2ARCTIC --restore_step 704000
```
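For the metadata-driven script, a hedged example follows. The --source flag and the file name val_metadata.txt are assumptions borrowed from the base Comprehensive-Transformer-TTS batch-synthesis interface, where sentences are read from a .txt file; the actual argument name in synthesize_stats_valset.py may differ:

```bash
# Hedged example: synthesize from a metadata .txt file. The --source flag and
# the file name are hypothetical, based on the base repo's batch-synthesis CLI.
CUDA_VISIBLE_DEVICES=0 python synthesize_stats_valset.py --dataset L2ARCTIC --restore_step 704000 --source val_metadata.txt
```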