Auto-downloader and preprocessor for Cambridge-MT (multitrack) data
This repository provides a Python script that automatically downloads and resamples the Cambridge-MT dataset. The Cambridge-MT dataset is a collection of over 500 studio-quality multitrack audio recordings spanning various music genres, including pop, rock, EDM, classical, and folk. The dataset follows the format of MedleyDB, but is larger. It can be used for tasks such as music source separation, generation, transcription, and automatic mixing.
Note:
- This repository is an unofficial tool for accessing the Cambridge-MT dataset, and is not affiliated with or endorsed by the dataset creators.
- This script was originally generated by ChatGPT and has been revised by @mimbres.
- The page_source.py is from the preview website of Cambridge-MT.
apt-get install sox && pip install -r requirements.txt # Linux
On macOS, use brew instead of apt-get.
python run.py
This will launch a prompt that allows you to configure output_dir, num_workers, and output_audio_format.
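As a rough illustration of how the three prompt answers could be turned into a validated configuration, here is a minimal sketch. It is not taken from run.py: the `DownloadConfig` class, the `parse_config` helper, the default values, and the validation rules are all assumptions made for this example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DownloadConfig:
    """Hypothetical container for the three options named in the prompt."""
    output_dir: Path
    num_workers: int
    output_audio_format: str


def parse_config(answers, defaults):
    """Merge raw prompt answers (strings; empty means 'keep the default')."""
    merged = {
        key: (answers.get(key) or "").strip() or str(default)
        for key, default in defaults.items()
    }
    num_workers = int(merged["num_workers"])
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return DownloadConfig(
        output_dir=Path(merged["output_dir"]).expanduser(),
        num_workers=num_workers,
        # Normalize ".FLAC" or "flac" to "flac" (assumed convention).
        output_audio_format=merged["output_audio_format"].lower().lstrip("."),
    )


if __name__ == "__main__":
    # Assumed defaults, for illustration only.
    defaults = {"output_dir": "data", "num_workers": 4, "output_audio_format": "wav"}
    print(parse_config({"num_workers": "8"}, defaults))
```

Keeping the parsing separate from the interactive prompt makes the same options easy to supply from a script or a test.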
💻[PageSource] --> 🚚 [Download] --> 📦 [Extract] --> 🎧 [Convert audio format]
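The last stage of the pipeline above can be sketched as follows. This is a minimal illustration, not the code in run.py: the function names, the default format, and the example sample rate are assumptions. It relies only on the standard `sox INPUT [output options] OUTPUT` invocation, where `-r` placed before the output file sets the output sample rate.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_sox_command(src, output_dir, audio_format="wav", sample_rate=None):
    """Return the sox argument list that converts `src` into `output_dir`."""
    src = Path(src)
    dst = Path(output_dir) / f"{src.stem}.{audio_format}"
    cmd = ["sox", str(src)]
    if sample_rate is not None:
        # Options placed before the output file apply to the output.
        cmd += ["-r", str(sample_rate)]
    cmd.append(str(dst))
    return cmd


def convert_all(sources, output_dir, audio_format="wav", sample_rate=None, num_workers=4):
    """Convert files in parallel; requires the sox binary on PATH."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    commands = [
        build_sox_command(s, output_dir, audio_format, sample_rate) for s in sources
    ]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # check=True raises CalledProcessError if sox exits with an error.
        list(pool.map(lambda c: subprocess.run(c, check=True), commands))


if __name__ == "__main__":
    # Hypothetical file name, shown without running sox.
    print(build_sox_command("extracted/01_Kick.wav", "converted", "wav", 16000))
```

Threads are sufficient here because each worker spends its time waiting on an external sox process rather than running Python code.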
TODO:
- Instrument labeling: Cambridge-MT uses a simple file naming convention in the format of ID_INSTRUMENT_MIC_ETC.*.
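One possible starting point for the instrument labeling item is to split file names on underscores, following the ID_INSTRUMENT_MIC_ETC.* pattern. The sketch below is an assumption about how that could look: `StemName` and `parse_stem_name` are hypothetical names, the example file name is made up, and real files in the dataset may deviate from the convention.

```python
from pathlib import Path
from typing import NamedTuple, Optional, Tuple


class StemName(NamedTuple):
    """Fields of a file name that follows ID_INSTRUMENT_MIC_ETC.*"""
    track_id: str
    instrument: str
    mic: Optional[str]
    etc: Tuple[str, ...]


def parse_stem_name(filename):
    """Split a file name into its underscore-separated fields."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 2:
        raise ValueError(f"{filename!r} does not match ID_INSTRUMENT[_MIC[_ETC]]")
    mic = parts[2] if len(parts) > 2 else None
    return StemName(parts[0], parts[1], mic, tuple(parts[3:]))


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(parse_stem_name("01_Kick_In.wav"))
```

The raw instrument field would still need to be mapped to a fixed label vocabulary, since spelling and abbreviations may vary between projects.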