License: MIT Python

Cambridge-MT-Downloader

Auto-downloader and preprocessor for Cambridge-MT (multitrack) data

About

This repository provides a Python script that automatically downloads and resamples the Cambridge-MT dataset. The Cambridge-MT dataset is a collection of over 500 studio-quality multi-track audio recordings of various music genres, including pop, rock, EDM, classical, and folk. The dataset follows the format of MedleyDB, but is larger. It can be used for tasks such as music source separation, generation, transcription, and automatic mixing.

Note:

  • This repository is an unofficial tool for accessing the Cambridge-MT dataset, and is not affiliated with or endorsed by the dataset creators.
  • This script was originally generated by ChatGPT and has been revised by @mimbres.
  • The contents of page_source.py are taken from the preview website of Cambridge-MT.

Install & Run

apt-get install sox && pip install -r requirements.txt # Linux

On macOS, use brew instead of apt-get.

python run.py

This will launch a prompt that allows you to configure output_dir, num_workers and output_audio_format.

How it works

💻[PageSource] --> 🚚  [Download] --> 📦 [Extract] --> 🎧 [Convert audio format]
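
The first and last stages of the pipeline above can be sketched as follows. This is a minimal illustration, not the code in run.py: the helper names `find_zip_urls` and `build_sox_command`, the regular expression, and the assumption that downloads are linked as plain `.zip` URLs in the page source are all hypothetical. The sox invocation uses the standard `-r` option to set the output sample rate and lets sox infer the output format from the file extension.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: multitrack archives appear in the page source as absolute .zip URLs.
ZIP_URL = re.compile(r'https?://[^\s"\'<>]+\.zip')


def find_zip_urls(page_source: str) -> list[str]:
    """Return the unique .zip URLs found in the page source, in first-seen order."""
    return list(dict.fromkeys(ZIP_URL.findall(page_source)))


def build_sox_command(src: Path, out_dir: Path, audio_format: str, sample_rate: int) -> list[str]:
    """Build (but do not run) a sox command that converts and resamples one file."""
    dst = out_dir / src.with_suffix("." + audio_format).name
    # sox INPUT -r RATE OUTPUT: options placed before the output file apply to it.
    return ["sox", str(src), "-r", str(sample_rate), str(dst)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the archive has been downloaded and extracted; building the command separately keeps the conversion step easy to test and to distribute across workers.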

TODO:

  • Instrument labeling: Cambridge-MT uses a simple file naming convention in the format of ID_INSTRUMENT_MIC_ETC.*.
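
As a starting point for this item, the naming convention could be parsed roughly as below. This is only a sketch under the assumption that fields are separated by single underscores and that the first three fields are the ID, instrument, and microphone; the `TrackName` type and `parse_track_name` function are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import NamedTuple, Optional, Tuple


class TrackName(NamedTuple):
    track_id: str
    instrument: Optional[str]
    mic: Optional[str]
    extra: Tuple[str, ...]


def parse_track_name(filename: str) -> TrackName:
    """Split a file name of the form ID_INSTRUMENT_MIC_ETC.* into its fields."""
    # Drop the directory and the extension, then split on underscores.
    parts = Path(filename).stem.split("_")
    return TrackName(
        track_id=parts[0],
        instrument=parts[1] if len(parts) > 1 else None,
        mic=parts[2] if len(parts) > 2 else None,
        extra=tuple(parts[3:]),  # anything after the microphone field
    )
```

Instrument names that themselves contain underscores would be split incorrectly by this approach, so a real labeling step would likely need a lookup table of known instrument names on top of it.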