Bespoke Neural Networks for Score-Informed Source Separation

ISMIR 2020 - Late Breaking/Demo Submission - Demo Website

By Ethan Manilow and Bryan Pardo

Abstract

We introduce a simple method that can separate arbitrary musical instruments from an audio mixture. Given an unaligned MIDI transcription for a single target instrument from an input mixture, we synthesize the target instrument and create a data set of local augmentations in the space around our input mixture. Using this data as our ground truth labels, we train a small neural network to perform a surrogate separation that is "overfit" to our generated augmentations of the one song. When this model is applied to the original mixture, we show that this method can 1) successfully separate out the desired instrument with access to only unaligned MIDI, 2) separate arbitrary instruments, and 3) get results in a fraction of the time of existing methods.

Paper Link

Audio Demos

NOTE: Both demos are monophonic and sampled at 16 kHz. We recommend listening on headphones.

Take On Me by A-ha

We want to isolate the famous synthesizer melody in the mixture. This bespoke network was trained on mixtures of the Synthesized Target and the Input Mixture below:

Name	Audio	Comments
Input Mixture		Mix contains drums, bass, the synth melody, and a synth countermelody.
Synthesized Target		Synthesis is close but doesn't quite match the true melody in the mix.
Bespoke Output		Bespoke output isolates the desired source.
Spleeter Output ("other" source)		Spleeter's output has more bleed from the other synthesizer countermelody.
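
As a rough illustration of how training examples like the ones described for this demo might be assembled, the sketch below adds a synthesized target to the input mixture at a random gain and time offset and keeps the shifted, scaled target as the training label. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `make_training_example`, the gain range, and the shift range are all hypothetical, and the noise and sine signals are placeholders for real audio.

```python
import numpy as np

SAMPLE_RATE = 16_000  # the demos on this page are sampled at 16 kHz


def make_training_example(mixture, synth_target, rng,
                          gain_range=(0.5, 1.5), max_shift=1600):
    """Return (augmented_mix, label) for one hypothetical training example.

    The synthesized target is circularly shifted and scaled, then added to
    the input mixture. The shifted, scaled target is used as the label.
    The gain and shift ranges are illustrative assumptions.
    """
    if mixture.shape != synth_target.shape:
        raise ValueError("mixture and synth_target must have the same shape")
    gain = rng.uniform(*gain_range)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    label = gain * np.roll(synth_target, shift)
    augmented_mix = mixture + label
    return augmented_mix, label


rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
mixture = 0.1 * rng.standard_normal(SAMPLE_RATE)     # placeholder for the real mix
synth_target = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # placeholder synthesized melody

augmented_mix, label = make_training_example(mixture, synth_target, rng)
```

Calling the function repeatedly with the same `rng` yields a different gain and offset each time, which is one simple way to produce many augmentations of a single song for a small network to overfit to.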

Reelin' in the Years by Steely Dan

We want to isolate just one of the two guitars playing a melody in diatonic thirds. Examples of the Synthesized Target and the Synthesized Background that the bespoke network was trained on are provided below. The transcriptions were found online.

Name	Audio	Comments
Input Mixture		Mix contains piano, drums, bass, and two guitars playing in diatonic thirds.
Synthesized Target		The synthesized transcription is not aligned, and not all of its notes are correct (some notes are wrongly embellished at ~10 sec).
Synthesized Background		The synthesized transcriptions are not quite perfect and are missing the rhythm guitar. They also sound quite cheesy and very unlike the input mixture.
Bespoke Output		While the bespoke output isolates the desired source, it has quite a few artifacts.
Spleeter Output ("other" source)		Spleeter's output has much more bleed from all other instruments in the mix, including the other guitar playing in diatonic thirds.
