BasiCPP Pitch is an instrument-agnostic and polyphonic-capable AMT (Automatic Music Transcription) library written in C++.
Given any compatible audio file, the library generates a MIDI file containing the notes it detected. The library also provides a Python API, implemented with pybind11, for ease of use.
The AMT model we used is from Spotify's basic-pitch. More information about the model can be found in the research paper, A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation.
./build.sh
The provided script builds the dynamic library python/BasiCPP_Pitch.so for the Python API and the executable bin/run for the C++ API.
To build the .so or the executable separately, use the '-p' and '-e' flags respectively.
./build.sh -h
# Usage: cmd [-p] [-e]
# -p: build python module
# -e: build executable
# -t: run tests, only valid when python module is built
# -g: enable gprof profiling
# C++ example
./bin/run
# Python example
python3 python/run.py
In music information retrieval (MIR), Automatic Music Transcription (AMT) aims to convert raw audio recordings into symbolic representations like sheet music or MIDI files.
One of the significant challenges in AMT is accurately transcribing polyphonic audio, where multiple notes are played simultaneously. A practical solution involves audio preprocessing techniques, e.g., Constant-Q Transform (CQT), to represent audio in the frequency domain. This specialized transform can capture the harmonic structure of music, which is vital for polyphonic transcription. By stacking harmonics, we create a comprehensive frequency representation for each time frame, enabling the subsequent steps to better discern individual notes in the presence of harmonically rich audio. We employ a Convolutional Neural Network (CNN) architecture to generate notes from the preprocessed audio frames.
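To illustrate why the CQT fits music better than a linearly spaced transform: its bins are geometrically spaced, so every octave spans the same number of bins and every semitone maps to a constant bin offset. A minimal sketch of the bin-frequency relation (the starting frequency and bins-per-octave values below are illustrative, not the library's actual settings):

```python
import math

def cqt_bin_frequencies(f_min: float, bins_per_octave: int, n_bins: int):
    """Center frequencies of geometrically spaced CQT bins: f_k = f_min * 2^(k/B)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]

# Illustrative settings: start at C1 (~32.70 Hz), 12 bins per octave.
freqs = cqt_bin_frequencies(32.70, 12, 25)

# Adjacent bins sit a constant ratio (one semitone) apart, so the 13th bin
# is exactly one octave (a factor of 2) above the first.
assert abs(freqs[12] / freqs[0] - 2.0) < 1e-9
```

This constant-ratio spacing is what lets harmonics of a note fall at fixed bin offsets regardless of the note's pitch, which the harmonic stacking step exploits.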
The system takes the audio file to be transcribed as input. The file can be in a standard format, such as WAV or MP3.
The audio data undergoes preprocessing, beginning with applying the Constant-Q Transform (CQT). This transforms the audio from the time domain to the frequency domain, capturing the harmonic content crucial for polyphonic transcription. Harmonic stacking is then applied to create a comprehensive frequency representation for each time frame. This is a critical step for distinguishing individual notes in harmonically rich audio.
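Harmonic stacking can be sketched as shifting copies of the CQT along the frequency axis by each harmonic's bin offset and stacking them as channels, so the energy of a note's harmonics lines up with its fundamental. A rough numpy sketch (the bin counts and harmonic set are illustrative assumptions, not the library's actual configuration):

```python
import numpy as np

def harmonic_stack(cqt: np.ndarray, bins_per_octave: int, harmonics=(1, 2, 3)):
    """Stack frequency-shifted copies of a CQT magnitude matrix.

    cqt: (n_bins, n_frames) magnitude spectrogram.
    For harmonic h, the shift is round(bins_per_octave * log2(h)) bins,
    so energy at h * f0 aligns with the f0 bin in that channel.
    """
    n_bins, _ = cqt.shape
    channels = []
    for h in harmonics:
        shift = int(round(bins_per_octave * np.log2(h)))
        shifted = np.zeros_like(cqt)
        shifted[: n_bins - shift] = cqt[shift:]  # shift down toward f0
        channels.append(shifted)
    return np.stack(channels)  # (n_harmonics, n_bins, n_frames)

# Toy input: energy only at the bin one octave above bin 0.
cqt = np.zeros((24, 4))
cqt[12, :] = 1.0
stacked = harmonic_stack(cqt, bins_per_octave=12, harmonics=(1, 2))
# The h=2 channel moves that energy down 12 bins, onto bin 0.
assert stacked[1, 0, 0] == 1.0
```

Stacking the harmonics as channels gives the downstream network a local view of each candidate fundamental together with its overtones.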
The preprocessed data is fed into a Convolutional Neural Network (CNN) architecture. This trained model analyzes the frequency representations of the audio frames to generate note information. The CNN is capable of accurately identifying pitch information for each frame, allowing it to transcribe polyphonic audio.
Post-processing steps are performed to refine the detected notes. This involves tasks like note duration estimation, handling overlapping frequencies, and converting the vector-like transcription into MIDI. Note alignment ensures that the generated notes are correctly timed, accurately representing the musical content.
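The core of this step can be sketched as thresholding the model's frame-wise pitch activations and merging consecutive active frames of the same pitch into (pitch, onset, duration) events. A simplified sketch under an assumed fixed threshold, not the library's actual post-processing algorithm:

```python
import numpy as np

def frames_to_notes(posteriorgram: np.ndarray, threshold: float = 0.5):
    """Turn a (n_pitches, n_frames) activation matrix into note events.

    Returns a list of (pitch_index, onset_frame, n_frames) tuples by
    merging runs of consecutive frames at or above the threshold.
    """
    active = posteriorgram >= threshold
    notes = []
    for pitch in range(active.shape[0]):
        onset = None
        for frame in range(active.shape[1]):
            if active[pitch, frame] and onset is None:
                onset = frame  # note starts
            elif not active[pitch, frame] and onset is not None:
                notes.append((pitch, onset, frame - onset))  # note ends
                onset = None
        if onset is not None:  # note still sounding at the last frame
            notes.append((pitch, onset, active.shape[1] - onset))
    return notes

# Toy activations: pitch 0 sounds for frames 1-3, pitch 2 for frame 0 only.
post = np.array([[0.1, 0.9, 0.8, 0.7, 0.2],
                 [0.0, 0.0, 0.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0, 0.0]])
assert frames_to_notes(post) == [(0, 1, 3), (2, 0, 1)]
```

Multiplying the frame indices by the hop duration then yields note onsets and durations in seconds, ready to be written out as MIDI events.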
The system produces a MIDI file as output containing the transcribed musical notes.
BasiCPP Pitch provides a user-friendly API for both C++ and Python, allowing developers to integrate the AMT capabilities into their applications.
#include "amtModel.h"
#include "loader.h"

int main(int argc, char** argv) {
    // Load the bundled example audio
    auto audio = getExampleAudio();
    // Initialize the model
    auto model = amtModel();
    // Transcribe the audio into notes
    auto notes = model.transcribeAudio(audio);
    return 0;
}
See src/main.cpp for more details.
import BasiCPP_Pitch

# Load the example audio
audio = BasiCPP_Pitch.getExampleAudio()

# Initialize the model
model = BasiCPP_Pitch.amtModel()

# Transcribe the audio and generate the MIDI file
notes = model.transcribeAudio(audio)
midi = BasiCPP_Pitch.note2midi(notes)
midi.write(midi_file_path)
See python/run.py for more details.