Commit

alright, let's try once more.
yiitozer committed Mar 11, 2024
1 parent ba99001 commit e6204ff
Showing 2 changed files with 164 additions and 9 deletions.
102 changes: 94 additions & 8 deletions paper/paper.md
@@ -53,24 +53,110 @@ for both sound generation and sonification, enabling users to efficiently apply
Additionally, the toolbox includes educational Jupyter notebooks with illustrative code examples demonstrating the
application of sonification and visualization methods to deepen understanding within specific MIR scenarios.


# Statement of Need
Music data, characterized by attributes such as pitch, melody, harmony, rhythm, structure, and timbre, is inherently
intricate. Visualizations are crucial in deciphering this complexity by presenting music representations graphically,
enabling researchers to identify patterns, trends, and relationships not immediately evident in raw data.
For instance, visualizing time-dependent two-dimensional feature representations such as spectrograms (time--frequency),
chromagrams (time--chroma), or tempograms (time--tempo) enhances comprehension of signal processing concepts and provides
insights into the musical and acoustic properties of audio signals. Moreover, the combined visualization of extracted
features and reference annotations facilitates detailed examination of algorithmic approaches at a granular level.
These qualitative assessments, alongside quantitative metrics, are essential for comprehending the strengths,
weaknesses, and assumptions underlying music processing algorithms.
The MIR community has developed numerous toolboxes, such as essentia [@BogdanovWGGHMRSZS13_essentia_ISMIR],
madmom [@BoeckKSKW16_madmom_ACM-MM], Chroma Toolbox [@MuellerEwert11_ChromaToolbox_ISMIR], Tempogram Toolbox
[@GroscheM11_TempogramToolbox_ISMIR-lateBreaking], Sync Toolbox [@MuellerOKPD21_SyncToolbox_JOSS], Marsyas
[@Tzanetakis09_MARSYAS_ACM-MM], or the MIRtoolbox [@LartillotT07_MirToolbox_ISMIR], offering modular code for music
signal processing and analysis, many of which also include data visualization methods.
Notably, the two Python packages librosa [@McFeeRLEMBN15_librosa_Python] and libfmp [@MuellerZ21_libfmp_JOSS] aim to
lower the barrier to entry for MIR research by providing accessible code alongside visualization functions,
bridging the gap between education and research.

As an alternative or addition to visualizing data, one can employ data sonification techniques to produce acoustic
feedback on extracted or annotated information [@KramerEtAl99SonificAR]. This is especially important in music,
where humans excel at detecting even minor variations in the frequency and timing of sound events.
For instance, people can readily perceive irregularities and subtle changes in rhythm when they listen to a pulse
track converted into a sequence of click sounds. This ability is particularly valuable for MIR tasks such as beat
tracking and rhythm analysis. Moreover, transforming frequency trajectories into sound using sinusoidal models can
offer insights for tasks like estimating melody or separating singing voices. Furthermore, an auditory representation
of a chromagram provides listeners with an understanding of the harmony-related tonal information contained in an
audio signal. Therefore, by converting data into sound, sonification can reveal subtle audible details in music
that may not be immediately apparent within visual representations.
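
As a rough illustration of the sinusoidal-model idea mentioned above, the following sketch turns a frame-wise F0
trajectory into audio by integrating the instantaneous frequency into a continuous phase. It uses only NumPy. The
function name \texttt{sonify\_f0\_trajectory}, its parameters, and the convention that an F0 value of 0 marks an
unvoiced frame are assumptions made for this example; they do not reproduce the API of any existing toolbox.

```python
import numpy as np


def sonify_f0_trajectory(times, f0_hz, fs=22050, duration=None):
    """Sonify an F0 trajectory with a single sinusoid (hypothetical helper).

    times : increasing frame positions in seconds
    f0_hz : F0 value per frame in Hz; 0 is treated as "unvoiced" (assumption)
    """
    times = np.asarray(times, dtype=float)
    f0_hz = np.asarray(f0_hz, dtype=float)
    if duration is None:
        duration = times[-1]
    n_samples = int(round(duration * fs))
    t = np.arange(n_samples) / fs

    # Sample-and-hold: every audio sample takes the F0 of the latest frame.
    idx = np.clip(np.searchsorted(times, t, side="right") - 1, 0, len(times) - 1)
    f0_samples = f0_hz[idx]

    # Integrate the instantaneous frequency so the phase stays continuous
    # even when the F0 value changes from one frame to the next.
    phase = 2 * np.pi * np.cumsum(f0_samples) / fs

    # Mute unvoiced regions.
    amplitude = (f0_samples > 0).astype(float)
    return amplitude * np.sin(phase)
```

Accumulating the phase, rather than evaluating a sinusoid with the frame's frequency at each absolute time position,
avoids phase jumps at frame boundaries, which would otherwise be audible as clicks.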

In the MIR context, sonification methods have been employed to provide deeper insights into various music annotations
and feature representations. For instance, the Python package librosa [@McFeeRLEMBN15_librosa_Python] offers a function
\texttt{librosa.clicks} that generates an audio signal with click sounds positioned at specified times, with options
to adjust the frequency and duration of the clicks. Additionally, the Python toolbox libf0 [@RosenzweigSM22_libf0_ISMIR-LBD]
provides a function (\texttt{libf0.utils.sonify\_trajectory\_with\_sinusoid}) for sonifying F0 trajectories using sinusoids.
Moreover, the Python package libfmp [@MuellerZ21_libfmp_JOSS] includes a function
(\texttt{libfmp.b.sonify\_chromagram\_with\_signal}) for sonifying time--chroma representations.
Our experiments with these methods revealed that current implementations frequently rely on inefficient
event-based looping, resulting in excessively long runtimes. For instance, generating a click soundtrack for beat
annotations of 10-minute recordings can require impractically long processing times.

In our Python toolbox, libsoni, we offer implementations of various sonification methods, including those
mentioned above. These implementations feature a coherent API and are based on straightforward methods that are
transparent and easy to understand. By relying on efficient matrix-based implementations, they avoid explicit
looping and thus reduce runtime. Additionally, libsoni includes all essential components for sound synthesis,
operating as a standalone tool that can be easily extended and customized. The methods in libsoni enable
interactivity, allowing for data manipulation and sonification, as well as the ability to alter feature
extraction or sonification techniques. While real-time capabilities are not currently included in libsoni,
this could be a potential future extension. Hence, libsoni may be beneficial not only for MIR researchers but also for
educators, students, composers, sound designers, and individuals exploring new musical concepts.
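
To illustrate what a loop-free, matrix-based formulation can look like, the following NumPy sketch places a click
template at all event positions at once by building a matrix of target sample indices. The function
\texttt{sonify\_clicks}, its parameters, and the decaying-sinusoid click shape are assumptions chosen for this
example; this is a minimal sketch and not the actual libsoni implementation.

```python
import numpy as np


def sonify_clicks(times, fs=22050, click_freq=1000.0, click_dur=0.05, duration=None):
    """Render click sounds at the given time positions without an event loop.

    Hypothetical helper: names, defaults, and click shape are illustrative.
    """
    times = np.asarray(times, dtype=float)

    # One click template: an exponentially decaying sinusoid.
    n_click = int(round(click_dur * fs))
    t = np.arange(n_click) / fs
    click = np.sin(2 * np.pi * click_freq * t) * np.exp(-t / (click_dur / 5))

    if duration is None:
        duration = times.max() + click_dur if times.size else 0.0
    n_samples = int(round(duration * fs))
    y = np.zeros(n_samples)

    # Matrix of target indices with shape (num_events, n_click):
    # row k holds the output positions covered by the k-th click.
    onsets = np.round(times * fs).astype(int)
    idx = onsets[:, None] + np.arange(n_click)[None, :]
    valid = (idx >= 0) & (idx < n_samples)

    # Accumulate all clicks in one call; np.add.at sums correctly even
    # when neighboring clicks overlap (repeated indices).
    np.add.at(y, idx[valid], np.broadcast_to(click, idx.shape)[valid])
    return y
```

The index matrix needs memory proportional to the number of events times the click length, which is typically small
compared to the audio signal itself.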


## Chromagram Representations (libsoni.chroma)
Humans perceive pitch in a periodic manner, meaning that pitches separated by an octave are perceived as having a
similar quality or acoustic color, known as chroma. This concept motivates the use of time--chroma representations
or chromagrams, where pitch bands that differ spectrally by one or several octaves are combined to form a single chroma
band [@MuellerEwert11_ChromaToolbox_ISMIR]. These representations capture tonal information related to harmony and
melody while exhibiting a high degree of invariance with respect to timbre and instrumentation.
Chromagrams are widely used in MIR research for various tasks, including chord recognition and structure analysis.
The libsoni.chroma module provides sonification methods for chromagrams based on Shepard tones.
These tones are weighted combinations of sinusoids separated by octaves and serve as acoustic counterparts to
chroma values. The functions offered by libsoni enable the generation of various Shepard tone variants and can be
applied to symbolic representations (such as piano roll representations or chord annotations) or to chroma features
extracted from music recordings. This facilitates deeper insights for listeners into chord recognition results or the
harmony-related tonal information contained within an audio signal.
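
The idea of Shepard tones as acoustic counterparts of chroma values can be sketched in a few lines of NumPy. In the
example below, each of the twelve pitch classes is rendered as a sum of octave-spaced sinusoids with Gaussian weights
on a log-frequency axis, and the chromagram entries act as time-varying gains. The function names, the Gaussian
weighting, the reference tuning of 440 Hz, and the sample-and-hold upsampling are assumptions made for illustration;
other Shepard tone variants are possible.

```python
import numpy as np


def shepard_tone(pitch_class, n_samples, fs=22050, f_center=440.0, sigma_octaves=1.5):
    """Shepard tone for one pitch class (0 = C, ..., 11 = B); hypothetical helper."""
    t = np.arange(n_samples) / fs
    # All pitches of this pitch class across nine octaves, kept below Nyquist.
    midi = pitch_class + 12 * np.arange(1, 10)
    freqs = 440.0 * 2.0 ** ((midi - 69) / 12)
    freqs = freqs[freqs < fs / 2]
    # Gaussian weights on a log-frequency axis, centered at f_center.
    weights = np.exp(-0.5 * (np.log2(freqs / f_center) / sigma_octaves) ** 2)
    weights /= weights.sum()
    # Weighted superposition of the octave-spaced sinusoids.
    return weights @ np.sin(2 * np.pi * freqs[:, None] * t[None, :])


def sonify_chromagram(chroma, frame_rate, fs=22050):
    """Sonify a chromagram of shape (12, num_frames) given in frames per second."""
    chroma = np.asarray(chroma, dtype=float)
    num_frames = chroma.shape[1]
    n_samples = int(round(num_frames / frame_rate * fs))
    # Sample-and-hold upsampling of the chroma values to audio rate.
    frame_idx = np.minimum(
        (np.arange(n_samples) * frame_rate / fs).astype(int), num_frames - 1
    )
    gains = chroma[:, frame_idx]
    tones = np.stack([shepard_tone(c, n_samples, fs) for c in range(12)])
    y = np.sum(gains * tones, axis=0)
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y
```

Because the gains change abruptly at frame boundaries in this sketch, a practical implementation would likely smooth
them (for example with short fades) to avoid audible discontinuities.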


## Spectrogram Representations (libsoni.spectrogram)
Similar to chromagrams, pitch-based feature representations can be derived directly from music recordings using
transforms such as the constant-Q transform (CQT) [@SchoerkhuberK10_ConstantQTransform_SMC].
These representations are a special type of log-frequency spectrograms, where the frequency axis is logarithmically
spaced to form a pitch-based axis. More generally, in audio signal processing, there exists a multitude of different
time--frequency representations. For example, classic spectrograms have a linear frequency axis, usually computed via
the short-time Fourier transform (STFT). Additionally, mel-frequency spectrograms utilize the mel scale,
which approximates the human auditory system's response to different frequencies. The libsoni.spectrogram module
is designed to sonify various types of spectrograms with frequency axes spaced according to linear, logarithmic,
or mel scales. Essentially, each point on the scale corresponds to a specific center frequency,
meaning that each row of the spectrogram represents the energy profile of a specific frequency over time.
Our sonification approach generates sinusoids for each center frequency value with time-varying amplitude values,
in accordance with the provided energy profiles, and then superimposes all these sinusoids. Transforming
spectrogram-like representations into an auditory experience, our sonification approach allows for a more
intuitive understanding of the frequency and energy characteristics within a given music recording.
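
The approach described above (one sinusoid per center frequency, amplitude-modulated by the corresponding spectrogram
row, then superimposed) can be sketched with NumPy as follows. The function \texttt{sonify\_spectrogram}, its
arguments, and the sample-and-hold upsampling of the energy profiles are assumptions for this example and are not
taken from the libsoni API.

```python
import numpy as np


def sonify_spectrogram(spec, center_freqs, frame_rate, fs=22050):
    """Superimpose amplitude-modulated sinusoids, one per spectrogram row.

    spec         : array of shape (num_bins, num_frames), non-negative values
    center_freqs : center frequency in Hz for each row (linear, log, or mel spaced)
    frame_rate   : spectrogram frames per second
    """
    spec = np.asarray(spec, dtype=float)
    center_freqs = np.asarray(center_freqs, dtype=float)

    # Discard rows at or above the Nyquist frequency to avoid aliasing.
    keep = center_freqs < fs / 2
    spec, center_freqs = spec[keep], center_freqs[keep]

    num_frames = spec.shape[1]
    n_samples = int(round(num_frames / frame_rate * fs))
    t = np.arange(n_samples) / fs

    # Sample-and-hold upsampling of each energy profile to audio rate.
    frame_idx = np.minimum(
        (np.arange(n_samples) * frame_rate / fs).astype(int), num_frames - 1
    )
    amplitudes = spec[:, frame_idx]

    # One sinusoid per center frequency, all generated as a single matrix.
    sinusoids = np.sin(2 * np.pi * center_freqs[:, None] * t[None, :])
    y = np.sum(amplitudes * sinusoids, axis=0)

    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y
```

The matrix of sinusoids has one row per frequency bin and one column per audio sample, so for long recordings or
many bins it may be preferable to process the signal in blocks.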


# Design Choices
When designing the Python toolbox libsoni, we had several objectives in mind. Firstly, we aimed to maintain close
connections with existing sonification methods provided in librosa [@McFeeRLEMBN15_librosa_Python] and
libfmp [@MuellerZ21_libfmp_JOSS]. Secondly, we re-implemented and included all necessary components
(e.g., sound generators based on sinusoidal models and click sounds), even though similar basic functionality is
available in other Python packages such as librosa and libfmp. By doing so, libsoni offers a coherent API along with
convenient but easily modifiable parameter presets. Additionally, the implementations are more efficient than previous
software. Thirdly, we adopted many design principles suggested by librosa [@McFeeRLEMBN15_librosa_Python]
and detailed in [@McFeeKCSBB19_OpenSourcePractices_IEEE-SPM] to lower the entry barrier for students and
researchers who may not be coding experts. This includes maintaining an explicit and straightforward programming
style with a flat, functional hierarchy to facilitate ease of use and comprehension. The source code for
libsoni, along with comprehensive API documentation [^1], is publicly accessible through a dedicated GitHub
repository [^2]. We showcase all components, including introductions to MIR scenarios, illustrations, and sound examples
via Jupyter notebooks. Finally, we have included the toolbox in the Python Package Index (PyPI), enabling
installation with the standard Python package manager, pip [^3].

[^1]: <https://groupmm.github.io/libsoni>
[^2]: <https://github.com/groupmm/libsoni>
[^3]: <https://pypi.org/project/libsoni>

# Acknowledgements
The libsoni package originated from collaboration with various individuals over the past years. We extend our gratitude
71 changes: 70 additions & 1 deletion paper/references.bib
@@ -22,6 +22,16 @@ @article{McFeeKCSBB19_OpenSourcePractices_IEEE-SPM
doi = {10.1109/MSP.2018.2875349}
}

@inproceedings{BoeckKSKW16_madmom_ACM-MM,
author = {Sebastian B{\"o}ck and Filip Korzeniowski and Jan Schl{\"u}ter and Florian Krebs and Gerhard Widmer},
title = {madmom: {A} New {P}ython Audio and Music Signal Processing Library},
booktitle = {Proceedings of the {ACM} International Conference on Multimedia ({ACM-MM})},
address = {Amsterdam, The Netherlands},
pages = {1174--1178},
year = {2016},
doi = {10.1145/2964284.2973795}
}

@book{Mueller15_FMP_SPRINGER,
author = {Meinard M\"{u}ller},
title = {Fundamentals of Music Processing -- Audio, Analysis, Algorithms, Applications},
@@ -41,4 +51,63 @@ @inproceedings{McFeeRLEMBN15_librosa_Python
address = {Austin, Texas, USA},
year = {2015},
doi = {10.25080/Majora-7b98e3ed-003}
}

@inproceedings{BogdanovWGGHMRSZS13_essentia_ISMIR,
author = {Dmitry Bogdanov and Nicolas Wack and Emilia G{\'o}mez and Sankalp Gulati and Perfecto Herrera and Oscar Mayor and Gerard Roma and Justin Salamon and Jos{\'e} R. Zapata and Xavier Serra},
title = {Essentia: An Audio Analysis Library for Music Information Retrieval},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
pages = {493--498},
address = {Curitiba, Brazil},
year = {2013},
doi = {10.5281/zenodo.1415016}
}

@inproceedings{MuellerEwert11_ChromaToolbox_ISMIR,
author = {Meinard M{\"u}ller and Sebastian Ewert},
title = {{C}hroma {T}oolbox: {MATLAB} implementations for extracting variants of chroma-based audio features},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Miami, Florida, USA},
year = {2011},
pages = {215--220},
url-pdf = {2011_MuellerEwert_ChromaToolbox_ISMIR.pdf},
url-details = {https://www.audiolabs-erlangen.de/resources/MIR/chromatoolbox},
doi = {10.5281/zenodo.1416032}
}

@inproceedings{GroscheM11_TempogramToolbox_ISMIR-lateBreaking,
author = {Peter Grosche and Meinard M{\"u}ller},
title = {{T}empogram {T}oolbox: {MATLAB} Tempo and Pulse Analysis of Music Recordings},
booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Miami, Florida, USA},
year = {2011},
url-pdf = {2011_GroscheMueller_TempogramToolbox_ISMIR-LateBreaking.pdf},
url-details = {https://www.audiolabs-erlangen.de/resources/MIR/tempogramtoolbox}
}

@inproceedings{Tzanetakis09_MARSYAS_ACM-MM,
author = {George Tzanetakis},
title = {Music analysis, retrieval and synthesis of audio signals {MARSYAS}},
booktitle = {Proceedings of the {ACM} International Conference on Multimedia ({ACM-MM})},
address = {Vancouver, British Columbia, Canada},
pages = {931--932},
year = {2009},
doi = {10.1145/1631272.1631459}
}

@inproceedings{LartillotT07_MirToolbox_ISMIR,
author = {Olivier Lartillot and Petri Toiviainen},
title = {{MIR} in {MATLAB} {(II):} {A} Toolbox for Musical Feature Extraction from Audio},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Vienna, Austria},
year = {2007},
pages = {127--130},
doi = {10.5281/zenodo.1417145}
}

@book{KramerEtAl99SonificAR,
author = {Gregory Kramer and Bruce Walker and Terri Bonebright and Perry Cook and John H. Flowers and Nadine Miner and John Neuhoff},
title = {Sonification Report: Status of the Field and Research Agenda},
publisher = {International Community for Auditory Display},
year = {1999}
}
