This work introduces a data-driven approach for assigning emotions to music tracks. Consisting of two distinct phases, our framework enables the creation of synthetic emotion-labeled datasets that can serve both Music Emotion Recognition and Auto-Tagging tasks.
The first phase presents a versatile method for collecting listener-generated verbal data, such as tags and playlist names, from multiple online sources on a large scale. We compiled a dataset of
Below is the tree-like structure of the project directory, detailing all included files and folders:
root/
├── data/
│ ├── corpus_embeddings.pt
│ ├── tags_embeddings.pt
│ ├── tracks_tags.csv
│ └── NRC-Emotion-Lexicon-Wordlevel-v0.92.txt
│
├── dataset/
│ ├── original/
│ │ ├── tags_to_emotions.csv
│ │ ├── tags_to_nrc_matches.csv
│ │ ├── tracks_to_emotions.csv
│ │ ├── tracks_to_tags.csv
│ │ └── metadata.csv
│ │
│ ├── balanced/
│ │ ├── tags_to_emotions.csv
│ │ ├── tags_to_nrc_matches.csv
│ │ ├── tracks_to_emotions.csv
│ │ ├── tracks_to_tags.csv
│ │ └── metadata.csv
│ │
│ └──
│
├── pyplutchik/
│ └── * (modified library files)
│
├── Emotion_Attribution.ipynb
├── Results.ipynb
├── utils.py
└── requirements.txt
- corpus_embeddings.pt, tags_embeddings.pt: Files containing pre-computed Sentence-BERT embeddings of the tags (queries) and of the corpus (words from the NRC Lexicon).
- tracks_tags.csv: File with the refined tags produced by the data cleaning procedure of the data collection stage.
  - | spotify_id | artist | title | genre | count | source | tag |
- NRC-Emotion-Lexicon-Wordlevel-v0.92.txt: The NRC Lexicon.
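As a rough illustration of how the lexicon file can be read, the sketch below parses word-level records into a word-to-emotions mapping. It assumes the usual word-level layout of one tab-separated word, category, and 0/1 flag per line; the function name load_nrc_lexicon and the sample records are hypothetical and are not taken from utils.py or from the real file.

```python
import io
from collections import defaultdict

# The eight emotion columns used in the dataset files.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")


def load_nrc_lexicon(lines):
    """Map each word to the set of emotions it is associated with.

    Assumes one word<TAB>category<TAB>flag record per line, where flag
    is "0" or "1". Categories outside EMOTIONS (for example sentiment
    polarity) are skipped.
    """
    lexicon = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, category, flag = parts
        if category in EMOTIONS and flag == "1":
            lexicon[word].add(category)
    return dict(lexicon)


# Illustrative records only, not copied from the real lexicon.
sample = io.StringIO(
    "happy\tjoy\t1\n"
    "happy\tsadness\t0\n"
    "happy\tpositive\t1\n"
    "storm\tfear\t1\n"
    "storm\tanger\t1\n"
)
lexicon = load_nrc_lexicon(sample)
print(lexicon)
```

To read the actual file, pass an open file handle instead of the in-memory sample, for example open("data/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt", encoding="utf-8").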
- tags_to_emotions.csv: Emotion vectors of tags in the dataset.
  - | tag | anger | anticipation | disgust | fear | joy | sadness | surprise | trust | emotion_vector |
- tags_to_nrc_matches.csv: Matched words from the NRC Lexicon for each tag.
  - | tag | match | similarity_score |
- tracks_to_emotions.csv: Emotion vectors of tracks in the dataset.
  - | spotify_id | anger | anticipation | disgust | fear | joy | sadness | surprise | trust | emotion_vector |
- tracks_to_tags.csv: Tags of tracks in the dataset, along with their occurrences and sources.
  - | spotify_id | tag | count | normalized_count | source |
- metadata.csv: Metadata about tracks in the dataset, retrieved from Spotify.
  - | spotify_id | name | artist | genre | release_date | popularity | preview_url | cover_image |
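The tag, match, and similarity_score columns of tags_to_nrc_matches.csv suggest a nearest-neighbour lookup between tag embeddings and lexicon-word embeddings. The following is a minimal sketch of that idea, assuming cosine similarity and a single best match per tag; the toy three-dimensional vectors stand in for real Sentence-BERT embeddings, and the helper names are hypothetical rather than the functions in utils.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(tag_vector, corpus_vectors):
    """Return (word, score) for the corpus word most similar to the tag."""
    scored = (
        (word, cosine_similarity(tag_vector, vector))
        for word, vector in corpus_vectors.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
corpus_vectors = {
    "joy": [1.0, 0.0, 0.0],
    "fear": [0.0, 1.0, 0.0],
    "calm": [0.0, 0.0, 1.0],
}
tag_vectors = {
    "feel good": [0.9, 0.1, 0.0],
    "creepy": [0.1, 0.8, 0.1],
}

for tag, vector in tag_vectors.items():
    match, score = best_match(vector, corpus_vectors)
    print(f"{tag} -> {match} ({score:.3f})")
```

A pure-Python loop like this is only practical for small inputs; with the pre-computed embedding tensors, a vectorised similarity search would be the natural choice.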
The balanced/ directory mirrors the structure of original/ and is tailored to provide a balanced subset of the dataset.
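Because both directories share the same file layout, the same loading code can be pointed at either one. The sketch below reads rows shaped like tracks_to_emotions.csv and reports the highest-scoring emotion per track. It assumes a comma-delimited file with numeric values in the eight emotion columns; the emotion_vector column is ignored because its encoding is not described here, and the sample rows are invented for illustration.

```python
import csv
import io

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]


def dominant_emotions(csv_file):
    """Return {spotify_id: emotion with the highest score} for each row."""
    result = {}
    for row in csv.DictReader(csv_file):
        scores = {emotion: float(row[emotion]) for emotion in EMOTIONS}
        # On ties, max() keeps the first emotion in EMOTIONS order.
        result[row["spotify_id"]] = max(scores, key=scores.get)
    return result


# Invented rows with placeholder IDs, not real dataset content.
sample = io.StringIO(
    "spotify_id,anger,anticipation,disgust,fear,joy,sadness,surprise,trust\n"
    "track_a,0.1,0.2,0.0,0.0,0.6,0.0,0.05,0.05\n"
    "track_b,0.0,0.1,0.0,0.1,0.0,0.7,0.0,0.1\n"
)
result = dominant_emotions(sample)
print(result)
```

To run it on the real data, replace the sample with open("dataset/original/tracks_to_emotions.csv", newline="", encoding="utf-8"), or with the corresponding path under dataset/balanced/.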
- Emotion_Attribution.ipynb: This notebook outlines the main steps for assigning emotion vectors to music tracks.
- utils.py: This file contains the functions used in the Emotion_Attribution.ipynb notebook.
- Results.ipynb: This notebook showcases some insights and visualizations from the data.
To set up your environment to work with the dataset, follow these steps:
- Navigate to the project root directory in your terminal.
- Create and activate a virtual environment, then install the required Python libraries using pip:
python3 -m venv myenv
source myenv/bin/activate  # macOS and Linux
.\myenv\Scripts\activate  # Windows
pip install -r requirements.txt
- Open the Emotion_Attribution.ipynb notebook (or Results.ipynb), located in the root folder.
- Select the virtual environment myenv as kernel.
- Run the notebook.
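Before running a notebook, it can help to confirm that the files it is likely to need are in place. The following optional sketch checks a few paths from the directory tree above; the EXPECTED_FILES list and the missing_files helper are assumptions made for illustration and are not part of the project.

```python
from pathlib import Path

# Paths taken from the directory tree, relative to the project root.
EXPECTED_FILES = [
    "data/corpus_embeddings.pt",
    "data/tags_embeddings.pt",
    "data/tracks_tags.csv",
    "data/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt",
    "Emotion_Attribution.ipynb",
    "utils.py",
    "requirements.txt",
]


def missing_files(root="."):
    """Return the expected paths that do not exist as files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files found.")
```

Run it from the project root, or pass another directory to missing_files to check a different checkout.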