Chapter 6 - Data preparation for training

Magenta Version 1.1.7

This chapter shows why training our own models is crucial: it lets us generate music in a specific style, or with specific structures or instruments. Building and preparing a dataset is the first step before training our own model. To do that, we first look at existing datasets and APIs to help us find meaningful data. Then, we build two MIDI datasets for specific styles, dance and jazz. Finally, we prepare the MIDI files for training using data transformations and pipelines.

Magenta Versioning

- A newer version of this code is available.

This branch shows the code for Magenta v1.1.7, which corresponds to the code in the book. For a more recent version, use the updated Magenta v2.0.1 branch.

Utils

There are some utilities for processing the Lakh MIDI Dataset (LMD) in the lakh_utils.py file and utilities for multiprocessing in the multiprocessing_utils.py file with example usage.

There is a custom pipeline example for the Melody RNN model in the melody_rnn_pipeline_example.py file. Change directory to the folder containing the TensorFlow records of NoteSequence and call the pipeline using:

python /path/to/the/pipeline/melody_rnn_pipeline_example.py --config="attention_rnn" --input="notesequences.tfrecord" --output_dir="sequence_examples" --eval_ratio=0.10
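The --eval_ratio flag in the command above sets the fraction of the input that goes to the evaluation set. As a rough illustration of that idea only (this is not the Magenta pipeline code), a random partition can be sketched in plain Python. The function name partition and the seed parameter are hypothetical.

```python
import random


def partition(items, eval_ratio, seed=0):
    """Randomly split items into (training, evaluation) lists.

    Each item lands in the evaluation list with probability eval_ratio,
    so the split sizes are only approximately proportional.
    """
    rng = random.Random(seed)
    training, evaluation = [], []
    for item in items:
        if rng.random() < eval_ratio:
            evaluation.append(item)
        else:
            training.append(item)
    return training, evaluation


# Hypothetical usage: split 1000 items with a 10% evaluation ratio.
training, evaluation = partition(range(1000), eval_ratio=0.10)
print(len(training), len(evaluation))
```

Because every item is drawn independently, the evaluation set holds roughly, not exactly, 10% of the items.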

Code

Before you start, follow the installation instructions for Magenta 1.1.7.

Extract techno (four on the floor) drum rhythms.

python chapter_06_example_00.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_output_dir=PATH_OUTPUT --bass_drums_on_beat_threshold=0.75 
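The --bass_drums_on_beat_threshold flag suggests that a track is kept when enough of its beats carry a bass drum. The sketch below shows one possible reading of that rule, assuming onset and beat times in seconds. The function name, the tolerance value and the sample data are hypothetical, and the example script may compute the ratio differently.

```python
def bass_drums_on_beat_ratio(bass_drum_onsets, beats, tolerance=0.05):
    """Return the fraction of beats that have a bass drum onset nearby.

    bass_drum_onsets and beats are times in seconds; tolerance is the
    maximum distance (in seconds) for an onset to count as on the beat.
    """
    if not beats:
        return 0.0
    hits = sum(
        1 for beat in beats
        if any(abs(onset - beat) <= tolerance for onset in bass_drum_onsets)
    )
    return hits / len(beats)


# 120 BPM for two bars of 4/4: one beat every 0.5 seconds.
beats = [i * 0.5 for i in range(8)]
# A bass drum on every beat except the last one.
onsets = [i * 0.5 for i in range(7)]

ratio = bass_drums_on_beat_ratio(onsets, beats)
is_four_on_the_floor = ratio >= 0.75
print(ratio, is_four_on_the_floor)
```

With 7 of 8 beats covered, the ratio is 0.875, which passes the 0.75 threshold used in the command above.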

Extract artists using the Lakh MIDI Dataset (LMD) matched with the Million Song Dataset (MSD).

python chapter_06_example_01.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES
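The match scores file passed with --path_match_scores_file links MSD entries to MIDI files. As a minimal sketch, assuming the file is a JSON object that maps each MSD id to a dictionary of MIDI md5 checksums and confidence scores, the best MIDI match per MSD id could be selected as follows. The identifiers in the sample data are made up, and best_matches is a hypothetical helper.

```python
import json

# Tiny stand-in for the match scores file, with made-up identifiers.
MATCH_SCORES_JSON = """
{
  "TRAAAAA000000000000": {"md5_a": 0.61, "md5_b": 0.83},
  "TRBBBBB000000000000": {"md5_c": 0.72}
}
"""


def best_matches(match_scores):
    """Map each MSD id to the (midi_md5, score) pair with the highest score."""
    return {
        msd_id: max(candidates.items(), key=lambda pair: pair[1])
        for msd_id, candidates in match_scores.items()
    }


match_scores = json.loads(MATCH_SCORES_JSON)
best = best_matches(match_scores)
print(best)
```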

List the most common genres from the Last.fm API using the Lakh MIDI Dataset matched with the MSD.

python chapter_06_example_02.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES --last_fm_api_key=LAST_FM_API_KEY

Filter on specific tags from the Last.fm API using the Lakh MIDI Dataset matched with the MSD.

python chapter_06_example_03.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES --last_fm_api_key=LAST_FM_API_KEY --tags="['jazz', 'blues']"
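The --tags value above is written as a Python list literal inside a string. The sketch below shows one way such a value could be parsed and used as a filter. It assumes that a track matches when it carries at least one of the requested tags and that matching is case-insensitive. Both helper names are hypothetical, and the example script may behave differently.

```python
import ast


def parse_tags(raw):
    """Parse a string such as "['jazz', 'blues']" into a list of tags."""
    tags = ast.literal_eval(raw)
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("expected a list of strings, got: %r" % (raw,))
    return [tag.lower() for tag in tags]


def matches_any_tag(track_tags, wanted_tags):
    """Return True if the track has at least one of the wanted tags."""
    wanted = set(wanted_tags)
    return any(tag.lower() in wanted for tag in track_tags)


wanted = parse_tags("['jazz', 'blues']")
print(matches_any_tag(["Jazz", "piano"], wanted))
print(matches_any_tag(["techno"], wanted))
```

ast.literal_eval only evaluates literals, so it avoids running arbitrary code from the command line the way eval would.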

Get statistics on instrument classes from the MIDI files.

python chapter_06_example_04.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES
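In General MIDI, the 128 program numbers fall into 16 instrument classes of 8 programs each, so the class index of a zero-based program number is program // 8. The sketch below counts in how many files each class appears. The input format (a list of program numbers per file) and the function names are assumptions for illustration, and drum tracks, which MIDI flags separately, are ignored.

```python
from collections import Counter

# The 16 General MIDI instrument classes, in program-number order.
GM_CLASSES = [
    "Piano", "Chromatic Percussion", "Organ", "Guitar",
    "Bass", "Strings", "Ensemble", "Brass",
    "Reed", "Pipe", "Synth Lead", "Synth Pad",
    "Synth Effects", "Ethnic", "Percussive", "Sound Effects",
]


def instrument_class(program):
    """Return the General MIDI class name for a zero-based program number."""
    if not 0 <= program <= 127:
        raise ValueError("program must be in [0, 127], got %d" % program)
    return GM_CLASSES[program // 8]


def class_statistics(programs_per_file):
    """Count in how many files each instrument class appears."""
    counter = Counter()
    for programs in programs_per_file:
        # A set ensures a class is counted once per file.
        counter.update({instrument_class(p) for p in programs})
    return counter


# Hypothetical program numbers for three files.
files = [[0, 1, 33], [0, 56], [25, 33]]
stats = class_statistics(files)
print(stats.most_common())
```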

Extract drums MIDI files. Some drum tracks are split into multiple separate drum instruments, in which case we try to merge them into a single instrument and save a single MIDI file.

python chapter_06_example_05.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES --path_output_dir=PATH_OUTPUT
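Merging several drum instruments into one amounts to pooling their notes into a single track. The sketch below illustrates this on plain Python objects rather than real MIDI data. The Note class and merge_drum_tracks are hypothetical, and dropping exact duplicate notes is an assumption, not necessarily what the example script does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    pitch: int
    start: float
    end: float


def merge_drum_tracks(tracks: List[List[Note]]) -> List[Note]:
    """Merge several drum tracks into one, sorted by start time then pitch.

    Exact duplicates (same pitch, start and end) are kept only once.
    """
    unique = {note for track in tracks for note in track}
    return sorted(unique, key=lambda n: (n.start, n.pitch, n.end))


# Three separate drum instruments: kick (36), snare (38), closed hi-hat (42).
kick = [Note(36, 0.0, 0.1), Note(36, 0.5, 0.6)]
snare = [Note(38, 0.5, 0.6)]
hats = [Note(42, 0.0, 0.1), Note(42, 0.25, 0.35), Note(42, 0.5, 0.6)]

merged = merge_drum_tracks([kick, snare, hats])
print([(n.start, n.pitch) for n in merged])
```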

Extract piano MIDI files. Some piano tracks are split into multiple separate piano instruments, in which case we keep them split and save them as multiple MIDI files.

python chapter_06_example_06.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES --path_output_dir=PATH_OUTPUT

Extract drums MIDI files corresponding to specific tags.

python chapter_06_example_07.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES --path_output_dir=PATH_OUTPUT --last_fm_api_key=LAST_FM_API_KEY --tags="['jazz', 'blues']"

Extract piano MIDI files corresponding to specific tags.

python chapter_06_example_08.py --sample_size=1000 --pool_size=4 --path_dataset_dir=PATH_DATASET --path_match_scores_file=PATH_MATCH_SCORES --path_output_dir=PATH_OUTPUT --last_fm_api_key=LAST_FM_API_KEY --tags="['jazz', 'blues']"