GuitarSet loading is slow #653

Laubeee · 2025-01-20T14:27:01Z

Loading GuitarSet is quite slow. Extracting multif0 and note_all each takes over 2 s per file, which adds up to more than 24 minutes to load the whole dataset. Frankly, the dataset is not that big.

From what I can see, each of these attributes calls a load routine 6 times, and every call seems to parse the whole jams file, so the file is parsed 12 times in total. I'm no expert in how jams files are parsed, but since jams.load(fhandle) accounts for about 80% of the time spent in load_pitch_contour, I assume this could be made significantly faster.

Options for improvement:

1. Make the contents of the jams file a cached attribute.
   -> best option if memory is not an issue; given it's "just" 540 MB it's not the worst ever (although Python probably blows that up...)
2. Extend routines like load_pitch_contour to accept either the jams_path or the contents of the file (jam). This way the multif0 (and similar) attributes can load the file once and pass it on 6 times.
   -> avoids holding on to memory unnecessarily, but still reads the file multiple times when accessing different attributes
3. Same as above, plus add a routine that directly extracts multiple attributes from the same single-read file contents (or reads all attributes and then frees the ones that are discarded).
   -> best of both, but requires the most code changes
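
To make the first two options concrete, here is a rough sketch of what I have in mind. It is not the actual mirdata code: parse_file is a stand-in for the expensive jams.load(fhandle) call (plain JSON is used so the snippet runs without jams installed), and load_string_contour, load_multif0 and the "contours" key are hypothetical names chosen only for illustration.

```python
import json
from functools import lru_cache

parse_calls = {"n": 0}  # illustration only: counts how often a file is really parsed


def parse_file(path):
    """Stand-in for the expensive jams.load(fhandle) call (plain JSON here)."""
    parse_calls["n"] += 1
    with open(path) as fhandle:
        return json.load(fhandle)


# Option 1 flavour: keep only the most recently parsed file in memory,
# so memory use stays bounded while repeated access to one file is cheap.
parse_file_cached = lru_cache(maxsize=1)(parse_file)


def load_string_contour(source, string_num):
    """Hypothetical per-string loader accepting a path or parsed contents."""
    data = parse_file(source) if isinstance(source, str) else source
    return data["contours"][string_num]


def load_multif0(path):
    """Option 2: parse once, then pass the contents on for all 6 strings."""
    data = parse_file_cached(path)
    return [load_string_contour(data, s) for s in range(6)]
```

With something like this, load_multif0 triggers a single parse instead of six, and a second attribute read on the same file would hit the one-entry cache rather than the disk. Passing a path to load_string_contour still works as before, so existing callers would not break.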


Laubeee · 2025-01-20T18:46:03Z

Upon further testing:
reading the whole dataset took 26.4 minutes (4.4 s/file)
reading the same data from an HDF5 file took only 2.3 minutes (0.38 s/file)
