Loading guitarset is quite slow. Extracting `multif0` and `note_all` each takes over 2 s per file, which adds up to more than 24 minutes to load the whole dataset, which, frankly, is not that big.
From what I can see, each of these attributes calls a load routine 6 times, and that routine seems to parse the whole jams file every time. That is 6 parsings per attribute, so the jams file is parsed 12 times. I'm no expert in how jams files are parsed, but seeing that `jams.load(fhandle)` takes about 80% of the time of `load_pitch_contour`, I'm assuming this could be significantly faster.
Options for improvement:
1. Make the contents of the jams file a cached attribute.
   -> best option if memory is not an issue; given it's "just" 540 MB, it's not the worst ever (although Python probably blows that up..)
2. Extend routines like `load_pitch_contour` to accept either the `jams_path` or the contents of the file (`jam`). This way `multif0` (and similar) attributes can load the file once and pass it on 6 times.
   -> avoids keeping unnecessary memory, but still reads the file multiple times when accessing different attributes
3. Same as option 2, plus add a routine that directly extracts multiple attributes from the same single-read file contents (or reads all attributes and then frees the ones that are discarded).
   -> best of both, but requires the most code changes
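To make options 1 and 2 concrete, here is a minimal sketch of how they could fit together. It is not the library's actual code: `Track`, `load_jam`, and the `"pitch_contour"` file layout are hypothetical, and `json.load` stands in for `jams.load` so the snippet runs without the jams package.

```python
import json
import os
from functools import cached_property


def load_jam(jams_path):
    """Parse the annotation file once (json.load is a stand-in for jams.load)."""
    with open(jams_path) as fhandle:
        return json.load(fhandle)


def load_pitch_contour(jam_or_path, string_num):
    """Option 2: accept either a path or the already-parsed file contents."""
    if isinstance(jam_or_path, (str, os.PathLike)):
        jam = load_jam(jam_or_path)  # old behaviour: parse on every call
    else:
        jam = jam_or_path  # new behaviour: reuse what the caller parsed
    # Hypothetical layout: one list of [time, frequency] pairs per string.
    return jam["pitch_contour"][string_num]


class Track:
    """Hypothetical track object, only here to show the caching pattern."""

    def __init__(self, jams_path):
        self.jams_path = jams_path

    @cached_property
    def jam(self):
        # Option 1: parsed on first access, then kept in memory.
        return load_jam(self.jams_path)

    @property
    def multif0(self):
        # One parse shared by all 6 strings instead of 6 separate parses.
        return [load_pitch_contour(self.jam, s) for s in range(6)]
```

With this shape, existing callers that pass a path keep working, while the attributes reuse one parsed object. Dropping the `cached_property` and parsing inside `multif0` instead would give option 2 on its own, without holding the file contents in memory.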
Upon further testing:
reading the whole dataset took 26.4 minutes (4.4 s/file)
reading the same data from an HDF5 file took only 2.3 minutes (0.38 s/file)
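As a quick sanity check on these numbers, the arithmetic can be reproduced as below. The file count is not stated anywhere above; it is inferred from the total time divided by the per-file time.

```python
# Timings reported above.
total_jams_s = 26.4 * 60   # whole dataset via jams, in seconds
per_file_jams_s = 4.4      # seconds per file via jams
total_hdf5_s = 2.3 * 60    # whole dataset via HDF5, in seconds

# Inferred, not stated in the thread: number of files in the dataset.
n_files = round(total_jams_s / per_file_jams_s)

per_file_hdf5_s = total_hdf5_s / n_files
speedup = total_jams_s / total_hdf5_s

print(f"files: {n_files}")
print(f"HDF5 per file: {per_file_hdf5_s:.2f} s")
print(f"speedup: {speedup:.1f}x")
```

The per-file HDF5 figure comes out consistent with the 0.38 s reported, and the overall difference is a bit more than a factor of 11.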