GuitarSet loading is slow #653

Open
Laubeee opened this issue Jan 20, 2025 · 1 comment

Laubeee commented Jan 20, 2025

Loading GuitarSet is quite slow. Extracting multif0 and note_all each take over 2 s per file, which adds up to more than 24 minutes to load the whole dataset, which, frankly, is not that big.

From what I see, each of these attributes calls a load routine 6 times, and that routine seems to parse the whole jams file on every call. That is 6 parsings per attribute, so the jams file is parsed 12 times in total. I'm no expert in how jams files are parsed, but seeing that jams.load(fhandle) takes about 80% of the time of load_pitch_contour, I'm assuming this could be significantly faster.

Options for improvement:

  • make the contents of the jams file a cached attribute
    -> best option if memory is not an issue; given that the dataset is "just" 540 MB, it's not the worst ever (although Python probably blows that up)
  • extend routines like load_pitch_contour to accept either the jams_path or the already-parsed contents of the file (jam). This way multif0 (and similar attributes) can load the file once and pass it on 6 times.
    -> avoids holding on to memory unnecessarily, but still reads the file again for each different attribute that is accessed
  • same as above, plus a routine that directly extracts multiple attributes from the same single-read file contents (or reads all attributes and frees the ones that are not needed)
    -> best of both, but requires the most code changes
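
To make the first two options concrete, here is a rough sketch of how they could fit together. It is not the library's actual code: every function name below is hypothetical, `json.load` stands in for `jams.load`, and the file layout (`"pitch_contour"` / `"notes"` keys holding one entry per string) is made up purely for illustration.

```python
import json
import os
from functools import lru_cache


@lru_cache(maxsize=8)  # small bound so memory stays limited (option 1, bounded)
def parse_annotation_file(path):
    """Parse the annotation file once per path.

    Assumption: json.load is a stand-in for jams.load here.
    """
    with open(path) as fhandle:
        return json.load(fhandle)


def _as_parsed(source):
    """Accept either a path or already-parsed contents (option 2)."""
    if isinstance(source, (str, os.PathLike)):
        return parse_annotation_file(str(source))
    return source


def pitch_contour(source, string_num):
    # Hypothetical layout: one contour per string under "pitch_contour".
    return _as_parsed(source)["pitch_contour"][string_num]


def notes(source, string_num):
    # Hypothetical layout: one note list per string under "notes".
    return _as_parsed(source)["notes"][string_num]


def multif0(path):
    parsed = parse_annotation_file(path)  # parsed once, reused for all 6 strings
    return [pitch_contour(parsed, s) for s in range(6)]


def notes_all(path):
    parsed = parse_annotation_file(path)  # cache hit if multif0 ran just before
    return [notes(parsed, s) for s in range(6)]
```

With something like this, calling `multif0(path)` followed by `notes_all(path)` parses the file once instead of 12 times, and the `maxsize` bound keeps only a few parsed files in memory at any time.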

Laubeee commented Jan 20, 2025

Upon further testing:
reading the whole dataset took 26.4 minutes (4.4 s/file)
reading the same data from an HDF5 file took only 2.3 minutes (0.38 s/file)
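
For reference, a quick back-of-the-envelope check using only the figures quoted above. The variable names are just for illustration, and the file count is derived from the timings rather than measured.

```python
# Figures quoted in the comment above.
jams_total_min, jams_per_file_s = 26.4, 4.4
hdf5_total_min, hdf5_per_file_s = 2.3, 0.38

# Number of files implied by total time / per-file time for the jams run.
implied_files = round(jams_total_min * 60 / jams_per_file_s)

# Speedup of the HDF5 read, computed from the totals and from the per-file times.
speedup_total = jams_total_min / hdf5_total_min
speedup_per_file = jams_per_file_s / hdf5_per_file_s

print(f"implied file count: {implied_files}")
print(f"speedup: {speedup_total:.1f}x (totals), {speedup_per_file:.1f}x (per file)")
```

Either way it comes out to a bit more than an 11x difference.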
