
Training #30

Closed
kaiw7 opened this issue Jan 11, 2025 · 7 comments

kaiw7 commented Jan 11, 2025

Hi, could you explain how to enable training with VGGSound only, without text-audio pairs? Also, does it support v2a generation shorter than 8s during inference? Many thanks.

hkchengrex (Owner) commented

VGGSound-only training: modify this file https://github.com/hkchengrex/MMAudio/blob/main/mmaudio/data/data_setup.py
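For a rough sense of the kind of change involved, here is a minimal sketch; every name in it is a placeholder, since the actual dataset classes and config plumbing live in data_setup.py itself:

```python
from torch.utils.data import ConcatDataset, Dataset

# Hypothetical sketch only -- the real data_setup.py has its own dataset
# classes and config handling. The idea is to return the VGGSound dataset
# by itself instead of concatenating it with the text-audio sources.
def setup_training_dataset(vggsound: Dataset, text_audio_sets: list[Dataset]) -> Dataset:
    # Mixed training would concatenate all sources:
    #   return ConcatDataset([vggsound, *text_audio_sets])
    # VGGSound-only training drops the text-audio datasets entirely:
    return vggsound
```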

<8s inference: Yes. The demo script already supports this. As with longer-duration generation, using a duration that differs significantly from the training duration might introduce artifacts.

kaiw7 commented Jan 12, 2025

Thanks a lot for your response. What do these two lines mean? Are they used during training? https://github.com/hkchengrex/MMAudio/blob/34bf089fdd2e457cd5ef33be96c0e1c8a0412476/config/data/base.yaml#L31C1-L32C22

kaiw7 commented Jan 12, 2025

In addition, I ran into this error during training. Do you have any ideas about how to resolve it?

```
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
[2025-01-12 06:52:22][r3][ERROR] - Error occurred at iteration 0!
[2025-01-12 06:52:22][r3][CRITICAL] - backend='inductor' raised:
```
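(For anyone who lands here with the same error: one commonly reported workaround, offered as an assumption rather than the fix actually used in this thread, is to put the CUDA driver stub library on the linker search path before the first `torch.compile` call, since the Inductor backend shells out to the system linker:)

```python
import os

# Hypothetical workaround: "cannot find -lcuda" usually means the linker that
# Inductor invokes cannot see libcuda.so. On many CUDA installs a stub copy
# lives under $CUDA_HOME/lib64/stubs; exposing it via LIBRARY_PATH before any
# torch.compile call lets the link step succeed.
cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
stubs = os.path.join(cuda_home, "lib64", "stubs")
os.environ["LIBRARY_PATH"] = stubs + os.pathsep + os.environ.get("LIBRARY_PATH", "")

# If that is not viable, running the model eagerly (without torch.compile)
# sidesteps the Inductor backend entirely, at some speed cost.
```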

hkchengrex (Owner) commented

Thanks. Those two lines are for the evaluation caches. I have updated the readme to reflect this.

For the error: can you show the full stack trace?

kaiw7 commented Jan 25, 2025

Hi, thank you very much. I solved this issue. I have another question about the training script: does it support gradient accumulation to save GPU memory?

kaiw7 commented Jan 25, 2025

Also, for the 44k case, why is the number of samples 353280 rather than 352800?

hkchengrex (Owner) commented

  1. We did not implement gradient accumulation; you can implement it yourself (see the sketch after this list). Another route is to reduce the batch size, reduce the LR, and increase the number of iterations -- this is not equivalent to gradient accumulation, but it might be more efficient. The network should be fairly robust and should not break under reasonable changes like these.
  2. 352800 samples (8 s at 44100 Hz) is not divisible by the STFT hop size * VAE downsampling ratio, which is 1024. 353280 is the next integer that is divisible by 1024.
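For item 1, a minimal, self-contained sketch of gradient accumulation in PyTorch (toy model, optimizer, and data; none of this is MMAudio code):

```python
import torch
from torch import nn

# Toy stand-ins; in a real run these would be the actual network and loader.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

accum_steps = 4  # effective batch size = per-step batch size * accum_steps
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so the
    loss.backward()           # accumulated gradients average over the window
    if (i + 1) % accum_steps == 0:
        optimizer.step()      # update once per accum_steps micro-batches
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the gradient magnitude comparable to one large batch, so the LR does not need retuning.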
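And for item 2, the arithmetic checks out directly:

```python
hop = 1024                 # STFT hop size * VAE downsampling ratio
target = 8 * 44100         # 8 s at 44.1 kHz = 352800 samples
num_samples = -(-target // hop) * hop  # round up to the next multiple of hop
print(target % hop)        # 544 -> 352800 is not divisible by 1024
print(num_samples)         # 353280 (= 345 * 1024)
```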
