Cannot wait to use this project~ #2

Open · tomasJwYU opened this issue Nov 16, 2023 · 16 comments

@tomasJwYU

Hi,
Thanks for sharing all this information in this repo, I cannot wait to see your code~

@haveyouwantto

I'm also eager to see the code. It would be game-changing.

@shuchenweng

+1, waiting for the code. If it's released, please reply and let me know!

@mimbres (Owner) commented Jul 29, 2024

@tomasJwYU @haveyouwantto @shuchenweng
Thanks for your interest in this project.
FYI, you can try the pre-release version used in the demo!
Assuming you have an environment with Python>=3.9 and PyTorch>=2.2 installed...
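A quick way to confirm those versions before proceeding (a generic check, not part of the repo's instructions):

python --version
python -c "import torch; print(torch.__version__)"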

pip install awscli
mkdir amt
aws s3 cp s3://amt-deploy-public/amt/ amt --no-sign-request --recursive
cd amt/src
pip install -r requirements.txt
apt-get install sox # only required for GuitarSet preprocessing...

Dataset download

python install_dataset.py

Please refer to the README.md (a bit outdated) or the Colab demo code for train.py and test.py command usage. Model checkpoints are available in amt/logs.

mimbres pinned this issue Aug 10, 2024
mimbres self-assigned this Aug 10, 2024
@Taeyeun72

Your code looks quite complex, but it was written in a way that was easier to understand than I expected.
As a university student, I was able to train and test this model in a short period of time.

It's a truly impressive paper with a model that delivers outstanding performance!

@karioth commented Sep 7, 2024

In reply to @Taeyeun72's comment above:

Hi! Did you manage to train the MoE model on all datasets? Might I ask how long it took you and on what hardware?

@mimbres (Owner) commented Sep 10, 2024

FYI, the final model was trained using these options:

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p slakh2024 -d all_cross_final -it 320000 -vit 20000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 300 -bsz 10 10 -xk 5 -tk mc13_full_plus_256 -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online
  • The -bsz numbers set the per-GPU batch size: the first is the batch size per data loader (CPU worker), and the second is the total local batch size per GPU. These settings suit GPUs with roughly 24-40GB of memory, such as the RTX4090 or A100 (40GB).
  • With -bsz 10 10 on 8 GPUs, the global batch size is 80.
  • For 80GB GPUs like the H100 or A100 (80GB), use -bsz 11 22. This creates 2 data loaders (bsz=11 each) per GPU.
  • -it 320000 and -vit 20000 mean 320K max iterations with validation every 20K iterations (16 validations in total). Each validation takes 0.5-1 hour, so avoid frequent validations: auto-regressive inference and the evaluation metrics are time-consuming.
  • For quicker training, try -it 100000 -vit 10000; it takes about 1.5 days on a single H100 80GB (see the example command below).
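As a concrete example of that quicker schedule, here is the same command with only -it/-vit swapped and -bsz 11 22 applied for a single 80GB GPU (a sketch of the substitutions described above, not a separately verified configuration):

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p slakh2024 -d all_cross_final -it 100000 -vit 10000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 300 -bsz 11 22 -xk 5 -tk mc13_full_plus_256 -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online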

@noirmist commented Oct 1, 2024

In reply to @mimbres's comment above describing the final training options:

Hi, thank you for sharing your great work.
Currently I'm trying to train it on an RTX4090 (24GB), but it fails with a GPU out-of-memory (OOM) error.
Does model training require 30GB of GPU memory or more?

Here is my training command; if you have any tips for reducing memory usage, please let me know.

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p partial_ymt3 -d maestro_final -it 320000 -vit 20000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 128 -bsz 1 1 -xk 5 -tk mt3_midi -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online

@mimbres (Owner) commented Oct 2, 2024

@noirmist Hi,

  • You're using a multi-channel decoder by setting -dec multi-t5, which should be paired with the multi-channel task: -tk mc13_full_plus_256. This corresponds to 13-channel decoding with the FULL_PLUS vocabulary and a max sequence length of 256.
  • If you prefer to use the MIDI_PLUS vocab within the multi-channel setup, use: -tk mc13_256.
  • I know you don't need singing (PLUS) since you're training a piano model, but it won't make much difference.
  • The batch size -bsz 1 1 is too small, so no augmentation happens within the batch. From what I remember, -bsz 11 11 worked well, but if you run into OOM errors, try -bsz 9 9 (an adjusted command combining these suggestions is sketched below).
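Putting those suggestions together, the adjusted command would look roughly like this (a sketch only: -tk and -bsz changed per the advice above, everything else kept from the original command):

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p partial_ymt3 -d maestro_final -it 320000 -vit 20000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 128 -bsz 9 9 -xk 5 -tk mc13_full_plus_256 -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online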

@cliffordkleinsr commented Oct 11, 2024

Firstly, I'd like to take a moment to appreciate the work done by @mimbres and co-authors; the work is extensive and showcases how powerful YourMT3 is for AMT. The paper mentions the "YOURMT3 TOOLKIT" in its last section. I presume this is a dataset preparation pipeline. Is it available, or does the source code itself encompass it?

Regards
Cliff

@mimbres (Owner) commented Oct 11, 2024

@cliffordkleinsr Thanks for your interest in this project.
Yes, it includes everything needed for training—defining tasks with tokens, managing data, scheduling, and evaluation metrics for different instruments. It's all in the pre-release code, but refactoring it takes time, so I'll release it with some compromises. The most reusable parts are data loading, evaluation metrics, and augmentation, though the lack of documentation may make it tricky.

For data preparation, check the code in utils/preprocess/. It integrates around 10 datasets in different formats. For custom datasets, just prepare MIDI and audio files. The Maestro dataset is a good reference.
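To illustrate the "just prepare MIDI and audio files" step, here is a minimal sketch of pairing custom audio and MIDI files (the my_dataset/ layout, the pair_audio_midi helper, and the dictionary keys are hypothetical, not the repo's actual preprocessing API):

from pathlib import Path

# Assumed layout: my_dataset/audio/*.wav and my_dataset/midi/*.mid,
# where each MIDI file shares its filename stem with its audio file.
def pair_audio_midi(root: str):
    root = Path(root)
    pairs = []
    for wav in sorted((root / "audio").glob("*.wav")):
        mid = root / "midi" / (wav.stem + ".mid")
        if mid.exists():
            pairs.append({"audio_file": str(wav), "midi_file": str(mid)})
    return pairs

# Quick sanity check of the layout before adapting utils/preprocess/:
for pair in pair_audio_midi("my_dataset")[:5]:
    print(pair)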

@karioth commented Oct 14, 2024

The more I delve into this project, the more mind-blown I am. Truly incredible work. As part of a study project at my university, we replicated the training of the MoE model without much trouble, and we are preparing new models and tokenization schemes using the framework -- so even in this pre-release state it is an amazing toolkit.

I was wondering whether it is possible to request access to the restricted datasets, to ensure our replication was faithful.

@mimbres (Owner) commented Oct 14, 2024

@karioth You can request the access token here: https://zenodo.org/records/10016397 (sorry for the lack of documentation!)

@karioth commented Oct 14, 2024

Thank you so much! I just sent the request :D

@mimbres (Owner) commented Oct 14, 2024

@karioth I missed checking the message that came 27 days ago! (Sorry about that) It should work now.

@an-old-guy-in-Ecust commented Nov 27, 2024

In reply to @mimbres's pre-release installation instructions above:

Hello, I followed every step and used the parameters from the Colab demo, but got the following error. I think it's related to the versions of the modules in requirements.txt, since specific versions are not pinned there. Maybe you can update requirements.txt. I'm using Python 3.10 and torch 2.4.1.
File "/content/amt/src/test.py", line 183, in
main()
File "/content/amt/src/test.py", line 169, in main
results.append(trainer.test(model, datamodule=dm))
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 748, in test
return call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 47, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 788, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 981, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1018, in _run_stage
return self._evaluation_loop.run()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py", line 178, in _decorator
return loop_run(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 135, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 396, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 319, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 424, in test_step
return self.lightning_module.test_step(*args, **kwargs)
File "/content/amt/src/model/ymt3.py", line 728, in test_step
pred_token_array_file, loss = self.inference_file(bsz, audio_segments, None, None)
File "/content/amt/src/model/ymt3.py", line 566, in inference_file
preds = self.inference(x, task_tokens).detach().cpu().numpy()
File "/content/amt/src/model/ymt3.py", line 485, in inference
enc_hs = self.encoder(inputs_embeds=x)["last_hidden_state"] # (B, task_len + 256, 512)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/content/amt/src/model/t5mod.py", line 431, in forward
return self._forward_no_compile(**kwargs)
File "/content/amt/src/model/t5mod.py", line 434, in _forward_no_compile
return self._forward(**kwargs)
File "/content/amt/src/model/t5mod.py", line 452, in _forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/content/amt/src/model/t5mod.py", line 340, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/content/amt/src/model/t5mod.py", line 98, in forward
self_attention_outputs = self.layer[0](
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
attention_output = self.SelfAttention(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py", line 525, in forward
real_seq_length = query_length if query_length is not None else cache_position[-1] + 1
TypeError: 'NoneType' object is not subscriptable
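(As a general way to make such an environment reproducible, independent of this project, one can freeze the exact versions of a working setup and reinstall from that file; the filename below is just an example:)

pip freeze > requirements-pinned.txt    # record the exact installed versions
pip install -r requirements-pinned.txt  # recreate the same environment later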

@mimbres (Owner) commented Nov 28, 2024

@an-old-guy-in-Ecust See #15
I've updated the colab notebook now!
