Cannot wait to use this project~ #2

Open · tomasJwYU opened this issue Nov 16, 2023 · 16 comments

@tomasJwYU

Hi,
Thanks for sharing all this information in this repo, I cannot wait to see your code~

@haveyouwantto

I'm also eager to see the code. It would be game-changing.

@shuchenweng

+1, waiting for the code. If it's released, please reply and let me know!

@mimbres (Owner) commented Jul 29, 2024

@tomasJwYU @haveyouwantto @shuchenweng
Thanks for your interest in this project.
FYI, you can try the pre-release version used in the demo!
Assuming you have an environment with Python>=3.9 and PyTorch>=2.2 installed...
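A quick way to confirm those versions before proceeding (a generic check, not part of the repo's instructions):

python --version
python -c "import torch; print(torch.__version__)"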

pip install awscli
mkdir amt
aws s3 cp s3://amt-deploy-public/amt/ amt --no-sign-request --recursive
cd amt/src
pip install -r requirements.txt
apt-get install sox # only required for GuitarSet preprocessing...

Dataset download

python install_dataset.py

Please refer to the README.md (a bit outdated) or the Colab demo code for train.py and test.py command usage. Model checkpoints are available in amt/logs.

mimbres pinned this issue Aug 10, 2024
mimbres self-assigned this Aug 10, 2024
@Taeyeun72

Your code looks quite complex, but it was written in a way that was easier to understand than I expected.
As a university student, I was able to train and test this model in a short period of time.

It's a truly impressive paper with a model that delivers outstanding performance!

@karioth commented Sep 7, 2024

In reply to @Taeyeun72's comment above:

Hi! Did you manage to train the MoE model on all datasets? Might I ask how long it took you and on what hardware?

@mimbres (Owner) commented Sep 10, 2024

FYI, the final model was trained using these options:

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p slakh2024 -d all_cross_final -it 320000 -vit 20000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 300 -bsz 10 10 -xk 5 -tk mc13_full_plus_256 -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online
  • The -bsz numbers set the per-GPU batch size: the first is the batch size per data loader (CPU worker), and the second is the total local batch size per GPU. These settings suit GPUs with roughly 24-40GB of memory, such as the RTX4090 or A100 (40GB).
  • With -bsz 10 10 on 8 GPUs, the global batch size is 80.
  • For 80GB GPUs like the H100 or A100 (80GB), use -bsz 11 22. This creates 2 data loaders (bsz=11 each) per GPU.
  • -it 320000 and -vit 20000 mean 320K max iterations with validation every 20K iterations (16 validations in total). Each validation takes 0.5-1 hour, so avoid frequent validations: auto-regressive inference and the evaluation metrics are time-consuming.
  • For quicker training, try -it 100000 -vit 10000; it takes about 1.5 days on a single H100 80GB (see the example command below).
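As a concrete example of that quicker schedule, here is the same command with only -it/-vit swapped and -bsz 11 22 applied for a single 80GB GPU (a sketch of the substitutions described above, not a separately verified configuration):

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p slakh2024 -d all_cross_final -it 100000 -vit 10000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 300 -bsz 11 22 -xk 5 -tk mc13_full_plus_256 -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online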

@noirmist commented Oct 1, 2024

In reply to @mimbres's comment above describing the final training options:

Hi, thank you for sharing your great work.
Currently I'm trying to train it on an RTX4090 (24GB), but it fails with a GPU out-of-memory (OOM) error.
Does model training require 30GB of GPU memory or more?

Here is my training command; if you have any tips for reducing memory usage, please let me know.

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p partial_ymt3 -d maestro_final -it 320000 -vit 20000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 128 -bsz 1 1 -xk 5 -tk mt3_midi -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online

@mimbres (Owner) commented Oct 2, 2024

@noirmist Hi,

  • You're using a multi-channel decoder by setting -dec multi-t5, which should be paired with the multi-channel task: -tk mc13_full_plus_256. This corresponds to 13-channel decoding with the FULL_PLUS vocabulary and a max sequence length of 256.
  • If you prefer to use the MIDI_PLUS vocab within the multi-channel setup, use: -tk mc13_256.
  • I know you don't need singing (PLUS) since you're training a piano model, but it won't make much difference.
  • The batch size -bsz 1 1 is too small, so no augmentation happens within the batch. From what I remember, -bsz 11 11 worked well, but if you run into OOM errors, try -bsz 9 9 (an adjusted command combining these suggestions is sketched below).
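Putting those suggestions together, the adjusted command would look roughly like this (a sketch only: -tk and -bsz changed per the advice above, everything else kept from the original command):

python train.py mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b80_ps2 -p partial_ymt3 -d maestro_final -it 320000 -vit 20000 -epe rope -rp 1 -enc perceiver-tf -sqr 1 -ff moe -wf 4 -nmoe 8 -kmoe 2 -act silu -ac spec -hop 128 -bsz 9 9 -xk 5 -tk mc13_full_plus_256 -dec multi-t5 -nl 26 -edr 0.05 -ddr 0.05 -atc 1 -sb 1 -ps -2 2 -st ddp -wb online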

@cliffordkleinsr commented Oct 11, 2024

Firstly, I'd like to take a moment to appreciate the work done by @mimbres and co-authors; the work is extensive and showcases how powerful YourMT3 is for AMT. The paper mentions the "YOURMT3 TOOLKIT" in its last section. I presume this is a dataset preparation pipeline. Is it available, or does the source code itself encompass it?

Regards
Cliff

@mimbres (Owner) commented Oct 11, 2024

@cliffordkleinsr Thanks for your interest in this project.
Yes, it includes everything needed for training—defining tasks with tokens, managing data, scheduling, and evaluation metrics for different instruments. It's all in the pre-release code, but refactoring it takes time, so I'll release it with some compromises. The most reusable parts are data loading, evaluation metrics, and augmentation, though the lack of documentation may make it tricky.

For data preparation, check the code in utils/preprocess/. It integrates around 10 datasets in different formats. For custom datasets, just prepare MIDI and audio files. The Maestro dataset is a good reference.
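To illustrate the "just prepare MIDI and audio files" step, here is a minimal sketch of pairing custom audio and MIDI files (the my_dataset/ layout, the pair_audio_midi helper, and the dictionary keys are hypothetical, not the repo's actual preprocessing API):

from pathlib import Path

# Assumed layout: my_dataset/audio/*.wav and my_dataset/midi/*.mid,
# where each MIDI file shares its filename stem with its audio file.
def pair_audio_midi(root: str):
    root = Path(root)
    pairs = []
    for wav in sorted((root / "audio").glob("*.wav")):
        mid = root / "midi" / (wav.stem + ".mid")
        if mid.exists():
            pairs.append({"audio_file": str(wav), "midi_file": str(mid)})
    return pairs

# Quick sanity check of the layout before adapting utils/preprocess/:
for pair in pair_audio_midi("my_dataset")[:5]:
    print(pair)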

@karioth commented Oct 14, 2024

The more I delve into this project, the more mind-blown I am. Truly incredible work. As part of a study project at my university, we replicated the training of the MoE model without much trouble, and we are preparing new models and tokenization schemes using the framework -- so even in this pre-release state it is an amazing toolkit.

I was wondering whether it is possible to request access to the restricted datasets, to ensure our replication was faithful.

@mimbres (Owner) commented Oct 14, 2024

@karioth You can request the access token here: https://zenodo.org/records/10016397 (sorry for the lack of documentation!)

@karioth commented Oct 14, 2024

Thank you so much! I just sent the request :D

@mimbres (Owner) commented Oct 14, 2024

@karioth I missed checking the message that came 27 days ago! (Sorry about that) It should work now.

@an-old-guy-in-Ecust commented Nov 27, 2024

In reply to @mimbres's pre-release installation instructions above:

Hello, I followed every step and used the parameters from the Colab demo, but got the following error. I think it's related to the versions of the modules in requirements.txt, since specific versions are not pinned there. Maybe you can update requirements.txt. I'm using Python 3.10 and torch 2.4.1.
File "/content/amt/src/test.py", line 183, in
main()
File "/content/amt/src/test.py", line 169, in main
results.append(trainer.test(model, datamodule=dm))
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 748, in test
return call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 47, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 788, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 981, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1018, in _run_stage
return self._evaluation_loop.run()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py", line 178, in _decorator
return loop_run(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 135, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 396, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 319, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 424, in test_step
return self.lightning_module.test_step(*args, **kwargs)
File "/content/amt/src/model/ymt3.py", line 728, in test_step
pred_token_array_file, loss = self.inference_file(bsz, audio_segments, None, None)
File "/content/amt/src/model/ymt3.py", line 566, in inference_file
preds = self.inference(x, task_tokens).detach().cpu().numpy()
File "/content/amt/src/model/ymt3.py", line 485, in inference
enc_hs = self.encoder(inputs_embeds=x)["last_hidden_state"] # (B, task_len + 256, 512)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/content/amt/src/model/t5mod.py", line 431, in forward
return self._forward_no_compile(**kwargs)
File "/content/amt/src/model/t5mod.py", line 434, in _forward_no_compile
return self._forward(**kwargs)
File "/content/amt/src/model/t5mod.py", line 452, in _forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/content/amt/src/model/t5mod.py", line 340, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/content/amt/src/model/t5mod.py", line 98, in forward
self_attention_outputs = self.layer[0](
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
attention_output = self.SelfAttention(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py", line 525, in forward
real_seq_length = query_length if query_length is not None else cache_position[-1] + 1
TypeError: 'NoneType' object is not subscriptable
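(As a general way to make such an environment reproducible, independent of this project, one can freeze the exact versions of a working setup and reinstall from that file; the filename below is just an example:)

pip freeze > requirements-pinned.txt    # record the exact installed versions
pip install -r requirements-pinned.txt  # recreate the same environment later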

@mimbres (Owner) commented Nov 28, 2024

@an-old-guy-in-Ecust See #15
I've updated the colab notebook now!
