Total train epochs 10 | Total train iters 286497 |
building Enc-Dec model ...
number of parameters on model parallel rank 1: 5543798784
number of parameters on model parallel rank 0: 5543798784
Traceback (most recent call last):
  File "/mnt/finetune_cpm2.py", line 808, in <module>
    main()
  File "/mnt/finetune_cpm2.py", line 791, in main
    model, optimizer, lr_scheduler = setup_model_and_optimizer(args, tokenizer.vocab_size, ds_config, prompt_config)
  File "/mnt/utils.py", line 213, in setup_model_and_optimizer
    optimizer = get_optimizer(model, args, prompt_config)
  File "/mnt/utils.py", line 163, in get_optimizer
    optimizer = Adam(param_groups,
  File "/opt/conda/lib/python3.8/site-packages/apex/optimizers/fused_adam.py", line 79, in __init__
    raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions
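Hello, I tried to run the code on two NVIDIA A100-PCIE-40GB GPUs, using the provided image environment directly. However, loading FusedAdam keeps failing with the error shown in the log above, and reinstalling apex did not fix it. I have not found a solution yet.
Is it possible to run this on two NVIDIA A100-PCIE-40GB GPUs? Does the apex environment in the image need any adjustment? Thanks.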
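To help narrow this down, here is a minimal diagnostic sketch that can be run inside the image. It assumes that `FusedAdam` raises this `RuntimeError` when apex's compiled extension module `amp_C` cannot be imported; that module name comes from my reading of the apex source and is not something this repository confirms. The helper names `probe_extension` and `diagnose` are hypothetical, made up for illustration.

```python
import importlib


def probe_extension(module_name):
    """Try to import a module and return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        # Keep the exception text: an "undefined symbol" or "No module named"
        # message hints at whether the extension is missing or mismatched.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "import succeeded"


def diagnose():
    """Probe torch, apex, and the compiled extension, in that order.

    torch is probed first because compiled extensions generally need
    torch's shared libraries to be loaded before they can be imported.
    Assumption: "amp_C" is the extension that FusedAdam depends on.
    """
    report = {}
    for name in ("torch", "apex", "amp_C"):
        report[name] = probe_extension(name)
    return report


if __name__ == "__main__":
    for name, (ok, detail) in diagnose().items():
        status = "OK" if ok else "FAILED"
        print(f"{name}: {status} ({detail})")
```

If `apex` imports but `amp_C` does not, the installed apex is probably a Python-only build without the CUDA extensions, which would match the error above. If `amp_C` fails with a symbol or library error, the extension may have been built against a different PyTorch or CUDA version than the one in the image.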