
Query for mixed precision training #27

Answered by iliaschalkidis
BenjaminKKK asked this question in Q&A


Hi @BenjaminKKK, as I mentioned in the recent issue #26, these two HF arguments (--fp16, --fp16_full_eval) only work when the machine (server or cluster) has available, correctly configured NVIDIA GPUs, and torch is set up to use them.
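As a quick sanity check, you can ask torch directly whether it can see a CUDA-capable GPU (a minimal sketch; the device name is only printed when a GPU is found):

```python
import torch

# True only if the NVIDIA drivers, CUDA toolkit, and torch build all line up
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "NVIDIA A100-SXM4-40GB"
    print(torch.cuda.get_device_name(0))
```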

So, if you don't have such resources, simply remove these two arguments to train models in standard fp32 precision. If you do have such resources, make sure the NVIDIA CUDA drivers are installed correctly, and install torch for your setup (see this page for the appropriate steps: https://pytorch.org/get-started/locally/).
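Instead of editing the command line by hand, you could also guard the flags programmatically. This is only a sketch under the assumption you build TrainingArguments yourself; the output_dir value is a placeholder:

```python
import torch
from transformers import TrainingArguments

use_cuda = torch.cuda.is_available()

# Enable fp16 mixed precision only when a CUDA GPU is actually available;
# on CPU-only machines both flags stay False and training runs in fp32.
training_args = TrainingArguments(
    output_dir="./outputs",  # placeholder path
    fp16=use_cuda,
    fp16_full_eval=use_cuda,
)
```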

I'll update the README.md to …

Replies: 1 comment · 2 replies (@BenjaminKKK, @iliaschalkidis)
Answer selected by iliaschalkidis