Describe the bug
The server crashes when using float16 without CUDA.
To Reproduce
Steps to reproduce the behavior:
Run the server on a machine without CUDA
Load a model with float16
Generate some text
See the error
Expected behavior
The server should fall back to float32 to avoid the crash.
UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cuda') before running .generate().
warnings.warn(
"topk_cpu" not implemented for 'Half'
config.py:
- Added TORCH_DTYPE_SAFETY.
model.py:
- Updated _load_model to force torch_dtype to float32 when CUDA isn't
available (only if config.TORCH_DTYPE_SAFETY is True).
Otherwise, loading in float16 leads to an error during generation.
See #31
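The fallback described in the commit message could look roughly like the sketch below. This is not the actual code from model.py: the helper name resolve_torch_dtype and its parameters are hypothetical, and dtypes are passed as plain strings so the snippet runs without PyTorch installed. It follows the commit's wording and forces float32 whenever CUDA is unavailable and the safety switch is on.

```python
def resolve_torch_dtype(requested: str, cuda_available: bool,
                        safety_enabled: bool = True) -> str:
    """Pick the dtype name to load the model with.

    Hypothetical helper, not the project's real function.
    `safety_enabled` stands in for config.TORCH_DTYPE_SAFETY.
    """
    # Without CUDA, float16 generation can fail on CPU
    # ("topk_cpu" not implemented for 'Half'), so force float32.
    if safety_enabled and not cuda_available:
        return "float32"
    # CUDA is available, or the safety switch is off: keep the request.
    return requested


if __name__ == "__main__":
    print(resolve_torch_dtype("float16", cuda_available=False))
    print(resolve_torch_dtype("float16", cuda_available=True))
```

In a real loader, cuda_available would presumably come from torch.cuda.is_available(), and the returned name could be turned into a dtype with getattr(torch, name) before being passed as torch_dtype.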