I used the CIFAR-10 dataset.
When I run the code without --learn_sigma True, it works well (it reports all loss, mse, and vb values).
However, with --learn_sigma True, the error occurs. Thank you!
@hwan-sig it is a bit odd that the problem happens on CIFAR-10. Does the NaN occur immediately at the beginning of training? Can you also print out the eps loss and the vlb loss (when using --learn_sigma) for debugging, and check whether the loss scale is unusual? Also, consider re-downloading the dataset.
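One way to do the per-term check suggested above is sketched below. It is only a rough debugging aid, not code from this repository: the helper names summarize_losses and first_bad_term are hypothetical, and it assumes the training loop exposes a dict of per-sample losses with keys such as "mse" and "vb" (the names mentioned earlier in this thread). With PyTorch tensors, each entry could be converted first via t.detach().flatten().tolist().

```python
import math
from typing import Dict, Iterable, Optional


def summarize_losses(losses: Dict[str, Iterable[float]]) -> Dict[str, dict]:
    """Return mean, largest magnitude, and non-finite count for each loss term."""
    report = {}
    for name, values in losses.items():
        values = [float(v) for v in values]
        # Keep only finite entries so one NaN/inf does not hide the typical scale.
        finite = [v for v in values if math.isfinite(v)]
        report[name] = {
            "mean": sum(finite) / len(finite) if finite else float("nan"),
            "max_abs": max((abs(v) for v in finite), default=float("nan")),
            "non_finite": len(values) - len(finite),
        }
    return report


def first_bad_term(report: Dict[str, dict]) -> Optional[str]:
    """Return the name of the first loss term containing NaN/inf, or None."""
    for name, stats in report.items():
        if stats["non_finite"] > 0:
            return name
    return None


# Made-up numbers for illustration only; real values come from the training step.
example = {"mse": [0.5, 1.5], "vb": [2.0, float("nan")]}
example_report = summarize_losses(example)
for term, stats in example_report.items():
    print(term, stats)
print("first term with NaN/inf:", first_bad_term(example_report))
```

Calling this once per step for the first few iterations should show whether the NaN appears immediately and whether it originates in the vb term or the mse term, which is the distinction asked about above.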
Description:
I attempted to run the following command to train the model:
OPENAI_LOGDIR='./Logs' PYTHONPATH='.' CUDA_VISIBLE_DEVICES=0 python scripts/image_train.py --optimizer adamw --image_size 32 --num_channels 128 --num_res_blocks 3 --diffusion_steps 1000 --noise_schedule cosine --lr 1e-4 --batch_size 128 --learn_sigma True --eps_scaler=0 --lr_anneal_steps 100000 &
However, the script produced an error as shown in the attached screenshot.
Could you help identify the cause of this issue and provide guidance on how to resolve it? Thank you!