
Unable to train on 4-5 GTX 1070s #17

Open
sfxworks opened this issue Apr 11, 2023 · 10 comments

sfxworks commented Apr 11, 2023

[screenshot]
I've got five GTX 1070s here that I'm trying to train on. The memory goes away quickly at first: 40 GB of VRAM in total, but only 64 GB of system memory. I added swap, but I imagine loading will take forever that way. Are there other flags I can use to reduce the memory usage?
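
For reference, a typical way to add swap on a Linux host looks like the following; this is a generic sketch, not necessarily how it was set up here:

# Generic Linux swap setup (illustrative size, not the exact steps used in this issue)
fallocate -l 64G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon --show   # confirm the swap area is active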

@sfxworks (Author)

[screenshot]
I can't quite tell what's going on. It seems to just be doing something in two Python threads. Memory usage went way down, so swap really isn't being used anymore.
[screenshot]
But only two GPUs show any activity on them.

@sfxworks (Author)

[screenshot]
What is it doing here?

@sfxworks (Author)

I ended up sending it SIGKILL. Nothing was happening as far as I could tell: no IOPS or anything else in Grafana.

sfxworks changed the title from "Tweaking for system memory requirements?" to "Unable to train on 4-5 GTX 1070s" on Apr 11, 2023
@chiayewken (Collaborator)

Hi, could you give more details, such as the training command used? It is not recommended to fit a model that is too big, as it will cause excessive offloading of the model to CPU memory (assuming you are using FSDP), which is very slow.

@sfxworks (Author)

I was using the example command from the README with the --use_fsdp option:

python training.py --output_dir outputs/model/xl \
--use_fsdp \
--train_epochs 3 \
--max_source_length 64 \
--max_target_length 512 \
--data_path data/train.json \
--model_name_or_path "google/flan-t5-xl" \
--train_batch_size 1 \
--gradient_accumulation_steps 64
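
For what it's worth, the argument dump further down also lists use_lora and use_gradient_checkpointing options. Assuming training.py exposes them as plain --use_lora and --use_gradient_checkpointing flags (not verified here; check python training.py --help), a lower-memory variant of the same command might look like this:

# NOTE: --use_lora and --use_gradient_checkpointing are assumed flag names,
# inferred from the "use_lora" / "use_gradient_checkpointing" fields in the
# printed config; verify with `python training.py --help` before relying on them.
python training.py --output_dir outputs/model/xl \
--use_fsdp \
--use_lora \
--use_gradient_checkpointing \
--train_epochs 3 \
--max_source_length 64 \
--max_target_length 512 \
--data_path data/train.json \
--model_name_or_path "google/flan-t5-xl" \
--train_batch_size 1 \
--gradient_accumulation_steps 64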

@chiayewken (Collaborator)

I see. It could be that the model is too large, causing slow CPU offload, or that FSDP is not working properly on your system. Does the same command work if you switch to a smaller model, e.g. google/flan-t5-base or google/flan-t5-large?

@sfxworks (Author)

base just fails with a bus error:

(paca) root@anaconda-statefulset-0:~/flan-alpaca# python training.py --output_dir outputs/model/xl \
--use_fsdp \
--train_epochs 3 \
--max_source_length 64 \
--max_target_length 512 \
--data_path data/train.json \
--model_name_or_path "google/flan-t5-base" \
--train_batch_size 1 \
--gradient_accumulation_steps 64
Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████████████████████████████| 1.40k/1.40k [00:00<00:00, 140kB/s]
Downloading model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████| 990M/990M [00:18<00:00, 54.3MB/s]
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████████████████████████████| 147/147 [00:00<00:00, 15.4kB/s]
{'orig_state_dict': 284}
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 670kB/s]
Downloading spiece.model: 100%|█████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 11.0MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 19.2MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 570kB/s]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/5
[rank: 2] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 1] Global seed set to 42
[rank: 4] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 3] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
{'orig_state_dict': 284}
[rank: 2] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/5
{'orig_state_dict': 284}
{'orig_state_dict': 284}
{'orig_state_dict': 284}
[rank: 3] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/5
[rank: 4] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/5
[rank: 1] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/5
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 5 processes
----------------------------------------------------------------------------------------------------

Bus error (core dumped)

@sfxworks (Author)

This also leaves behind an idling process:

|    4   N/A  N/A    504567      C   /opt/conda/envs/paca/bin/python             440MiB |
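
One generic thing worth checking when NCCL startup dies with "Bus error (core dumped)" inside a container (not confirmed to be the cause here) is the size of /dev/shm, since PyTorch dataloader workers and NCCL both use shared memory and the container default is often only 64M:

df -h /dev/shm   # a 64M tmpfs here is a common cause of bus errors in multi-process training
# If it is too small, remount it larger, or in Kubernetes mount an emptyDir
# volume with medium: Memory at /dev/shm in the pod spec.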

@sfxworks (Author)

And here are the logs from the original XL model:

(paca) root@anaconda-statefulset-0:~/flan-alpaca# python training.py --output_dir outputs/model/xl --use_fsdp --train_epochs 3 --max_source_length 64 --max_target_length 512 --data_path data/train.json --model_name_or_path "google/flan-t5-xl" --train_batch_size 1 --gradient_accumulation_steps 64
Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Downloading (…)l-00001-of-00002.bin: 100%|█████████████████████████████████████████████████████████████████████████| 9.45G/9.45G [01:25<00:00, 110MB/s]
Downloading (…)l-00002-of-00002.bin: 100%|████████████████████████████████████████████████████████████████████████| 1.95G/1.95G [00:35<00:00, 54.5MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:02<00:00, 61.04s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.54s/it]
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████████████████████████████| 147/147 [00:00<00:00, 14.2kB/s]
{'orig_state_dict': 560}
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 601kB/s]
Downloading spiece.model: 100%|█████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 11.0MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 19.4MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 615kB/s]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/5
[rank: 2] Global seed set to 42
[rank: 1] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 4] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 3] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Loading checkpoint shards:   0%|                                                                                                 | 0/2 [00:00<?, ?it/s]command terminated with exit code 137
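
Exit code 137 is 128 + 9, i.e. the process was killed with SIGKILL, which usually points at the kernel OOM killer or the pod's memory limit rather than a crash in the code. A generic way to confirm on the node (standard Linux commands, nothing specific to this repo):

free -h                                               # available RAM and swap
dmesg -T | grep -iE "killed process|out of memory"    # kernel OOM-killer messages, if any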

@sfxworks (Author)

And then, after giving it the RAM it needs via swap, I get the same bus error:

(paca) root@anaconda-statefulset-0:~/flan-alpaca#  python training.py --output_dir outputs/model/xl --use_fsdp --train_epochs 3 --max_source_length 64 --max_target_length 512 --data_path data/train.json --model_name_or_path "google/flan-t5-xl" --train_batch_size 1 --gradient_accumulation_steps 64
Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.32s/it]
{'orig_state_dict': 560}
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/5
[rank: 1] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 2] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 3] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 4] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:27<00:00, 163.98s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:31<00:00, 165.91s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:31<00:00, 165.66s/it]
{'orig_state_dict': 560}
{'orig_state_dict': 560}
{'orig_state_dict': 560}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:31<00:00, 165.98s/it]
{'orig_state_dict': 560}
[rank: 2] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/5
[rank: 1] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/5
[rank: 3] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/5
[rank: 4] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/5
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 5 processes
----------------------------------------------------------------------------------------------------

Bus error (core dumped)
(paca) 
