
Unable to train on 4-5 GTX 1070s #17

Open
sfxworks opened this issue Apr 11, 2023 · 10 comments

sfxworks commented Apr 11, 2023

[screenshot]
I've got five GTX 1070s here that I'm trying to train on. The memory goes away quickly at first: 40 GB of VRAM in total, but only 64 GB of system memory. I added swap, but I imagine loading will take forever that way. Are there other flags I can use to reduce the memory usage?
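
For reference, a typical way to add swap on a Linux host looks like the following; this is a generic sketch, not necessarily how it was set up here:

# Generic Linux swap setup (illustrative size, not the exact steps used in this issue)
fallocate -l 64G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon --show   # confirm the swap area is active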

@sfxworks (Author)

[screenshot]
I can't quite tell what's going on. It seems to just be doing something in two Python threads. Memory usage went way down, so swap really isn't being used anymore.
[screenshot]
But only two GPUs show any activity on them.

@sfxworks (Author)

[screenshot]
What is it doing here?

@sfxworks (Author)

I ended up sending it SIGKILL. Nothing was happening as far as I could tell: no IOPS or anything else in Grafana.

sfxworks changed the title from "Tweaking for system memory requirements?" to "Unable to train on 4-5 GTX 1070s" on Apr 11, 2023
@chiayewken (Collaborator)

Hi, could you give more details, such as the training command used? It is not recommended to fit a model that is too big, as it will cause excessive offloading of the model to CPU memory (assuming you are using FSDP), which is very slow.

@sfxworks (Author)

I was using the example command from the README with the --use_fsdp option:

python training.py --output_dir outputs/model/xl \
--use_fsdp \
--train_epochs 3 \
--max_source_length 64 \
--max_target_length 512 \
--data_path data/train.json \
--model_name_or_path "google/flan-t5-xl" \
--train_batch_size 1 \
--gradient_accumulation_steps 64
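
For what it's worth, the argument dump further down also lists use_lora and use_gradient_checkpointing options. Assuming training.py exposes them as plain --use_lora and --use_gradient_checkpointing flags (not verified here; check python training.py --help), a lower-memory variant of the same command might look like this:

# NOTE: --use_lora and --use_gradient_checkpointing are assumed flag names,
# inferred from the "use_lora" / "use_gradient_checkpointing" fields in the
# printed config; verify with `python training.py --help` before relying on them.
python training.py --output_dir outputs/model/xl \
--use_fsdp \
--use_lora \
--use_gradient_checkpointing \
--train_epochs 3 \
--max_source_length 64 \
--max_target_length 512 \
--data_path data/train.json \
--model_name_or_path "google/flan-t5-xl" \
--train_batch_size 1 \
--gradient_accumulation_steps 64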

@chiayewken (Collaborator)

I see. It could be that the model is too large, causing slow CPU offload, or that FSDP is not working properly on your system. Does the same command work if you switch to a smaller model, e.g. google/flan-t5-base or google/flan-t5-large?

@sfxworks (Author)

base just fails with a bus error:

(paca) root@anaconda-statefulset-0:~/flan-alpaca# python training.py --output_dir outputs/model/xl \
--use_fsdp \
--train_epochs 3 \
--max_source_length 64 \
--max_target_length 512 \
--data_path data/train.json \
--model_name_or_path "google/flan-t5-base" \
--train_batch_size 1 \
--gradient_accumulation_steps 64
Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████████████████████████████| 1.40k/1.40k [00:00<00:00, 140kB/s]
Downloading model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████| 990M/990M [00:18<00:00, 54.3MB/s]
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████████████████████████████| 147/147 [00:00<00:00, 15.4kB/s]
{'orig_state_dict': 284}
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 670kB/s]
Downloading spiece.model: 100%|█████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 11.0MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 19.2MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 570kB/s]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/5
[rank: 2] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 1] Global seed set to 42
[rank: 4] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 3] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-base
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/opt/conda/envs/paca/lib/python3.8/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/paca/lib/python3.8/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
{'orig_state_dict': 284}
[rank: 2] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/5
{'orig_state_dict': 284}
{'orig_state_dict': 284}
{'orig_state_dict': 284}
[rank: 3] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/5
[rank: 4] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/5
[rank: 1] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/5
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 5 processes
----------------------------------------------------------------------------------------------------

Bus error (core dumped)

@sfxworks (Author)

This also leaves behind an idling process:

|    4   N/A  N/A    504567      C   /opt/conda/envs/paca/bin/python             440MiB |
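
One generic thing worth checking when NCCL startup dies with "Bus error (core dumped)" inside a container (not confirmed to be the cause here) is the size of /dev/shm, since PyTorch dataloader workers and NCCL both use shared memory and the container default is often only 64M:

df -h /dev/shm   # a 64M tmpfs here is a common cause of bus errors in multi-process training
# If it is too small, remount it larger, or in Kubernetes mount an emptyDir
# volume with medium: Memory at /dev/shm in the pod spec.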

@sfxworks (Author)

And here are the logs from the original XL model:

(paca) root@anaconda-statefulset-0:~/flan-alpaca# python training.py --output_dir outputs/model/xl --use_fsdp --train_epochs 3 --max_source_length 64 --max_target_length 512 --data_path data/train.json --model_name_or_path "google/flan-t5-xl" --train_batch_size 1 --gradient_accumulation_steps 64
Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Downloading (…)l-00001-of-00002.bin: 100%|█████████████████████████████████████████████████████████████████████████| 9.45G/9.45G [01:25<00:00, 110MB/s]
Downloading (…)l-00002-of-00002.bin: 100%|████████████████████████████████████████████████████████████████████████| 1.95G/1.95G [00:35<00:00, 54.5MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:02<00:00, 61.04s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.54s/it]
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████████████████████████████| 147/147 [00:00<00:00, 14.2kB/s]
{'orig_state_dict': 560}
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 601kB/s]
Downloading spiece.model: 100%|█████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 11.0MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 19.4MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 615kB/s]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/5
[rank: 2] Global seed set to 42
[rank: 1] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 4] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 3] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Loading checkpoint shards:   0%|                                                                                                 | 0/2 [00:00<?, ?it/s]command terminated with exit code 137
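
Exit code 137 is 128 + 9, i.e. the process was killed with SIGKILL, which usually points at the kernel OOM killer or the pod's memory limit rather than a crash in the code. A generic way to confirm on the node (standard Linux commands, nothing specific to this repo):

free -h                                               # available RAM and swap
dmesg -T | grep -iE "killed process|out of memory"    # kernel OOM-killer messages, if any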

@sfxworks (Author)

And then, after giving it the RAM it needs via swap, I get the same bus error:

(paca) root@anaconda-statefulset-0:~/flan-alpaca#  python training.py --output_dir outputs/model/xl --use_fsdp --train_epochs 3 --max_source_length 64 --max_target_length 512 --data_path data/train.json --model_name_or_path "google/flan-t5-xl" --train_batch_size 1 --gradient_accumulation_steps 64
Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.32s/it]
{'orig_state_dict': 560}
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/5
[rank: 1] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 2] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 3] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
[rank: 4] Global seed set to 42
"data_path":                   data/train.json
"debug":                       False
"gradient_accumulation_steps": 64
"learning_rate":               0.0005
"max_source_length":           64
"max_target_length":           512
"model_name_or_path":          google/flan-t5-xl
"output_dir":                  outputs/model/xl
"seed":                        42
"train_batch_size":            1
"train_epochs":                3
"use_compile":                 False
"use_fsdp":                    True
"use_gradient_checkpointing":  False
"use_lora":                    False
"weight_decay":                0.0
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:27<00:00, 163.98s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:31<00:00, 165.91s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:31<00:00, 165.66s/it]
{'orig_state_dict': 560}
{'orig_state_dict': 560}
{'orig_state_dict': 560}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [05:31<00:00, 165.98s/it]
{'orig_state_dict': 560}
[rank: 2] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/5
[rank: 1] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/5
[rank: 3] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/5
[rank: 4] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/5
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 5 processes
----------------------------------------------------------------------------------------------------

Bus error (core dumped)
(paca) 
