Training llama2 70B with the latest dev branch code hits the following error:

collie/collie/models/llama/model.py:203 in _forward

    200                             .permute(0, 2, 1, 4, 3) \
    201                             .reshape(batch_size, self.num_key_value_heads,
    202                                      seq_len + start_pos, -1)
❱   203             new_layer_past = torch.stack((present_key, value.permute([0, 2, 1, 3])), dim
    204         attention_mask = attention_mask if attention_mask is not None else torch.ones((q
    205         if self.config.use_flash:
    206             output = flash_attention(query, key, value, attention_mask)

RuntimeError: stack expects each tensor to be equal size, but got [1, 8, 2048, 1024] at entry 0 and [1, 64, 2048, 128] at entry 1
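The mismatch can be reproduced outside the model with dummy tensors of the shapes reported in the error. This is only a minimal sketch; the variable names mirror the traceback but are otherwise illustrative:

import torch

# Shapes taken directly from the RuntimeError above.
batch_size, seq_len = 1, 2048
num_key_value_heads, num_attention_heads, head_dim = 8, 64, 128

# present_key is grouped by the 8 KV heads, with the repeated head dims
# folded into the last axis: (64 // 8) * 128 = 1024 -> [1, 8, 2048, 1024].
present_key = torch.empty(batch_size, num_key_value_heads, seq_len,
                          (num_attention_heads // num_key_value_heads) * head_dim)

# value is still laid out per query head: [1, 64, 2048, 128].
value = torch.empty(batch_size, num_attention_heads, seq_len, head_dim)

# torch.stack requires identical shapes, so this raises the same
# "stack expects each tensor to be equal size" RuntimeError.
torch.stack((present_key, value), dim=0)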
That is the first problem. There is a second one: with the dev branch code from a few days ago, trainer.save_model on llama2 70B (8 x V100; training itself runs fine) hits a CUDA OOM. Since training fits in memory, saving should not run out of it. The latest dev code probably still has this problem; it just fails earlier (on the error above) before reaching the save.

/opt/conda/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py:1553 in _allgather_params_coalesced

    1550         allgather_params = []
    1551         for psize in partition_sizes:
    1552             tensor_size = psize * self.num_partitions
❱   1553             flat_tensor = torch.empty(tensor_size, dtype=param_list[0].dtype, device=sel
    1554             flat_tensor.requires_grad = False
    1555             allgather_params.append(flat_tensor)
    1556

OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB (GPU 7; 31.75 GiB total capacity; 29.60 GiB already allocated; 312.75 MiB free; 29.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
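For context, a back-of-the-envelope estimate of the numbers involved (assuming fp16 weights and the 8-way ZeRO-3 partition implied by the 8 x V100 setup; nothing here is measured):

# Rough arithmetic only, under the assumptions stated above.
num_params = 70e9          # llama2 70B
bytes_per_param = 2        # fp16
gib = 2 ** 30

full_model_gib = num_params * bytes_per_param / gib      # ~130 GiB
per_gpu_slice_gib = full_model_gib / 8                   # ~16 GiB per GPU under ZeRO-3

print(f"full fp16 weights:    {full_model_gib:.1f} GiB")
print(f"ZeRO-3 slice per GPU: {per_gpu_slice_gib:.1f} GiB")

# The partitioned slice fits alongside the training state, but
# _allgather_params_coalesced allocates extra gathered buffers on top of the
# ~29.6 GiB already in use, so even a 448 MiB request fails on a 31.75 GiB card.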
@00INDEX could you take a look?
@KaiLv69 @QipengGuo could you take a look?
Hi, the bug with saving the model under ZeRO-3 is currently being fixed.
Hi, please update to the latest dev branch and try again.
FYI: 82869ee ac6eed4
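For anyone hitting the save-time OOM before the fix lands, one generic DeepSpeed-side workaround (an assumption, not the fix in the commits above) is to write the checkpoint in its partitioned form and merge the shards offline on CPU, which avoids the GPU-side all-gather entirely:

from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
import torch

# `trainer.engine` is a hypothetical handle to the underlying DeepSpeedEngine;
# adjust to however the engine is exposed in your setup.
trainer.engine.save_checkpoint("ckpt_dir")                          # per-rank ZeRO-3 shards
state_dict = get_fp32_state_dict_from_zero_checkpoint("ckpt_dir")   # merged on CPU
torch.save(state_dict, "pytorch_model.bin")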