Before Asking
I have read the README carefully.
I want to train my custom dataset, and I have read the tutorials for training your custom data carefully and organized my dataset correctly. (FYI: We recommend using the xx_finetune.py config files to train custom datasets.)
I have pulled the latest code from the main branch and run it again, and the problem still exists.
Search before asking
I have searched the YOLOv6 issues and found no similar questions.
Question
I hit the error below after training for a few epochs. I am training with fuse_ab enabled.
I suspect the cause is that the bbox output has no activation function. Why is there no activation, and how is training expected to work without one?
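One detail that may be relevant: the assertion `input_val >= zero && input_val <= one` quoted in the log under Additional also fails for NaN, because every comparison with NaN is false. The progress line in that log already shows nan losses, so the failing values could be NaN rather than unbounded outputs. As far as I can tell the assertion belongs to the binary cross-entropy kernel, but that is an assumption. The sketch below mirrors the check in plain Python so it runs without a GPU; `passes_bce_range_check` and `summarize_invalid` are hypothetical helper names, not part of YOLOv6 or PyTorch.

```python
import math


def passes_bce_range_check(value: float) -> bool:
    """Mirror of the kernel assertion `input_val >= zero && input_val <= one`."""
    return value >= 0.0 and value <= 1.0


def summarize_invalid(values):
    """Count the values that would trip the range check, split by cause."""
    counts = {"nan": 0, "below_zero": 0, "above_one": 0}
    for v in values:
        if math.isnan(v):
            # NaN fails both comparisons, so it trips the assert too.
            counts["nan"] += 1
        elif v < 0.0:
            counts["below_zero"] += 1
        elif v > 1.0:
            counts["above_one"] += 1
    return counts


if __name__ == "__main__":
    sample = [0.2, float("nan"), -0.1, 1.0, 2.0]
    print([passes_bce_range_check(v) for v in sample])
    print(summarize_invalid(sample))
```

Running the same kind of count on the values passed to the loss, just before the failing step, would show whether they are NaN or merely outside [0, 1].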
Additional
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [147,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [147,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [147,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [147,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
2/399 0.02 nan 0.7267 nan 0.08989: 84%|████████▍ | 42774/50836 [3:19:03
ERROR in training steps.
ERROR in training loop or eval/save model.
Traceback (most recent call last):
File "/dataset/YOLOv6/yolov6/core/engine.py", line 121, in train
self.train_one_epoch(self.epoch)
File "/dataset/YOLOv6/yolov6/core/engine.py", line 135, in train_one_epoch
self.train_in_steps(epoch_num, self.step)
File "/dataset/YOLOv6/yolov6/core/engine.py", line 162, in train_in_steps
total_loss, loss_items = self.compute_loss((preds[0],preds[4],preds[5], preds[6]), targets, epoch_num,
File "/dataset/YOLOv6/yolov6/models/losses/loss.py", line 178, in __call__
loss_iou, loss_dfl = self.bbox_loss(pred_distri, pred_bboxes, anchor_points_s, target_bboxes,
File "/home/kasm-user/anaconda3/envs/onepp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/dataset/YOLOv6/yolov6/models/losses/loss.py", line 238, in forward
pred_bboxes_pos = torch.masked_select(pred_bboxes,
RuntimeError: CUDA error: device-side assert triggered
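A note on reading this traceback: CUDA kernels are launched asynchronously, so a device-side assert is usually reported at a later call than the one that triggered it. The `torch.masked_select` frame above may therefore not be where the failure happened. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, so the Python traceback should point at the failing operation. A minimal sketch, assuming the training run is started as a subprocess; `blocking_env` and `train_cmd` are hypothetical names:

```python
import os


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes each kernel launch wait for completion,
    # so errors surface at the call that caused them (at a speed cost).
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical usage: rerun the same training command with this environment.
# `train_cmd` stands for whatever command list was used originally.
# import subprocess
# subprocess.run(train_cmd, env=blocking_env(), check=True)
```

Training is slower in this mode, so it is best used only to reproduce the failure.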