About training #121

Open
John1911603424 opened this issue Jan 3, 2025 · 2 comments

Comments

@John1911603424

Why does it take more than a week to complete a UniMatch training experiment with one sixteenth of the data?

@LiheYoung
Owner

Please provide more details about your training environment and training logs.

@John1911603424
Author

Training on Ubuntu with 2 GPUs is slower than on Ubuntu with 1 GPU. Here are some of the training logs:

[2024-12-31 19:32:24,400][ INFO] {'backbone': 'resnet50',
'batch_size': 2,
'conf_thresh': 0.95,
'config': 'configs/nansha.yaml',
'criterion': {'kwargs': {'ignore_index': 255}, 'name': 'CELoss'},
'crop_size': 801,
'data_root': '/data/Semi-SL/Nansha/无数据增强',
'dataset': 'nansha',
'dilations': [6, 12, 18],
'epochs': 200,
'labeled_id_path': 'splits/nansha/1_16/labeled.txt',
'local_rank': 0,
'lr': 0.01,
'lr_multi': 1.0,
'nclass': 7,
'ngpus': 2,
'port': 12345,
'replace_stride_with_dilation': [False, False, True],
'save_path': 'exp/nansha/unimatch/r50_200_2_0.01_1_CELoss_0.95/1_16',
'unlabeled_id_path': 'splits/nansha/1_16/unlabeled.txt'}

[2024-12-31 19:32:25,021][ INFO] Total params: 40.5M

user-Precision-7920-Tower:24437:24437 [0] NCCL INFO Bootstrap : Using enp0s31f6:192.168.207.78<0>
user-Precision-7920-Tower:24437:24437 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
user-Precision-7920-Tower:24437:24437 [0] NCCL INFO cudaDriverVersion 12050
NCCL version 2.14.3+cuda11.6
user-Precision-7920-Tower:24438:24438 [1] NCCL INFO cudaDriverVersion 12050
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Failed to open libibverbs.so[.1]
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO NET/Socket : Using [0]enp0s31f6:192.168.207.78<0>
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Using network Socket
user-Precision-7920-Tower:24438:24438 [1] NCCL INFO Bootstrap : Using enp0s31f6:192.168.207.78<0>
user-Precision-7920-Tower:24438:24438 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Failed to open libibverbs.so[.1]
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO NET/Socket : Using [0]enp0s31f6:192.168.207.78<0>
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Using network Socket
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffff0000,00ffffff
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Channel 00/02 : 0 1
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Channel 01/02 : 0 1
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Channel 00 : 1[73000] -> 0[17000] via SHM/direct/direct
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Channel 00 : 0[17000] -> 1[73000] via SHM/direct/direct
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Channel 01 : 1[73000] -> 0[17000] via SHM/direct/direct
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Channel 01 : 0[17000] -> 1[73000] via SHM/direct/direct
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Connected all rings
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Connected all rings
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO Connected all trees
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO Connected all trees
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
user-Precision-7920-Tower:24437:24484 [0] NCCL INFO comm 0x562a54d572d0 rank 0 nranks 2 cudaDev 0 busId 17000 - Init COMPLETE
user-Precision-7920-Tower:24438:24485 [1] NCCL INFO comm 0x5647d7a5f410 rank 1 nranks 2 cudaDev 1 busId 73000 - Init COMPLETE
[2024-12-31 19:32:27,325][ INFO] ===========> Epoch: 0, LR: 0.01000, Previous best: 0.00
/home/user/miniconda3/envs/unimatch/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/home/user/miniconda3/envs/unimatch/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
[2024-12-31 19:32:48,732][ INFO] Iters: 0, Total loss: 1.045, Loss x: 2.090, Loss s: 0.000, Loss w_fp: 0.000, Mask ratio: 0.000
[2024-12-31 19:37:48,257][ INFO] Iters: 84, Total loss: 0.689, Loss x: 1.360, Loss s: 0.032, Loss w_fp: 0.005, Mask ratio: 0.070
[2024-12-31 19:42:42,590][ INFO] Iters: 168, Total loss: 0.608, Loss x: 1.190, Loss s: 0.048, Loss w_fp: 0.004, Mask ratio: 0.100
[2024-12-31 19:47:36,842][ INFO] Iters: 252, Total loss: 0.563, Loss x: 1.099, Loss s: 0.050, Loss w_fp: 0.004, Mask ratio: 0.116
[2024-12-31 19:52:30,875][ INFO] Iters: 336, Total loss: 0.543, Loss x: 1.058, Loss s: 0.051, Loss w_fp: 0.004, Mask ratio: 0.128
[2024-12-31 19:57:25,436][ INFO] Iters: 420, Total loss: 0.529, Loss x: 1.029, Loss s: 0.054, Loss w_fp: 0.004, Mask ratio: 0.133
[2024-12-31 20:02:19,270][ INFO] Iters: 504, Total loss: 0.516, Loss x: 1.001, Loss s: 0.057, Loss w_fp: 0.004, Mask ratio: 0.142
[2024-12-31 20:07:13,000][ INFO] Iters: 588, Total loss: 0.503, Loss x: 0.975, Loss s: 0.060, Loss w_fp: 0.004, Mask ratio: 0.146
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [0 hard roofs] IoU: 43.68
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [1 green roofs] IoU: 0.00
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [2 hardened ground] IoU: 46.33
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [3 permeable pavement] IoU: 16.37
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [4 vegetation] IoU: 64.44
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [5 bare soil] IoU: 34.82
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [6 water] IoU: 51.45
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation original ***** >>>> MeanIoU: 36.73

[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [0 hard roofs] F1: 60.80
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [1 green roofs] F1: 0.00
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [2 hardened ground] F1: 63.33
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [3 permeable pavement] F1: 28.14
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [4 vegetation] F1: 78.37
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [5 bare soil] F1: 51.66
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [6 water] F1: 67.94
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation original ***** >>>> MeanF1: 50.03

[2024-12-31 20:12:46,551][ INFO] ***** Evaluation original ***** >>>> Kappa: 55.87

[2024-12-31 20:12:46,551][ INFO] ***** Evaluation original ***** >>>> OA: 67.73

[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [0 hard roofs] UA: 69.69
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [1 green roofs] UA: 0.00
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [2 hardened ground] UA: 62.94
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [3 permeable pavement] UA: 70.07
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [4 vegetation] UA: 73.07
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [5 bare soil] UA: 48.08
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [6 water] UA: 62.36
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [0 hard roofs] PA: 53.92
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [1 green roofs] PA: 0.00
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [2 hardened ground] PA: 63.71
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [3 permeable pavement] PA: 17.60
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [4 vegetation] PA: 84.51
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [5 bare soil] PA: 55.81
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [6 water] PA: 74.63
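
To put a number on the speed, the timestamps in the log above can be turned into a per-iteration time and a rough projection for the full run. The following is a minimal sketch, not part of UniMatch: the helper name `summarize` and the regular expressions are illustrative assumptions, and the projection assumes every epoch takes as long as epoch 0, evaluation included. The log lines are copied verbatim from the excerpt above.

```python
import re
from datetime import datetime

# A few lines copied verbatim from the log above (epoch start, first and last
# logged iteration, and the first evaluation line).
LOG_EXCERPT = """\
[2024-12-31 19:32:27,325][ INFO] ===========> Epoch: 0, LR: 0.01000, Previous best: 0.00
[2024-12-31 19:32:48,732][ INFO] Iters: 0, Total loss: 1.045, Loss x: 2.090, Loss s: 0.000, Loss w_fp: 0.000, Mask ratio: 0.000
[2024-12-31 19:37:48,257][ INFO] Iters: 84, Total loss: 0.689, Loss x: 1.360, Loss s: 0.032, Loss w_fp: 0.005, Mask ratio: 0.070
[2024-12-31 20:07:13,000][ INFO] Iters: 588, Total loss: 0.503, Loss x: 0.975, Loss s: 0.060, Loss w_fp: 0.004, Mask ratio: 0.146
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation original ***** >>>> MeanIoU: 36.73
"""

# Hypothetical patterns matching the "[timestamp][ INFO] message" layout seen above.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]\[\s*INFO\]\s*(.*)$"
)
ITER_RE = re.compile(r"^Iters: (\d+),")


def parse_time(stamp):
    """Parse a timestamp such as '2024-12-31 19:32:48,732'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def summarize(log_text, total_epochs):
    """Estimate seconds/iteration, epoch length, and projected run time."""
    epoch_start = None
    eval_time = None
    iters = []  # (iteration index, timestamp) pairs in log order
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip NCCL output, warnings, and blank lines
        when, message = parse_time(match.group(1)), match.group(2)
        iter_match = ITER_RE.match(message)
        if iter_match:
            iters.append((int(iter_match.group(1)), when))
        elif "Epoch:" in message and epoch_start is None:
            epoch_start = when
        elif "Evaluation" in message and eval_time is None:
            eval_time = when

    (first_iter, first_time), (last_iter, last_time) = iters[0], iters[-1]
    sec_per_iter = (last_time - first_time).total_seconds() / (last_iter - first_iter)
    # Wall time from the epoch banner to the first evaluation line.
    epoch_seconds = (eval_time - epoch_start).total_seconds()
    return {
        "sec_per_iter": sec_per_iter,
        "epoch_minutes": epoch_seconds / 60,
        # Assumption: all epochs cost the same as the first one.
        "projected_days": epoch_seconds * total_epochs / 86400,
    }


stats = summarize(LOG_EXCERPT, total_epochs=200)
print(f"{stats['sec_per_iter']:.2f} s/iter, "
      f"{stats['epoch_minutes']:.1f} min/epoch, "
      f"~{stats['projected_days']:.1f} days for 200 epochs")
```

On this excerpt the estimate comes to roughly 3.5 seconds per iteration and about 40 minutes per epoch including evaluation, which would project to around 5.6 days for the configured 200 epochs. Running the same calculation on the single-GPU log would give a like-for-like comparison of the two setups.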
