adding pytorch DDP script of detection task #1777
Conversation
Hi @sarjil77 👋,
Thanks for your PR, I left some comments here :D
Before you push your code, please run `make style` to fix the formatting 👍
Once all changes are applied, it would be great if you could also update the corresponding README.
File: https://github.com/mindee/doctr/blob/main/references/detection/README.md
Content to add:
## Multi-GPU support (PyTorch only - Experimental)

Multi-GPU support on the detection task with PyTorch has been added. It will probably be added for other tasks as well.
Arguments are the same as the ones for single GPU, except:

- `--devices`: by default, if you do not pass `--devices`, it will use all GPUs on your computer. You can use specific GPUs by passing a list of ids (e.g. `0 1 2`). To find them, you can use the following snippet:

```python
import torch

devices = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
device_names = [torch.cuda.get_device_name(d) for d in devices]
```

- `--backend`: you can specify another `backend` for `DistributedDataParallel` if the default one is not available on your operating system. The fastest one is `nccl`, according to the PyTorch documentation.

```shell
python references/detection/train_pytorch_ddp.py db_resnet50 --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5 --devices 0 1 --backend nccl
```

Same as you can find here:
https://github.com/mindee/doctr/blob/main/references/recognition/README.md
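For readers new to DDP, the wiring behind a `train_pytorch_ddp.py` entry point typically looks like the sketch below. This is a generic illustration, not the actual doctr script: `main_worker` and `per_device_batch_size` are hypothetical names, and the backend defaults to `gloo` so the sketch also makes sense on CPU-only machines.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    # Hypothetical helper: each DDP process sees an equal share of the batch.
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world_size")
    return global_batch_size // world_size


def main_worker(rank: int, world_size: int, backend: str = "gloo") -> None:
    # One process per device: `rank` identifies this process within the group.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    # model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    # ... build the dataloader with a DistributedSampler and run training ...
    dist.destroy_process_group()


# Typical launch, one process per visible GPU:
# world_size = torch.cuda.device_count()
# mp.spawn(main_worker, args=(world_size, "nccl"), nprocs=world_size)
```

The launch lines are left commented because they only make sense on a multi-GPU host; `mp.spawn` forks one Python process per device and passes each its `rank` as the first argument.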
```python
import datetime
import hashlib
import logging
import multiprocessing
```
```python
import multiprocessing as mp
```
That's required for the worker count if it is not passed directly.
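The reason the import matters: a common pattern is to default the dataloader's worker count to the machine's CPU count when `--workers` is not passed on the CLI. A minimal sketch (`default_num_workers` is a hypothetical helper; it uses the stdlib module, while the script itself later uses `torch.multiprocessing as mp`, which re-exports the same API):

```python
import multiprocessing as mp


def default_num_workers(requested=None):
    # Fall back to the machine's CPU count when --workers is not passed.
    return requested if requested is not None else mp.cpu_count()
```

It would then be used as something like `num_workers=default_num_workers(args.workers)` when building the `DataLoader`.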
""" | ||
rank(int) : it is the unique identifier to each process and you can also say that it is your device id | ||
world_size(int) : total number of processes |
->
```python
"""
Args:
----
    rank (int): device id to put the model on
    world_size (int): number of processes participating in the job
    args: other arguments passed through the CLI
"""
```
All the changes you suggested are done, along with the updated README file as well.
Hi @sarjil77 👋
Thanks for the updates, only a few small things left and we are good to merge 👍
After the changes are applied, please run:
```shell
make style
```
or
```shell
pip3 install ruff
ruff format .
ruff check --fix .
```
```python
import time
import numpy as np
import torch
import wandb
```
Missing import:
```python
import multiprocessing as mp
```
""" | ||
Args: | ||
---- |
Please remove the `----`:
```python
"""
Args:
    rank (int): device id to put the model on
    world_size (int): number of processes participating in the job
    args: other arguments passed through the CLI
"""
```
We changed this in the meanwhile :)
```python
with open(os.path.join(args.train_path, "labels.json"), "rb") as f:
    train_hash = hashlib.sha256(f.read()).hexdigest()
```

```python
if args.show_samples:
```
@sarjil77 the `if rank == 0` check is missing here :)
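The point of this comment: in DDP every process executes the full script, so side effects like displaying samples or logging should be guarded so that only the rank-0 process performs them. A minimal sketch (`should_show_samples` is a hypothetical helper, not part of the doctr script):

```python
def should_show_samples(rank: int, show_samples: bool) -> bool:
    # Only the rank-0 process should display samples or write logs;
    # otherwise every DDP worker would repeat the same side effect.
    return show_samples and rank == 0


# In the training script this becomes:
# if rank == 0 and args.show_samples:
#     plot_samples(x, target)
```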
I have made all the changes as required :). I am not adding `import multiprocessing as mp`, because `mp` is already defined via `import torch.multiprocessing as mp`; I have taken this reference from the DDP script of the recognition task, and I think it is good to merge now. :)
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
```
@@            Coverage Diff             @@
##             main    #1777      +/-   ##
==========================================
- Coverage   96.56%   96.55%   -0.02%
==========================================
  Files         164      165       +1
  Lines        7895     7901       +6
==========================================
+ Hits         7624     7629       +5
- Misses        271      272       +1
```
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Hi @sarjil77 👋,
Thanks for the updates 👍
Two small things left, but don't worry, I can do this. Thanks for the PR :)
In this script, I have implemented DDP training for the detection task.