[Question] Combining sync batchnorm with DeepSpeed? #502

Parskatt · 2020-11-04T10:55:16Z

Hi,

I'm not sure how to combine synchronized batch normalization with DeepSpeed.
Using model_engine.module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model_engine.module)
gives me the error: AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

Is there another way of making my batchnorms synchronized?

Parskatt · 2020-11-06T13:23:16Z

I solved it (I think) by just setting dist_init_required=True in the initialization step.

Parskatt · 2020-11-09T11:32:16Z

Never mind, that only worked because I was using a single GPU; it's still not working on multiple GPUs for me.

Parskatt · 2023-01-04T18:35:07Z

Not sure what I was doing when I typed this, but nowadays it works if you convert the model before handing it to DeepSpeed:

import deepspeed
from torch import nn

model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model_engine, _, _, _ = deepspeed.initialize(model=model, ...)
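
For anyone landing here later, a slightly fuller sketch of the same ordering: convert the plain model first, then pass the converted model to deepspeed.initialize. The toy network, the helper names (build_model, count_layers, init_with_deepspeed) and the ds_config argument are made up for illustration and are not from my actual setup. The DeepSpeed call is wrapped in a function that is not invoked here, since it needs a real config and a distributed launch.

```python
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical toy network standing in for the real model.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
        nn.Conv2d(8, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
    )


def count_layers(model: nn.Module, layer_type: type) -> int:
    # Count submodules that are instances of the given layer type.
    return sum(isinstance(m, layer_type) for m in model.modules())


def init_with_deepspeed(model: nn.Module, ds_config: dict):
    # Assumption: ds_config is a valid DeepSpeed config dict supplied by the caller.
    # Imported lazily so the conversion step below runs without DeepSpeed installed.
    import deepspeed

    model_engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    return model_engine


model = build_model()
plain_bn_before = count_layers(model, nn.BatchNorm2d)

# Convert BEFORE DeepSpeed wraps the model, not on model_engine.module afterwards.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

sync_bn_after = count_layers(model, nn.SyncBatchNorm)
plain_bn_after = count_layers(model, nn.BatchNorm2d)
```

In a real multi-GPU run you would then call init_with_deepspeed(model, ds_config) from a script started with the deepspeed launcher. The counters are only there to confirm that every BatchNorm2d layer was swapped for SyncBatchNorm.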

Parskatt closed this as completed Nov 6, 2020

Parskatt reopened this Nov 9, 2020
