Hi,
When using the pipeline module/engine, checkpoints are saved per layer within each module/stage. But when using one of the FP16 optimizers, each layer's checkpoint is the full size of the whole pipeline stage (all layers in the stage), so every one of a stage's layers writes a file the size of the whole stage and the total checkpoint size grows quadratically in the number of layers per stage.
The issue seems to be related to the FP16 optimizers flattening the whole param group into one large contiguous buffer: PyTorch serializes the entire underlying storage even when only a view of it is saved.
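This is easy to see outside of DeepSpeed. The snippet below is just my own minimal repro (the buffer sizes are arbitrary): saving a small view of a large flat buffer writes the full storage, while saving a clone of the view writes only the slice.

```python
import io
import torch

# Stand-in for a stage-wide flattened buffer and one layer's slice of it.
flat = torch.zeros(10_000_000)
layer_view = flat.narrow(0, 0, 1_000)

view_buf, clone_buf = io.BytesIO(), io.BytesIO()
torch.save(layer_view, view_buf)           # serializes the whole 10M-element storage
torch.save(layer_view.clone(), clone_buf)  # serializes only the 1k-element slice

print(view_buf.getbuffer().nbytes)   # on the order of 40 MB
print(clone_buf.getbuffer().nbytes)  # on the order of 4 KB
```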
Currently, I can work around this by passing per-layer param groups to the optimizer:
```python
params = [{"params": [p for p in layer.parameters() if p.requires_grad]} for layer in net.forward_funcs]
```
But this doesn't seem very clean. Is there a cleaner way to handle this that I missed? And if not, maybe it's worth adding a method to PipelineModule that returns the parameters grouped per layer?
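For concreteness, something along these lines is what I have in mind. It's only a sketch against the existing `forward_funcs` attribute, not an actual DeepSpeed API, and the `nn.Module` check is my assumption for skipping stages that hold bare forward callables:

```python
from typing import Dict, List

import torch.nn as nn


def params_by_layer(pipe_module) -> List[Dict]:
    """Return one optimizer param group per layer owned by this pipeline stage."""
    groups = []
    for layer in pipe_module.forward_funcs:
        if not isinstance(layer, nn.Module):
            continue  # a stage can also contain plain callables with no parameters
        params = [p for p in layer.parameters() if p.requires_grad]
        if params:
            groups.append({"params": params})
    return groups
```

The returned groups could then be passed straight to the optimizer, e.g. `torch.optim.AdamW(params_by_layer(net), lr=1e-4)`, so the FP16 wrapper flattens each layer separately instead of the whole stage.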