
PipelineModule inflated checkpoints when using FP16 param flattening #549

Open
opherlieber opened this issue Nov 22, 2020 · 0 comments
Hi,

When using the pipeline module/engine, checkpoints are saved per layer within each module/stage. But when using one of the FP16 optimizers, each layer's checkpoint is the full size of the whole pipeline stage (all layers in the stage), which makes the total checkpoint size quadratic in the number of layers per stage.

The issue seems to be related to the FP16 optimizers using one large flattened parameter buffer for the whole param group: PyTorch serializes the entire underlying storage even when only a view of it is saved.
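A minimal sketch of that underlying behavior (plain PyTorch, not DeepSpeed code, with made-up sizes as a stand-in): saving a small view of a large flattened buffer still serializes the buffer's entire storage, which is what inflates the per-layer checkpoints.

import io
import torch

flat = torch.zeros(10_000_000)   # stand-in for a stage's flattened param buffer
view = flat[:10]                 # stand-in for one layer's parameter view

buf = io.BytesIO()
torch.save(view, buf)            # serializes the whole underlying storage
print(buf.tell())                # ~40 MB, not ~40 bytes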

Currently, I can work around this by passing per-layer param groups to the optimizer:

# one optimizer param group per pipeline layer in this stage
params = [{"params": [p for p in layer.parameters() if p.requires_grad]} for layer in net.forward_funcs]

But this doesn't seem super clean. Is there a cleaner way to handle this that I missed? If not, maybe it's worth adding a method to PipelineModule that returns the parameters grouped per layer, along the lines of the sketch below.
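A minimal sketch of what such a method could look like. The name get_layer_param_groups is hypothetical (not an existing PipelineModule API), and it assumes forward_funcs holds the stage's layers as it does today; plain callables without parameters are skipped.

import torch

def get_layer_param_groups(pipeline_module):
    # Hypothetical helper, not part of DeepSpeed: build one optimizer param
    # group per layer owned by this pipeline stage, so the FP16 optimizer
    # flattens each layer's parameters into its own buffer.
    groups = []
    for layer in pipeline_module.forward_funcs:
        if not isinstance(layer, torch.nn.Module):
            continue  # skip non-module entries that carry no parameters
        params = [p for p in layer.parameters() if p.requires_grad]
        if params:
            groups.append({"params": params})
    return groups

# usage sketch: optimizer = torch.optim.Adam(get_layer_param_groups(net), lr=1e-4)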
