
Is it possible to use gradient accumulation to counter small GPU memory? #203

ChrisJWest opened this issue Aug 11, 2022 · 3 comments

@ChrisJWest

Hi! I have a quick question I was hoping to pick your brain about:

I'm using SimCLR on very high-dimensional data, such that I max out at a batch size of 4. SimCLR isn't really feasible with a batch size that low, so I was thinking about some sort of gradient accumulation, but my concern is that it might not mesh well with how the loss function works. Say I want an effective batch size of 64 with a minibatch size of 4. Instead of computing similarities across all 64 pairs as in normal SimCLR, I would be averaging the losses of 8 independent instances of 4 pairs and then updating. I'm not confident this has the same effect as a large batch, since the loss relies on contrasting each positive pair against a large pool of in-batch negatives: 2(N − 1) = 126 negatives at N = 64, but only 6 at N = 4. Do you think there is a way to modify this framework to simulate large batch sizes under these memory constraints, or to get gradient accumulation to work the way I want?
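
To make the concern concrete, here is a minimal sketch of the naive accumulation I mean (PyTorch with a placeholder model; the names are mine, not this repo's code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder encoder + projection head, standing in for the real SimCLR model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32, 16))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def nt_xent(z, temperature=0.5):
    """NT-Xent over one minibatch: z holds 2N projections; view i pairs with view i±N."""
    z = F.normalize(z, dim=1)
    n = z.shape[0] // 2
    sim = z @ z.t() / temperature                  # (2N, 2N) cosine similarities
    sim.fill_diagonal_(float('-inf'))              # mask each view's self-similarity
    targets = (torch.arange(2 * n) + n) % (2 * n)  # index of each view's positive
    return F.cross_entropy(sim, targets)

# Naive accumulation over 8 chunks of minibatch size 4: each chunk's loss sees
# only 2*(4-1) = 6 negatives, so the average is NOT the 64-pair NT-Xent loss.
chunks = [(torch.randn(4, 32), torch.randn(4, 32)) for _ in range(8)]
optimizer.zero_grad()
for x1, x2 in chunks:
    z = model(torch.cat([x1, x2], dim=0))          # 8 projections per chunk
    (nt_xent(z) / len(chunks)).backward()          # gradients accumulate in .grad
optimizer.step()
```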

@chentingpc
Contributor

This should be possible, though I haven't tried anything like that; see https://arxiv.org/pdf/2111.10050.pdf

@ChrisJWest
Author

Great paper, thanks for the link. I tried running some experiments, and standard gradient accumulation did not do particularly well at a minibatch size this low (4). I do like the paper's rematerialization idea, though; I might see if I can try something like that.
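
For reference, here is roughly the two-pass rematerialization scheme I'm picturing, as a rough PyTorch sketch (placeholder model and names as in my first comment; not this repo's code). The idea: embed all chunks without gradients, differentiate the full-batch loss with respect to the cached projections once, then re-embed each chunk with gradients and backprop the cached embedding gradients through it.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32, 16))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
chunks = [(torch.randn(4, 32), torch.randn(4, 32)) for _ in range(8)]

def nt_xent(z, temperature=0.5):
    """Same NT-Xent helper as in the sketch above."""
    z = F.normalize(z, dim=1)
    n = z.shape[0] // 2
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float('-inf'))
    targets = (torch.arange(2 * n) + n) % (2 * n)
    return F.cross_entropy(sim, targets)

# Pass 1: embed every chunk WITHOUT building graphs, so only one minibatch of
# encoder activations is alive at a time, then differentiate the full 64-pair
# loss with respect to the cached projections.
with torch.no_grad():
    z1 = torch.cat([model(x1) for x1, _ in chunks])  # all first views
    z2 = torch.cat([model(x2) for _, x2 in chunks])  # all second views
z_full = torch.cat([z1, z2]).requires_grad_(True)    # view i pairs with i+32
loss = nt_xent(z_full)
g1, g2 = torch.autograd.grad(loss, z_full)[0].chunk(2)

# Pass 2: rematerialize each chunk WITH gradients and chain the cached
# embedding gradients into the parameters via z.backward(dL/dz).
optimizer.zero_grad()
for (x1, x2), ga, gb in zip(chunks, g1.split(4), g2.split(4)):
    model(x1).backward(ga)
    model(x2).backward(gb)
optimizer.step()
```

Peak activation memory stays at the minibatch size; the cost is one extra forward pass per chunk, and it assumes the second forward reproduces the first (BatchNorm statistics and dropout need care).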

@chentingpc
Contributor

Feel free to share your repo here if you get it working eventually!
