Is it possible to use gradient accumulation to counter small GPU memory? #203

ChrisJWest · 2022-08-11T22:09:02Z

Hi! I had a quick question I was wondering if I could pick your brain about:

I'm using SimCLR for very high-dimensionality data (such that I max out at batch size 4). Clearly, it really isn't feasible to run SimCLR since the batch size is so low. I was thinking about trying to use some sort of gradient accumulation technique, but my concern is that it might not quite mesh well with how the loss function works. Let's say I want to use an effective batch size of 64 (with minibatch size 4). Since we are essentially computing the dot product of the projections, instead of computing the dot product between 64 pairs like it would be in normal SimCLR, it would be like computing the averaged dot product of 8 instances of 4 pairs, and then updating the gradient. I'm not confident that this will have the same effect as a large batch size since the loss itself is reliant on comparing the single positive sample to a large number of negative samples. Do you think there is a way that I can modify this framework to simulate large batch sizes with these types of memory constraints? Or if there is a way I can get gradient accumulation to work the way I want?

chentingpc · 2022-08-13T00:47:32Z

this should be possible, though i haven't tried anything like that. see https://arxiv.org/pdf/2111.10050.pdf

ChrisJWest · 2022-08-15T19:24:18Z

Great paper, thanks for the link. Yes, I tried running some experiments and standard GradAcc did not do particularly well with a minibatch size this low (4). I like the ideas in the paper with rematerialization though, I might see if I can try something like that.

chentingpc · 2022-08-15T23:45:12Z

feel free to share your rep here if you made it work eventually!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is it possible to use gradient accumulation to counter small GPU memory? #203

Is it possible to use gradient accumulation to counter small GPU memory? #203

ChrisJWest commented Aug 11, 2022

chentingpc commented Aug 13, 2022

ChrisJWest commented Aug 15, 2022

chentingpc commented Aug 15, 2022

Is it possible to use gradient accumulation to counter small GPU memory? #203

Is it possible to use gradient accumulation to counter small GPU memory? #203

Comments

ChrisJWest commented Aug 11, 2022

chentingpc commented Aug 13, 2022

ChrisJWest commented Aug 15, 2022

chentingpc commented Aug 15, 2022