Hi,
For fun, kind of, I'm poking at reducing the memory usage of the standalone example so that more of it can run on my lower-end system.
I've only skimmed the paper so far, but while reading the code I've run into a few small points of confusion, so I'm opening this pull request to connect a little bit.
I'll answer these questions myself if and when I find the answers.
Questions:
I noticed that the kernel norm is detached from the gradient graph and cached between runs. How come this doesn't result in an out-of-date kernel norm as the kernels update during training?
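To make sure I'm reading the pattern right, here's a minimal sketch of what I mean; the class and attribute names are mine, not the repo's:

```python
import torch

# Hypothetical reconstruction of the pattern in question: the kernel norm is
# computed once, detached, and cached, so later forward passes reuse a value
# derived from older kernel weights.
class CachedNormKernel(torch.nn.Module):
    def __init__(self, channels: int, length: int):
        super().__init__()
        self.kernel = torch.nn.Parameter(torch.randn(channels, length))
        self._cached_norm = None  # filled on the first forward pass

    def forward(self) -> torch.Tensor:
        if self._cached_norm is None:
            # .detach() removes the norm from the autograd graph; caching it
            # means the scaling no longer tracks updates to self.kernel.
            self._cached_norm = self.kernel.norm(dim=-1, keepdim=True).detach()
        return self.kernel / (self._cached_norm + 1e-6)
```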
I noticed that the call to `F.interpolate` does not specify `align_corners`, leaving it to default to False. As far as I can tell, with linear interpolation this can leave flat artefacts at the edges and stretch the content between them by a fraction of a sample. Does this matter? My intuition would have been to do linear interpolation by dropping the last sample or wrapping around to the first. In my changes, I had to add a constant of 1 to the input size to get the same interpolation output for truncated kernels. (A small demonstration of the `align_corners` difference is sketched after the next question.)

Why is it helpful to scale the weights of the kernels by their distance? Wouldn't the training process learn this scaling itself?
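To make the `align_corners` question concrete, here is a small self-contained demonstration of the standard `F.interpolate` behaviour (not code from the repo):

```python
import torch
import torch.nn.functional as F

# 1D linear upsampling of a simple ramp, with and without align_corners.
# F.interpolate expects (batch, channels, length) for mode="linear".
x = torch.tensor([[[0.0, 1.0, 2.0, 3.0]]])

up_default = F.interpolate(x, size=8, mode="linear", align_corners=False)
up_aligned = F.interpolate(x, size=8, mode="linear", align_corners=True)

print(up_default)
# align_corners=False treats samples as pixel centres, so the first and last
# outputs repeat the edge values (the "flat artefacts" mentioned above):
# tensor([[[0.0000, 0.2500, 0.7500, 1.2500, 1.7500, 2.2500, 2.7500, 3.0000]]])

print(up_aligned)
# align_corners=True maps the first/last input samples onto the first/last
# output samples, giving an evenly stretched ramp:
# tensor([[[0.0000, 0.4286, 0.8571, 1.2857, 1.7143, 2.1429, 2.5714, 3.0000]]])
```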
Small changes to the backend can result in small (on the order of 1/100th) changes to outputs unless a lot of care is taken. How important is that kind of numerical stability?
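As a tiny illustration of where such differences come from (not code from this repo): floating-point addition isn't associative, so a backend that reduces in a different order gives slightly different results, and deep stacks of FFTs and matmuls can amplify this.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1_000_000)

# Two mathematically equivalent reductions performed in different orders.
print(x.sum().item())
print(x.view(1000, 1000).sum(dim=0).sum().item())
# The two results typically agree only to several decimal places, because
# float32 addition is not associative.
```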
Resolved:

Using `n=2*L` in the FFT is to avoid performing a circular convolution (this took me some learning).
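A standalone sketch of the difference that padding makes (illustrative, not code from the repo):

```python
import torch
import torch.nn.functional as F

# Without zero-padding, the tail of the convolution wraps around and corrupts
# the start of the output (circular convolution). Padding the FFT to 2*L
# leaves room for the full linear convolution.
L = 8
x = torch.randn(L)
k = torch.randn(L)

# Circular convolution: FFT length equal to the signal length.
circular = torch.fft.irfft(torch.fft.rfft(x, n=L) * torch.fft.rfft(k, n=L), n=L)

# Linear (causal) convolution: zero-pad to 2*L, multiply spectra, keep the
# first L outputs.
linear = torch.fft.irfft(
    torch.fft.rfft(x, n=2 * L) * torch.fft.rfft(k, n=2 * L), n=2 * L
)[:L]

# Direct reference with conv1d (kernel flipped because conv1d cross-correlates).
ref = F.conv1d(x.view(1, 1, -1), k.flip(-1).view(1, 1, -1), padding=L - 1)[0, 0, :L]

print(torch.allclose(linear, ref, atol=1e-5))    # True
print(torch.allclose(circular, ref, atol=1e-5))  # False in general
```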
Idea:

Given the principles of this algorithm, it looks like it might be possible to run with very little RAM by using the FFT to perform the kernel interpolation in frequency space and streaming the convolution.
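A rough sketch of the streaming part of that idea, using overlap-add with a fixed FFT block size (the function name, block size, and layout are illustrative, and the interpolation-in-frequency-space part is not shown):

```python
import torch

def overlap_add_conv(x: torch.Tensor, k: torch.Tensor, block: int = 1024) -> torch.Tensor:
    """Causal linear convolution of a long signal x with kernel k, processed
    in fixed-size blocks so only one block needs to be resident at a time."""
    n = x.shape[-1]
    n_fft = block + k.shape[-1] - 1
    K = torch.fft.rfft(k, n=n_fft)            # kernel spectrum, computed once
    out = torch.zeros(n + k.shape[-1] - 1)
    for start in range(0, n, block):
        chunk = x[start:start + block]
        y = torch.fft.irfft(torch.fft.rfft(chunk, n=n_fft) * K, n=n_fft)
        stop = min(start + n_fft, out.shape[-1])
        out[start:stop] += y[: stop - start]  # overlap-add this block's result
    return out[:n]                            # keep the causal part

x = torch.randn(10_000)
k = torch.randn(512)
full = torch.fft.irfft(
    torch.fft.rfft(x, n=2 * x.numel()) * torch.fft.rfft(k, n=2 * x.numel()),
    n=2 * x.numel(),
)[: x.numel()]
print(torch.allclose(overlap_add_conv(x, k), full, atol=1e-3))  # True (up to float32 error)
```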