Reduce device memory usage for Synapses objects #87

Closed · 4 tasks done

denisalevi opened this issue Mar 12, 2018 · 10 comments

@denisalevi (Member) commented Mar 12, 2018

Variables to remove from device memory when not needed:

  • delay
  • lastupdate
  • i and j
  • 32-bit integer offsets instead of 64-bit pointers for bundles

Currently, for each Synapses(..., model='...', ...) object we allocate device arrays for the variables i (presynaptic ID), j (postsynaptic ID), lastupdate and delay (in seconds), and for each synaptic variable defined in the model keyword. These arrays are indexed by synapse ID. We don't always need all of these arrays in device memory, and we could save a lot of memory for large network simulations (where the number of synapses easily hits the available memory limit):

delay [seconds]
We don't need the delay value on the device at all (see #83).

lastupdate
We only need lastupdate when a Synapses object has event-driven dynamics, or when a user explicitly uses lastupdate in the model equations (a rare case, I'd assume). For a Synapses object without synaptic model variables, lastupdate accounts for ~1/3 of its memory demand. After a quick look, it might be possible to add a check before adding the lastupdate variable here in Synapses.__init__() and to modify SynapticPathway.update_abstract_code() in brian2 to get rid of lastupdate when it's not needed. One would need to check for occurrences of lastupdate in the Synapses.model's SingleEquation.expr.identifiers and in the on_pre, on_post and on_event code. And possibly in additional run_regularly calls? A sketch of such a check follows below.
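To make the ~1/3 figure concrete (assuming int32 neuron IDs and float64 for lastupdate and delay; the dtypes are my assumption, not read off the actual allocations):

```python
# Rough per-synapse device memory for a Synapses object without any
# user-defined model variables.
bytes_per_synapse = {
    'i': 4,           # presynaptic neuron ID, int32
    'j': 4,           # postsynaptic neuron ID, int32
    'lastupdate': 8,  # float64
    'delay': 8,       # float64, in seconds
}
total = sum(bytes_per_synapse.values())         # 24 bytes per synapse
print(bytes_per_synapse['lastupdate'] / total)  # 8/24, i.e. ~1/3
```

And a rough sketch of the check itself. The function is hypothetical; the attribute access is my guess at the brian2 API (Equations as a mapping of SingleEquation objects with .flags, .expr and .expr.identifiers, plus the get_identifiers helper from brian2.utils.stringtools):

```python
from brian2.utils.stringtools import get_identifiers

def needs_lastupdate(model_equations, event_codes):
    """Hypothetical check: does this Synapses object need lastupdate?

    model_equations -- the Synapses.model Equations object
    event_codes -- abstract-code strings (on_pre, on_post, on_event and,
                   ideally, any run_regularly code as well)
    """
    for eq in model_equations.values():
        # event-driven dynamics are integrated from lastupdate to t,
        # so they always need the variable
        if eq.flags and 'event-driven' in eq.flags:
            return True
        # explicit use of lastupdate in a model expression
        if eq.expr is not None and 'lastupdate' in eq.expr.identifiers:
            return True
    # explicit use in any of the event code strings
    return any('lastupdate' in get_identifiers(code)
               for code in event_codes if code)
```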

i and j
Synapses that don't modify or access presynaptic (postsynaptic) neuron variables don't need access to i (j) during the run() call, and therefore we don't need it in device memory. That is the case, e.g., for a typical Synapses(group, on_pre='v_post += const'). i and j are still needed for constructing the connectivity matrix, but we don't need to copy them to device memory (and we could also write them to disk and delete them on the host as soon as they are no longer needed, see #84).

@mstimberg Could you maybe take a look at this? Is there something that I'm missing here?

  1. Is there a reason to keep lastupdate? And is it as easy to implement this check as I made it sound?
  2. And to find out whether a Synapses object needs access to i / j, I would check for _presynaptic_idx / _postsynaptic_idx in all read and written variables of a Synapses object (similar to what I'm doing here in the CUDAStandaloneDevice; see the sketch after this list). Do you see an easier way of doing this?
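To make point 2 concrete, the index check might look roughly like this (the function name is made up; I'm assuming each generated code object exposes variables and variable_indices the way brian2's CodeObject does):

```python
def synaptic_ids_needed(code_objects):
    """Hypothetical check: do the run-time code objects of a Synapses
    object access any variable through the pre-/postsynaptic index?
    Returns (needs_i, needs_j)."""
    needs_i = needs_j = False
    for codeobj in code_objects:
        for varname in codeobj.variables:
            # variable_indices maps a variable name to the name of the
            # index array used to access it in the generated code
            index = codeobj.variable_indices[varname]
            if index == '_presynaptic_idx':
                needs_i = True
            elif index == '_postsynaptic_idx':
                needs_j = True
    return needs_i, needs_j
```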
@mstimberg (Member)

Hmm, yes, this all becomes a bit hackish... In general your ideas sound good, but it seems as if we keep adding special cases and little workarounds. I've long wanted to refactor the code generation process in a way that would make this much more straightforward, e.g. each CodeObject should expose which variables it reads/writes and with which indices (from templates and from user code). If this information were available, you could go through all code objects and get all the synaptic variables that are actually accessed, to decide whether you need to copy them to device memory. Unfortunately I don't think this refactoring will happen very soon, so for now I guess we have to make do with these ad-hoc solutions.
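To sketch what such an interface might look like (entirely hypothetical, nothing like this exists in brian2; the class and the accesses attribute are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class VariableAccess:
    name: str    # variable name, e.g. 'w'
    index: str   # index array used to access it, e.g. '_presynaptic_idx'
    mode: str    # 'read' or 'write'

def variables_needed_on_device(code_objects):
    """Collect the names of all variables that any code object actually
    reads or writes; only these would have to live in device memory."""
    return {acc.name
            for codeobj in code_objects
            for acc in codeobj.accesses}  # `accesses` is the hypothetical new API
```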

@denisalevi (Member Author)

This issue should also cover using 32-bit integers as offsets into the synapse ID array instead of pointers (64-bit on most relevant machines) for each bundle. This can reduce memory usage by up to 20% when there are many small bundles (see the arithmetic sketch below). The same could be done in synapses push mode, but there the memory reduction would probably be negligible.
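A quick, made-up calculation of where the "up to 20%" could come from (the sizes and the flat two-array layout are illustrative, not the actual brian2cuda data structures):

```python
import numpy as np

# Many small bundles: 10M synapses in 2.5M bundles (~4 synapses each).
n_synapses = 10_000_000
n_bundles = 2_500_000

synapse_ids = np.empty(n_synapses, dtype=np.int32)  # 40 MB, shared array

pointer_bytes = n_bundles * 8  # one 64-bit device pointer per bundle
offset_bytes = n_bundles * 4   # one 32-bit offset into synapse_ids instead

total_pointers = synapse_ids.nbytes + pointer_bytes  # 60 MB
total_offsets = synapse_ids.nbytes + offset_bytes    # 50 MB
print(f"saving: {100 * (total_pointers - total_offsets) / total_pointers:.1f}%")
# -> saving: 16.7%; the smaller the bundles, the larger the relative saving
```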

@moritzaugustin (Contributor)

There is a bug with the pointer scheme that needs to be fixed.

moritzaugustin added this to the Sprint July milestone on Jul 9, 2018
@denisalevi (Member Author)

There is an implementation using offsets instead of pointers in a local branch (it currently has a bug). When checking that, we should also finish the comments and explanations in the synapse initialization template.

@denisalevi (Member Author)

Update:

  • Not using lastupdate when it is not needed is implemented, at least well enough for our benchmarks to work. There are some cases where it would still produce a bug, though; I didn't look into those. But I opened a brian2 PR: Don't create lastupdate variables if they are not needed brian2#979 (all the changes are in brian2 right now).
  • I've been getting closer to the bug in the "offset instead of pointer" branch, but couldn't solve it yet. That is mainly because I ran out of time, not because I got stuck. I will have to wrap it up now and revisit it if we have more time later, I guess... or spend more time on it.
  • I didn't look into i and j yet (just very quickly). But it looks quite easy, might be doable in an hour.

@moritzaugustin (Contributor)

Any idea how many more hours than expected you would need to solve the bug?

i/j in one hour sounds reasonable!

@denisalevi (Member Author)

I narrowed the bug down functionally a bit. It seems that synaptic effects applied to synaptic variables from synapses with different delays onto the same postsynaptic neuron, i.e. synapses in different bundles, are not applied correctly. But effects from synapses with the same delay from different presynaptic neurons onto the same postsynaptic neuron (i.e. within one bundle) are applied correctly. I haven't looked into the CUDA side yet. I feel like this could be done in 2 hrs. Or take 4. Or blow up (less likely). I would suggest spending another 2 hrs on it and reevaluating then? I have already spent the 8 hrs estimated for this issue, so any additional work now is extra.
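For reference, a minimal brian2 setup of the kind that triggers it might look like this (my own reconstruction from the description above; under brian2cuda the two different delays put the synapses into different bundles):

```python
from brian2 import *

# Two presynaptic neurons spike once at t=0; both project onto the same
# postsynaptic neuron, applying an on_pre effect to a *synaptic* variable.
inp = SpikeGeneratorGroup(2, [0, 1], [0, 0] * ms)
post = NeuronGroup(1, 'v : 1')
S = Synapses(inp, post, model='w : 1', on_pre='w += 1')
S.connect()                # two synapses, both onto neuron 0
S.delay = [1, 2] * ms      # different delays -> different bundles
run(5 * ms)
print(S.w[:])              # expected [1. 1.]; the buggy branch got this wrong
```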
And I would leave the i, j implementation for later (different code part, no synergy from working on it now). I would prefer to do the atomics first and see how long that takes.
@moritzaugustin what do you think? I will continue working on it towards the evening.

@moritzaugustin (Contributor)

I am fine with your suggestion (a soft limit of two hours; if there is good progress until then, a hard limit of four hours).

@denisalevi (Member Author) commented Jul 15, 2018

All points in this issue are fixed by now. It took me ~2 hrs to find the bug, but with testing, cleanup etc. it was 4 hrs. And I ended up doing the i/j implementation after all, which took 2 hrs. So this took 6 hrs over the estimate.

I realized there are two more variables that we don't always need on the device: per synapse we also have N_incoming and N_outgoing. Those are only needed when they are referenced in synaptic equations or updates, which I assume is rather rare. I opened another issue for that (#128). And see the discussion in brian-team/brian2#979.

@moritzaugustin (Contributor)

Great that you fixed the bug and implemented i/j!

For the rest, let's see how brian2 progresses with that.
