Reduce device memory usage for Synapses objects #87

Closed · 4 tasks done

denisalevi opened this issue Mar 12, 2018 · 10 comments

@denisalevi (Member) commented Mar 12, 2018

Variables to remove from device memory when not needed:

  • delay
  • lastupdate
  • i and j
  • 32-bit integer offsets instead of 64-bit pointers for bundles

Currently, for each Synapses(..., model='...', ...) object we allocate device arrays for the variables i (presynaptic ID), j (postsynaptic ID), lastupdate and delay (in seconds), and for each synaptic variable defined in the model keyword. These arrays are indexed by synapse ID. We don't always need all of these arrays in device memory, and we could save a lot of memory for large network simulations (where the number of synapses easily hits the available memory limit):

delay [seconds]
We don't need the delay value on the device at all (see #83).

lastupdate
We only need lastupdate when a Synapses object has event-driven dynamics, or when a user explicitly uses lastupdate in the model equations (a rare case, I'd assume). For a Synapses object without synaptic model variables, lastupdate accounts for ~1/3 of its memory demand. After a quick look, it might be possible to add a check before adding the lastupdate variable here in Synapses.__init__() and to modify SynapticPathway.update_abstract_code() in brian2 to get rid of lastupdate when it's not needed. One would need to check for occurrences of lastupdate in the Synapses.model's SingleEquation.expr.identifiers and in the on_pre, on_post and on_event code. And possibly in additional run_regularly calls? A sketch of such a check follows below.
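To make the ~1/3 figure concrete (assuming int32 neuron IDs and float64 for lastupdate and delay; the dtypes are my assumption, not read off the actual allocations):

```python
# Rough per-synapse device memory for a Synapses object without any
# user-defined model variables.
bytes_per_synapse = {
    'i': 4,           # presynaptic neuron ID, int32
    'j': 4,           # postsynaptic neuron ID, int32
    'lastupdate': 8,  # float64
    'delay': 8,       # float64, in seconds
}
total = sum(bytes_per_synapse.values())         # 24 bytes per synapse
print(bytes_per_synapse['lastupdate'] / total)  # 8/24, i.e. ~1/3
```

And a rough sketch of the check itself. The function is hypothetical; the attribute access is my guess at the brian2 API (Equations as a mapping of SingleEquation objects with .flags, .expr and .expr.identifiers, plus the get_identifiers helper from brian2.utils.stringtools):

```python
from brian2.utils.stringtools import get_identifiers

def needs_lastupdate(model_equations, event_codes):
    """Hypothetical check: does this Synapses object need lastupdate?

    model_equations -- the Synapses.model Equations object
    event_codes -- abstract-code strings (on_pre, on_post, on_event and,
                   ideally, any run_regularly code as well)
    """
    for eq in model_equations.values():
        # event-driven dynamics are integrated from lastupdate to t,
        # so they always need the variable
        if eq.flags and 'event-driven' in eq.flags:
            return True
        # explicit use of lastupdate in a model expression
        if eq.expr is not None and 'lastupdate' in eq.expr.identifiers:
            return True
    # explicit use in any of the event code strings
    return any('lastupdate' in get_identifiers(code)
               for code in event_codes if code)
```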

i and j
Synapses that don't modify or access presynaptic (postsynaptic) neuron variables don't need access to i (j) during the run() call, and therefore we don't need it in device memory. That is the case, e.g., for a typical Synapses(group, on_pre='v_post += const'). i and j are still needed for constructing the connectivity matrix, but we don't need to copy them to device memory (and we could also write them to disk and delete them on the host as soon as they are no longer needed, see #84).

@mstimberg Could you maybe take a look at this? Is there something that I'm missing here?

  1. Is there a reason to keep lastupdate? And is it as easy to implement this check as I made it sound?
  2. And to find out whether a Synapses object needs access to i / j, I would check for _presynaptic_idx / _postsynaptic_idx in all read and written variables of a Synapses object (similar to what I'm doing here in the CUDAStandaloneDevice; see the sketch after this list). Do you see an easier way of doing this?
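To make point 2 concrete, the index check might look roughly like this (the function name is made up; I'm assuming each generated code object exposes variables and variable_indices the way brian2's CodeObject does):

```python
def synaptic_ids_needed(code_objects):
    """Hypothetical check: do the run-time code objects of a Synapses
    object access any variable through the pre-/postsynaptic index?
    Returns (needs_i, needs_j)."""
    needs_i = needs_j = False
    for codeobj in code_objects:
        for varname in codeobj.variables:
            # variable_indices maps a variable name to the name of the
            # index array used to access it in the generated code
            index = codeobj.variable_indices[varname]
            if index == '_presynaptic_idx':
                needs_i = True
            elif index == '_postsynaptic_idx':
                needs_j = True
    return needs_i, needs_j
```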
@mstimberg (Member)

Hmm, yes, this all becomes a bit hackish... In general your ideas sound good, but it seems as if we keep adding special cases and little workarounds. I've long wanted to refactor the code generation process in a way that would make this much more straightforward, e.g. each CodeObject should expose which variables it reads/writes and with which indices (from templates and from user code). If this information were available, you could go through all code objects and get all the synaptic variables that are actually accessed, to decide whether you need to copy them to device memory. Unfortunately I don't think this refactoring will happen very soon, so for now I guess we have to make do with these ad-hoc solutions.
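To sketch what such an interface might look like (entirely hypothetical, nothing like this exists in brian2; the class and the accesses attribute are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class VariableAccess:
    name: str    # variable name, e.g. 'w'
    index: str   # index array used to access it, e.g. '_presynaptic_idx'
    mode: str    # 'read' or 'write'

def variables_needed_on_device(code_objects):
    """Collect the names of all variables that any code object actually
    reads or writes; only these would have to live in device memory."""
    return {acc.name
            for codeobj in code_objects
            for acc in codeobj.accesses}  # `accesses` is the hypothetical new API
```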

@denisalevi (Member Author)

This issue should also cover using 32-bit integers as offsets into the synapse ID array instead of pointers (64-bit on most relevant machines) for each bundle. This can reduce memory usage by up to 20% when there are many small bundles (see the arithmetic sketch below). The same could be done in synapses push mode, but there the memory reduction would probably be negligible.
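A quick, made-up calculation of where the "up to 20%" could come from (the sizes and the flat two-array layout are illustrative, not the actual brian2cuda data structures):

```python
import numpy as np

# Many small bundles: 10M synapses in 2.5M bundles (~4 synapses each).
n_synapses = 10_000_000
n_bundles = 2_500_000

synapse_ids = np.empty(n_synapses, dtype=np.int32)  # 40 MB, shared array

pointer_bytes = n_bundles * 8  # one 64-bit device pointer per bundle
offset_bytes = n_bundles * 4   # one 32-bit offset into synapse_ids instead

total_pointers = synapse_ids.nbytes + pointer_bytes  # 60 MB
total_offsets = synapse_ids.nbytes + offset_bytes    # 50 MB
print(f"saving: {100 * (total_pointers - total_offsets) / total_pointers:.1f}%")
# -> saving: 16.7%; the smaller the bundles, the larger the relative saving
```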

@moritzaugustin (Contributor)

There is a bug with the pointer scheme that needs to be fixed.

moritzaugustin added this to the Sprint July milestone on Jul 9, 2018
@denisalevi (Member Author)

There is an implementation using offsets instead of pointers in a local branch (it currently has a bug). When checking that, we should also finish the comments and explanations in the synapse initialization template.

@denisalevi (Member Author)

Update:

  • Not using lastupdate when it is not needed is implemented, at least well enough for our benchmarks to work. There are some cases where it would still produce a bug, though; I didn't look into those. But I opened a brian2 PR: Don't create lastupdate variables if they are not needed brian2#979 (all the changes are in brian2 right now).
  • I've been getting closer to the bug in the "offset instead of pointer" branch, but couldn't solve it yet. That is mainly because I ran out of time, not because I got stuck. I will have to wrap it up now and revisit it if we have more time later, I guess... or spend more time on it.
  • I didn't look into i and j yet (just very quickly). But it looks quite easy, might be doable in an hour.

@moritzaugustin (Contributor)

Any idea how many more hours than expected you would need to solve the bug?

i/j in one hour sounds reasonable!

@denisalevi (Member Author)

I narrowed the bug down functionally a bit. It seems that synaptic effects applied to synaptic variables from synapses with different delays onto the same postsynaptic neuron, i.e. synapses in different bundles, are not applied correctly. But effects from synapses with the same delay from different presynaptic neurons onto the same postsynaptic neuron (i.e. within one bundle) are applied correctly. I haven't looked into the CUDA side yet. I feel like this could be done in 2 hrs. Or take 4. Or blow up (less likely). I would suggest spending another 2 hrs on it and reevaluating then? I have already spent the 8 hrs estimated for this issue, so any additional work now is extra.
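For reference, a minimal brian2 setup of the kind that triggers it might look like this (my own reconstruction from the description above; under brian2cuda the two different delays put the synapses into different bundles):

```python
from brian2 import *

# Two presynaptic neurons spike once at t=0; both project onto the same
# postsynaptic neuron, applying an on_pre effect to a *synaptic* variable.
inp = SpikeGeneratorGroup(2, [0, 1], [0, 0] * ms)
post = NeuronGroup(1, 'v : 1')
S = Synapses(inp, post, model='w : 1', on_pre='w += 1')
S.connect()                # two synapses, both onto neuron 0
S.delay = [1, 2] * ms      # different delays -> different bundles
run(5 * ms)
print(S.w[:])              # expected [1. 1.]; the buggy branch got this wrong
```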
And I would leave the i, j implementation for later (different code part, no synergy from working on it now). I would prefer to do the atomics first and see how long that takes.
@moritzaugustin what do you think? I will continue working on it towards the evening.

@moritzaugustin (Contributor)

I am fine with your suggestion (a soft limit of two hours; if there is good progress until then, a hard limit of four hours).

@denisalevi (Member Author) commented Jul 15, 2018

All points in this issue are fixed by now. It took me ~2 hrs to find the bug, but with testing, cleanup etc. it was 4 hrs. And I ended up doing the i/j implementation after all, which took 2 hrs. So this took 6 hrs over the estimate.

I realized there are two more variables that we don't always need on the device: per synapse we also have N_incoming and N_outgoing. Those are only needed when they are referenced in synaptic equations or updates, which I assume is rather rare. I opened another issue for that (#128). And see the discussion in brian-team/brian2#979.

@moritzaugustin (Contributor)

Great that you fixed the bug and implemented i/j!

For the rest, let's see how brian2 progresses with that.
