Training time #244
-
Hi, I'm training a network using your framework and the training time is extremely slow. I'm wondering whether the problem is on my side or whether this is expected. Here are the details: a single epoch takes slightly more than 6 hours. In comparison, a conventional CNN of similar architecture with 20 channels (channels*timebins), trained with standard BP at the same resolution, dataset size, and batch size, takes less than an hour per epoch. The difference seems too drastic, so I'm wondering if this is expected or an issue on my side. Additionally, when I analyzed the timing with the PyTorch profiler, the 3 most time-consuming operations were as follows:
From these results I can see that most of the computation time is spent during BP, so it can't be a faulty data loader implementation.
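For reference, a minimal sketch of how such per-op timings can be obtained with the PyTorch profiler; the toy model, data, and table settings here are placeholders, not the original code.

```python
# Sketch: profiling one training step to find the heaviest ops.
# Model, input, and optimizer are illustrative placeholders.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 8 * 8, 2),
)
x = torch.randn(2, 1, 8, 8)
target = torch.tensor([0, 1])
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()  # backward ops will dominate here if BP is the bottleneck
    optimizer.step()

# Sort by total CPU time to see the 3 most expensive operations
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=3))
```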
Replies: 3 comments
-
I've also tried the layer-by-layer mode, since it should be quicker, but because GPU utilization is extremely high in both cases it didn't show an improvement. Additionally, I'm using spikingjelly version 0.0.0.0.6.
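To illustrate what layer-by-layer propagation buys for stateless layers, here is a generic sketch (plain PyTorch, not the spikingjelly API): the time dimension is folded into the batch so the layer runs once over all T steps instead of being called T times in the simulation loop.

```python
# Generic sketch of step-by-step vs. layer-by-layer propagation for a
# stateless layer. Shapes and layer choice are illustrative.
import torch
import torch.nn as nn

T, N, C, H, W = 4, 2, 3, 8, 8          # time steps, batch, channels, height, width
x_seq = torch.randn(T, N, C, H, W)
conv = nn.Conv2d(C, 5, 3, padding=1)

# Step-by-step: one forward call per time step
y_steps = torch.stack([conv(x_seq[t]) for t in range(T)])

# Layer-by-layer: merge T into the batch dim, run once, then unmerge
y_fused = conv(x_seq.flatten(0, 1)).reshape(T, N, 5, H, W)

# Both orderings compute the same result
assert torch.allclose(y_steps, y_fused, atol=1e-6)
```

The fusion mainly reduces kernel-launch overhead; when the GPU is already saturated by the workload itself, as reported above, it yields little speedup.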
-
Hi, can you show me minimal code for the network definition, data loading, and training loop?
-
Sorry for the delayed response. I found that the slow training time was mainly due to the high resolution of the dataset.
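One common remedy in this situation is to downsample the input frames before the network; a minimal sketch is below. The 128x128 resolution and pooling factor are assumed for illustration, not taken from the original setup.

```python
# Sketch: reducing input resolution to cut per-step compute.
# Input shape and downsampling factor are illustrative placeholders.
import torch
import torch.nn.functional as F

frames = torch.randn(8, 2, 128, 128)         # (N, C, H, W) placeholder frames
small = F.avg_pool2d(frames, kernel_size=4)  # 128x128 -> 32x32
print(small.shape)  # torch.Size([8, 2, 32, 32])
```

Since conv cost scales roughly with H*W, a 4x downsample in each dimension cuts that term by about 16x.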