Training time #244
-
Hi, I'm training a network using your framework and the training time is extremely slow. I'm wondering whether the problem is on my side or whether this is expected. Here are the details: a single epoch takes slightly more than 6 hours. In comparison, a conventional CNN of similar architecture with 20 channels (channels*timebins), trained with standard BP at the same resolution, dataset size, and batch size, takes less than an hour per epoch. The difference seems too drastic, so I'm wondering if this is expected or an issue on my side. Additionally, when I analyzed the timing with the PyTorch profiler, the 3 most time-consuming operations were as follows:
From these results I can see that most of the computation time is spent during BP, so it can't be a faulty data loader implementation.
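For reference, a minimal sketch of how such per-op timings can be obtained with the PyTorch profiler; the toy model, data, and table settings here are placeholders, not the original code.

```python
# Sketch: profiling one training step to find the heaviest ops.
# Model, input, and optimizer are illustrative placeholders.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 8 * 8, 2),
)
x = torch.randn(2, 1, 8, 8)
target = torch.tensor([0, 1])
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()  # backward ops will dominate here if BP is the bottleneck
    optimizer.step()

# Sort by total CPU time to see the 3 most expensive operations
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=3))
```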
Replies: 3 comments
-
I've also tried the layer-by-layer mode, since it should be quicker, but because GPU utilization is extremely high in both cases it didn't show an improvement. Additionally, I'm using spikingjelly version 0.0.0.0.6.
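To illustrate what layer-by-layer propagation buys for stateless layers, here is a generic sketch (plain PyTorch, not the spikingjelly API): the time dimension is folded into the batch so the layer runs once over all T steps instead of being called T times in the simulation loop.

```python
# Generic sketch of step-by-step vs. layer-by-layer propagation for a
# stateless layer. Shapes and layer choice are illustrative.
import torch
import torch.nn as nn

T, N, C, H, W = 4, 2, 3, 8, 8          # time steps, batch, channels, height, width
x_seq = torch.randn(T, N, C, H, W)
conv = nn.Conv2d(C, 5, 3, padding=1)

# Step-by-step: one forward call per time step
y_steps = torch.stack([conv(x_seq[t]) for t in range(T)])

# Layer-by-layer: merge T into the batch dim, run once, then unmerge
y_fused = conv(x_seq.flatten(0, 1)).reshape(T, N, 5, H, W)

# Both orderings compute the same result
assert torch.allclose(y_steps, y_fused, atol=1e-6)
```

The fusion mainly reduces kernel-launch overhead; when the GPU is already saturated by the workload itself, as reported above, it yields little speedup.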
-
Hi, can you show me minimal code for the network definition, data loading, and training loop?
-
Sorry for the delayed response. I found that the slow training time was mainly due to the high resolution of the dataset.
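One common remedy in this situation is to downsample the input frames before the network; a minimal sketch is below. The 128x128 resolution and pooling factor are assumed for illustration, not taken from the original setup.

```python
# Sketch: reducing input resolution to cut per-step compute.
# Input shape and downsampling factor are illustrative placeholders.
import torch
import torch.nn.functional as F

frames = torch.randn(8, 2, 128, 128)         # (N, C, H, W) placeholder frames
small = F.avg_pool2d(frames, kernel_size=4)  # 128x128 -> 32x32
print(small.shape)  # torch.Size([8, 2, 32, 32])
```

Since conv cost scales roughly with H*W, a 4x downsample in each dimension cuts that term by about 16x.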