
Profiling a multiline statement with Pytorch/CUDA usage results in very low time allocation #223

Open
fnobis opened this issue Aug 4, 2023 · 4 comments

fnobis commented Aug 4, 2023

I run a multiline statement that calls into a PyTorch network, which completes in about 50 ms. When measuring this with line_profiler, the reported time is strangely low. When I put the call on a single line, the measurement looks correct.
In the multiline case, the number of hits is reported as 2 even though the model and all other lines in the code are only run once.
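The 2-hit count is consistent with how a tracing profiler attributes events: each "line" trace event on a source line counts as a hit, and, depending on the Python version, a statement spanning several lines can generate events on more than one of them. A minimal stdlib sketch of that mechanism, using sys.settrace directly rather than line_profiler itself (the function names are illustrative, not from the report):

```python
import sys

hits = {}

def trace_lines(frame, event, arg):
    # Count "line" trace events per source line, the way a
    # line-by-line profiler attributes hits.
    if event == "line":
        hits[frame.f_lineno] = hits.get(frame.f_lineno, 0) + 1
    return trace_lines

def work(a, b):
    return a + b

def multiline_call():
    # A call expression spread over several source lines, mimicking
    # the self.model(...) call from the report.
    result = work(
        1, 2
    )
    return result

sys.settrace(trace_lines)
multiline_call()
sys.settrace(None)
print(hits)  # hit counts keyed by line number; exact split varies by Python version
```

Whether the argument line of the call receives its own event varies across CPython versions (line-number attribution changed around PEP 626), which would explain why the same source can show 1 or 2 hits.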

Wrong time calculated, multiline

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   194         2          1.5      0.8      0.0              output_1, output_2 = self.model(
   195         2          0.7      0.3      0.0                  tensor_in, **parameters
   196                                                       )      

Correct time calculated and hits number, single line

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   194         1      50721.4  50721.4      4.5              output_1, output_2  = self.model(tensor_in, **parameters)
Erotemic (Member) commented Aug 4, 2023

Strange, can you make a MWE that reproduces this with a simple torch model?

Theelx (Collaborator) commented Aug 4, 2023

That is indeed interesting. I was using line_profiler on PyTorch models a few weeks ago and didn't notice this issue, but I could easily have been doing something wrong.

Theelx (Collaborator) commented Aug 4, 2023

Also, in addition to an MWE, can you give us your platform (Windows vs Mac vs Linux), Python version, and line_profiler version? This may be related to #210, which is fixed if you install from this git repo but hasn't been officially released yet. So, maybe running your code with the version of line_profiler in this repo could help.

tmm1 commented Aug 21, 2023

You need to make sure to set CUDA_LAUNCH_BLOCKING=1 to get accurate results; otherwise the CUDA kernels run asynchronously and all of their time accumulates on whichever line happens to trigger a CUDA sync.
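As a sketch of that suggestion: the environment variable has to be in place before CUDA is initialized, so either export it in the shell or set it at the very top of the script, before the first torch import (the torch lines below are assumptions from the report, left commented so the snippet stays torch-free):

```python
import os

# Force synchronous CUDA kernel launches so that a line profiler charges
# GPU time to the line that actually launched the kernel. This must be
# set before CUDA is initialized, i.e. before the first `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # assumption: the profiled model code from the report follows
# output_1, output_2 = self.model(tensor_in, **parameters)
```

Equivalently, the variable can be set when invoking line_profiler's CLI, e.g. `CUDA_LAUNCH_BLOCKING=1 kernprof -l script.py`; an alternative is calling `torch.cuda.synchronize()` around the profiled call, at the cost of editing the code.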
