[Issue]: Roctracer GPU Events Have Overlapping Intervals #104

Open
sraikund16 opened this issue Sep 19, 2024 · 6 comments

Comments

@sraikund16

sraikund16 commented Sep 19, 2024

Problem Description

When running a very small ResNet50 model, I see GPU events on a single track (stream/queue) with overlapping time intervals. The overlaps show up most often for a few specific kernels, such as MIOpenBatchNormBwdSpatial and batched_transpose_32x32_dword, which have kind=0x11F0 and op=0. To investigate further, I created a debug branch to see what roctracer returns before kineto does any processing: https://github.com/pytorch/kineto/pull/990/files

In this branch I have a debug check that prints several messages similar to the following:
Out of order activity: 1886121463888334 < 1886121463888361. Difference: 27 ns. Kernel: batched_transpose_32x32_dword last Kernel: MIOpenBatchNormFwdTrainSpatialNorml
which suggests that the intervals overlap. In this branch I only check for overlaps among events whose kind is not unknown, but there are also many overlaps among the unknown-kind events.
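The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not the code in the debug branch: the `GpuEvent` type and `find_overlaps` function are hypothetical names, and it assumes that the first number in the message is the start of the current kernel and the second is the end of the previous kernel on the same queue. The previous kernel's start and the current kernel's end below are made-up values, since the message does not report them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GpuEvent:
    """Hypothetical record for one GPU activity (not a roctracer type)."""
    name: str
    queue: int
    start_ns: int
    end_ns: int


def find_overlaps(events):
    """Return (previous, current, overlap_ns) for every event that starts
    before the previous event on the same queue has ended."""
    by_queue = defaultdict(list)
    for ev in events:
        by_queue[ev.queue].append(ev)

    overlaps = []
    for queue_events in by_queue.values():
        # Compare each event with its predecessor in start-time order.
        queue_events.sort(key=lambda e: e.start_ns)
        for prev, cur in zip(queue_events, queue_events[1:]):
            if cur.start_ns < prev.end_ns:
                overlaps.append((prev, cur, prev.end_ns - cur.start_ns))
    return overlaps


# Timestamps taken from the message above where available; the previous
# kernel's start and the current kernel's end are invented for the example.
events = [
    GpuEvent("MIOpenBatchNormFwdTrainSpatialNorml", 0,
             1886121463880000, 1886121463888361),
    GpuEvent("batched_transpose_32x32_dword", 0,
             1886121463888334, 1886121463890000),
]

overlaps = find_overlaps(events)
for prev, cur, overlap_ns in overlaps:
    print(f"{cur.name} starts {overlap_ns} ns before {prev.name} ends "
          f"(queue {cur.queue})")
```

Note that this only compares each event with its immediate predecessor, like the "last Kernel" in the message; an event that overlaps an earlier, longer-running event would need a running maximum of end times instead.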

Thanks!

Operating System

CentOS Stream 9

CPU

AMD EPYC 7713

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

roctracer

Steps to Reproduce

Run a model that launches the kernels named above and check whether their GPU event intervals overlap.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@sraikund16
Author

Since the overlap is so small, I am wondering whether this could be some kind of rounding issue?

@sraikund16
Author

Here is another debug message, this time with the queue IDs included:
Out of order activity: 1895910188521077 < 1895910188521125. Difference: 48 ns. Kernel: batched_transpose_16x32_dword last Kernel: batched_transpose_16x32_dword Queue: 0 last Queue: 0
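For anyone collecting many of these messages, a small parser can help summarize how large the overlaps are and which queues they occur on. The sketch below is an assumption-laden helper, not part of kineto or roctracer: the regular expression is inferred only from the two sample messages in this thread, `parse_line` is a hypothetical name, and it assumes kernel names contain no spaces. The two timestamps are called `first_ns` and `second_ns` because the messages do not say which event boundary each one represents.

```python
import re

# Pattern inferred from the two sample messages in this thread only.
# The trailing "Queue: ... last Queue: ..." part is optional because the
# first sample does not include it.
PATTERN = re.compile(
    r"Out of order activity: (?P<first_ns>\d+) < (?P<second_ns>\d+)\. "
    r"Difference: (?P<diff_ns>\d+) ns\. "
    r"Kernel: (?P<kernel>\S+) last Kernel: (?P<last_kernel>\S+)"
    r"(?: Queue: (?P<queue>\d+) last Queue: (?P<last_queue>\d+))?"
)

INT_FIELDS = ("first_ns", "second_ns", "diff_ns", "queue", "last_queue")


def parse_line(line):
    """Parse one debug message into a dict, or return None if it does not
    match. Missing queue fields are reported as None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


samples = [
    "Out of order activity: 1886121463888334 < 1886121463888361. "
    "Difference: 27 ns. Kernel: batched_transpose_32x32_dword "
    "last Kernel: MIOpenBatchNormFwdTrainSpatialNorml",
    "Out of order activity: 1895910188521077 < 1895910188521125. "
    "Difference: 48 ns. Kernel: batched_transpose_16x32_dword "
    "last Kernel: batched_transpose_16x32_dword Queue: 0 last Queue: 0",
]

parsed = [parse_line(s) for s in samples]
for rec in parsed:
    # Sanity check: the reported difference matches the two timestamps.
    assert rec["second_ns"] - rec["first_ns"] == rec["diff_ns"]
    print(rec["kernel"], rec["diff_ns"], rec["queue"])
```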

@ppanchad-amd

Hi @sraikund16. An internal ticket has been created to investigate your issue. Thanks!

@darren-amd

Hi @sraikund16,

I was not able to build your branch locally on a 7900 XTX. Could you let me know what build steps you are following (including any environment variables you have set) and how you are running your example? That should help me reproduce the issue and investigate further. Thanks!

@sraikund16
Author

Hello, you can build off of main on PyTorch and run a basic training job to reproduce this issue. My branch just adds debug output on top of roctracer's results to show that the overlapping intervals are already present in its raw output. As mentioned in the description of this post, I found that certain events appear to overlap more frequently than others, so it might be best to induce those. Thanks!

@darren-amd

Hi @sraikund16,

This appears to be similar to #105, which we are currently working on a fix for. Please let me know if you have any concerns. Thanks!
