Hey there,
I'm currently working on a system for LLM inference where I have 16 AMD GPUs distributed evenly across 2 clusters.
My setup has cluster 1 (C1) running LLM inference, offloading the layers to local GPUs, and cluster 2 (C2) also receiving offloaded layers from C1 through some RPC servers running on each GPU on C2.
Now, with C2 running 8 servers (1 for each GPU) for C1 to communicate with, the server processes run continuously, waiting for C1 to send data to them.
Is there a way to trace the GPU performance of C1 and C2 when I run my LLM inference application? Since it's on 2 separate clusters, I'm assuming I'd need to run omnitrace on each cluster for a set period and let it listen to HIP/HSA events?
I'm thinking the trace time window example may be something I'm looking for. But I'm not sure if it's possible to incorporate my applications with this example.
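To make this more concrete, here is a rough sketch of what I imagine running on each cluster. It assumes the processes can be wrapped with `omnitrace-sample` and that the `OMNITRACE_TRACE_DELAY`, `OMNITRACE_TRACE_DURATION`, and `OMNITRACE_OUTPUT_PATH` settings behave the way the trace time window example suggests; I haven't verified that this works for my setup. The helper `windowed_trace_command`, the `./rpc-server` command line, and the output directory name are placeholders I made up for illustration.

```python
import os
import shlex


def windowed_trace_command(app_cmd, delay_s, duration_s, output_dir):
    """Build (env, argv) for launching app_cmd under omnitrace-sample.

    Assumption: tracing starts delay_s seconds after launch and runs for
    duration_s seconds, as controlled by the OMNITRACE_TRACE_DELAY and
    OMNITRACE_TRACE_DURATION settings.
    """
    env = dict(os.environ)
    env.update({
        "OMNITRACE_TRACE_DELAY": str(delay_s),
        "OMNITRACE_TRACE_DURATION": str(duration_s),
        "OMNITRACE_OUTPUT_PATH": output_dir,  # keep each process's output separate
    })
    # Everything after "--" is the (hypothetical) application command line.
    argv = ["omnitrace-sample", "--"] + shlex.split(app_cmd)
    return env, argv


# Example for one RPC server on C2 (placeholder command):
#   env, argv = windowed_trace_command("./rpc-server --port 50052", 10, 30, "trace-c2-gpu0")
#   subprocess.run(argv, env=env)
```

The idea would be to launch the 8 RPC servers on C2 and the inference process on C1 this way, each with its own output directory, and then compare the resulting traces.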
I hope this makes sense. Let me know if there's anything I can clarify further. Thank you!