Intel GPU Performance Counters

References

Notes

3D Pipe - Contains: Input Assembler, VS, HS, DS, GS, Clipper, Rasterizer, PS, Output Merger.
Color Cache -
Command Streamer -
Depth Cache -
EU -
FPU -
FPU0 pipe -
FPU1 pipe -
GTI -
IC -
Output Merger - blending?
Resource Streamer -
Subslice -
Send pipe -
Stream Output -
Stencil Cache -
Thread Dispatcher -

UHD 620 (VK_KHR_performance_query)

name category description
GpuTime GPU Time elapsed on the GPU during the measurement.
GpuCoreClocks GPU The total number of GPU core clocks elapsed during the measurement.
AvgGpuCoreFrequencyMHz GPU Average GPU Core Frequency in the measurement.
VsThreads EU Array/Vertex Shader The total number of vertex shader hardware threads dispatched.
HsThreads EU Array/Hull Shader The total number of hull shader hardware threads dispatched.
DsThreads EU Array/Domain Shader The total number of domain shader hardware threads dispatched.
GsThreads EU Array/Geometry Shader The total number of geometry shader hardware threads dispatched.
PsThreads EU Array/Pixel Shader The total number of pixel shader hardware threads dispatched.
CsThreads EU Array/Compute Shader The total number of compute shader hardware threads dispatched.
GpuBusy GPU The percentage of time in which the GPU has been processing GPU commands.
EuActive EU Array The percentage of time in which the Execution Units were actively processing.
EuStall EU Array The percentage of time in which the Execution Units were stalled.
EuFpuBothActive EU Array/Pipes The percentage of time in which both EU FPU pipelines were actively processing.
VsFpu0Active EU Array/Vertex Shader The percentage of time in which EU FPU0 pipeline was actively processing a vertex shader instruction.
VsFpu1Active EU Array/Vertex Shader The percentage of time in which EU FPU1 pipeline was actively processing a vertex shader instruction.
VsSendActive EU Array/Vertex Shader The percentage of time in which EU send pipeline was actively processing a vertex shader instruction.
PsFpu0Active EU Array/Pixel Shader The percentage of time in which EU FPU0 pipeline was actively processing a pixel shader instruction.
PsFpu1Active EU Array/Pixel Shader The percentage of time in which EU FPU1 pipeline was actively processing a pixel shader instruction.
PsSendActive EU Array/Pixel Shader The percentage of time in which EU send pipeline was actively processing a pixel shader instruction.
PsEuBothFpuActive 3D Pipe/Pixel Shader The percentage of time in which pixel shaders were actively processed on both FPUs.
Sampler0Busy Sampler The percentage of time in which Sampler 0 has been processing EU requests.
Sampler1Busy Sampler The percentage of time in which Sampler 1 has been processing EU requests.
SamplersBusy Sampler The percentage of time in which samplers have been processing EU requests.
Sampler0Bottleneck Sampler The percentage of time in which Sampler 0 has been slowing down the pipe when processing EU requests.
Sampler1Bottleneck Sampler The percentage of time in which Sampler 1 has been slowing down the pipe when processing EU requests.
RasterizedPixels 3D Pipe/Rasterizer The total number of rasterized pixels.
HiDepthTestFails 3D Pipe/Rasterizer/Hi-Depth Test The total number of pixels dropped on early hierarchical depth test.
EarlyDepthTestFails 3D Pipe/Rasterizer/Early Depth Test The total number of pixels dropped on early depth test.
SamplesKilledInPs 3D Pipe/Pixel Shader The total number of samples or pixels dropped in pixel shaders.
PixelsFailingPostPsTests 3D Pipe/Output Merger The total number of pixels dropped on post-PS alpha, stencil, or depth tests.
SamplesWritten 3D Pipe/Output Merger The total number of samples or pixels written to all render targets.
SamplesBlended 3D Pipe/Output Merger The total number of blended samples or pixels written to all render targets.
SamplerTexels Sampler/Sampler Input The total number of texels seen on input (with 2x2 accuracy) in all sampler units.
SamplerTexelMisses Sampler/Sampler Cache The total number of texel lookups (with 2x2 accuracy) that missed the L1 sampler cache.
SamplerL1Misses Sampler/Sampler Cache The total number of sampler cache misses in all LODs in all sampler units.
SlmBytesRead L3/Data Port/SLM The total number of GPU memory bytes read from shared local memory.
SlmBytesWritten L3/Data Port/SLM The total number of GPU memory bytes written into shared local memory.
ShaderMemoryAccesses L3/Data Port The total number of shader memory accesses to L3.
ShaderAtomics L3/Data Port/Atomics The total number of shader atomic memory accesses.
L3Lookups L3/TAG The total number of L3 cache lookup accesses w/o IC.
L3Misses L3/TAG The total number of L3 misses.
L3SamplerThroughput L3/Sampler The total number of GPU memory bytes transferred between samplers and L3 caches.
L3ShaderThroughput L3/Data Port The total number of GPU memory bytes transferred between shaders and L3 caches w/o URB.
ShaderBarriers EU Array/Barrier The total number of shader barrier messages.
GtiVfThroughput GTI/3D Pipe The total number of GPU memory bytes transferred between 3D Pipeline (Command Dispatch, Input Assembly and Stream Output) and GTI.
GtiDepthThroughput GTI/Depth Cache The total number of GPU memory bytes transferred between depth caches and GTI.
GtiRccThroughput GTI/Color Cache The total number of GPU memory bytes transferred between render color caches and GTI.
GtiL3Throughput GTI/L3 The total number of GPU memory bytes transferred between L3 caches and GTI.
GtiHdcLookupsThroughput GTI/L3 The total number of GPU memory bytes transferred between GTI and HDC, when HDC is doing TLB lookups.
GtiReadThroughput GTI The total number of GPU memory bytes read from GTI.
GtiWriteThroughput GTI The total number of GPU memory bytes written to GTI.
SamplerBottleneck Sampler The percentage of time in which samplers have been slowing down the pipe when processing EU requests.
Fpu0Active EU Array/Pipes The percentage of time in which EU FPU0 pipeline was actively processing.
Fpu1Active EU Array/Pipes The percentage of time in which EU FPU1 pipeline was actively processing.
EuAvgIpcRate EU Array The average rate of IPC calculated for 2 FPU pipelines.
EuSendActive EU Array/Pipes The percentage of time in which EU send pipeline was actively processing.
EuThreadOccupancy EU Array The percentage of time in which hardware threads occupied EUs.
TypedBytesRead L3/Data Port The total number of typed memory bytes read via Data Port.
TypedBytesWritten L3/Data Port The total number of typed memory bytes written via Data Port.
UntypedBytesRead L3/Data Port The total number of untyped memory bytes read via Data Port.
UntypedBytesWritten L3/Data Port The total number of untyped memory bytes written via Data Port.
VfBottleneck 3D Pipe/Input Assembler The percentage of time in which vertex fetch pipeline stage was slowing down the 3D pipeline.
VsBottleneck 3D Pipe/Vertex Shader The percentage of time in which vertex shader pipeline stage was slowing down the 3D pipeline.
HsBottleneck 3D Pipe/Hull Shader The percentage of time in which hull shader pipeline stage was slowing down the 3D pipeline.
DsBottleneck 3D Pipe/Domain Shader The percentage of time in which domain shader pipeline stage was slowing down the 3D pipeline.
GsBottleneck 3D Pipe/Geometry Shader The percentage of time in which geometry shader pipeline stage was slowing down the 3D pipeline.
SoBottleneck 3D Pipe/Stream Output The percentage of time in which stream output pipeline stage was slowing down the 3D pipeline.
ClBottleneck 3D Pipe/Clipper The percentage of time in which clipper pipeline stage was slowing down the 3D pipeline.
SfBottleneck 3D Pipe/Rasterizer/Strip-Fans The percentage of time in which strip-fans pipeline stage was slowing down the 3D pipeline.
HiDepthBottleneck 3D Pipe/Rasterizer/Hi-Depth Test The percentage of time in which early hierarchical depth test pipeline stage was slowing down the 3D pipeline.
EarlyDepthBottleneck 3D Pipe/Rasterizer/Early Depth Test The percentage of time in which early depth test pipeline stage was slowing down the 3D pipeline.
BcBottleneck 3D Pipe/Rasterizer/Barycentric Calc The percentage of time in which barycentric coordinates calculation pipeline stage was slowing down the 3D pipeline.
HsStall 3D Pipe/Hull Shader The percentage of time in which hull shader pipeline stage was stalled.
DsStall 3D Pipe/Domain Shader The percentage of time in which domain shader pipeline stage was stalled.
SoStall 3D Pipe/Stream Output The percentage of time in which stream-output pipeline stage was stalled.
ClStall 3D Pipe/Clipper The percentage of time in which clipper pipeline stage was stalled.
SfStall 3D Pipe/Rasterizer/Strip-Fans The percentage of time in which strip-fans pipeline stage was stalled.
GtiCmdStreamerMemoryReads GTI/3D Pipe/Command Streamer The total number of GTI memory reads from Command Streamer.
GtiRsMemoryReads GTI/3D Pipe/Resource Streamer The total number of GTI memory reads from Resource Streamer.
GtiVfMemoryReads GTI/3D Pipe/Vertex Fetch The total number of GTI memory reads from Vertex Fetch.
GtiRccMemoryReads GTI/Color Cache The total number of GTI memory reads from Render Color Cache (Render Color Cache misses).
GtiMscMemoryReads GTI/Color Cache The total number of GTI memory reads from Multisampling Color Cache (Multisampling Color Cache misses).
GtiHizMemoryReads GTI/Depth Cache The total number of GTI memory reads from Hierarchical Depth Cache (Hi-Depth Cache misses).
GtiStcMemoryReads GTI/Depth Cache The total number of GTI memory reads from Stencil Cache (Stencil Cache misses).
GtiRczMemoryReads GTI/Depth Cache The total number of GTI memory reads from Render Depth Cache (Render Depth Cache misses).
GtiMemoryReads GTI The total number of GTI memory reads.
GtiL3Bank0Reads GTI/L3 The total number of GTI memory reads from L3 Bank 0 (L3 Cache misses).
GtiL3Bank1Reads GTI/L3 The total number of GTI memory reads from L3 Bank 1 (L3 Cache misses).
GtiL3Bank2Reads GTI/L3 The total number of GTI memory reads from L3 Bank 2 (L3 Cache misses).
GtiL3Bank3Reads GTI/L3 The total number of GTI memory reads from L3 Bank 3 (L3 Cache misses).
GtiL3Reads GTI/L3 The total number of GTI memory reads from L3 (L3 Cache misses).
GtiRingAccesses GTI The total number of all accesses from GTI to the ring.
GtiCmdStreamerMemoryWrites GTI/3D Pipe/Command Streamer The total number of GTI memory writes from Command Streamer.
GtiSoMemoryWrites GTI/3D Pipe/Stream Output The total number of GTI memory writes from Stream Output.
GtiRccMemoryWrites GTI/Color Cache The total number of GTI memory writes from Render Color Cache (Render Color Cache invalidations).
GtiMscMemoryWrites GTI/Color Cache The total number of GTI memory writes from Multisampling Color Cache (Multisampling Color Cache invalidations).
GtiHizMemoryWrites GTI/Depth Cache The total number of GTI memory writes from Hierarchical Depth Cache.
GtiStcMemoryWrites GTI/Depth Cache The total number of GTI memory writes from Stencil Cache.
GtiRczMemoryWrites GTI/Depth Cache The total number of GTI memory writes from Render Depth Cache.
GtiMemoryWrites GTI The total number of GTI memory writes.
GtiL3Bank0Writes GTI/L3 The total number of GTI memory writes from L3 Bank 0 (L3 Bank 0 invalidations).
GtiL3Bank1Writes GTI/L3 The total number of GTI memory writes from L3 Bank 1 (L3 Bank 1 invalidations).
GtiL3Bank2Writes GTI/L3 The total number of GTI memory writes from L3 Bank 2 (L3 Bank 2 invalidations).
GtiL3Bank3Writes GTI/L3 The total number of GTI memory writes from L3 Bank 3 (L3 Bank 3 invalidations).
GtiL3Writes GTI/L3 The total number of GTI memory writes from L3 (L3 invalidations).
EuUntypedReads0 L3/Data Port The EU untyped reads on subslice 0.
EuTypedReads0 L3/Data Port The EU typed reads on subslice 0.
EuUntypedWrites0 L3/Data Port The EU untyped writes on subslice 0.
EuTypedWrites0 L3/Data Port The EU typed writes on subslice 0.
EuUntypedAtomics0 L3/Data Port The EU untyped atomics on subslice 0.
EuTypedAtomics0 L3/Data Port The EU typed atomics on subslice 0.
EuA64UntypedReads0 L3/Data Port The EU A64 untyped reads on subslice 0.
EuA64UntypedWrites0 L3/Data Port The EU A64 untyped writes on subslice 0.
TypedReads0 L3/Data Port The subslice 0 typed reads.
TypedWrites0 L3/Data Port The subslice 0 typed writes.
UntypedReads0 L3/Data Port The subslice 0 untyped reads (including SLM reads).
UntypedWrites0 L3/Data Port The subslice 0 untyped writes (including SLM writes).
TypedAtomics0 L3/Data Port The subslice 0 typed atomics.
TypedReadsPerCacheLine L3/Data Port The ratio of EU typed read requests to L3 cache line reads.
TypedWritesPerCacheLine L3/Data Port The ratio of EU typed write requests to L3 cache line writes.
UntypedReadsPerCacheLine L3/Data Port The ratio of EU untyped read requests to L3 cache line reads.
UntypedWritesPerCacheLine L3/Data Port The ratio of EU untyped write requests to L3 cache line writes.
TypedAtomicsPerCacheLine L3/Data Port The ratio of EU typed atomics requests to L3 cache line writes.
EuHybridFpu0Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing hybrid instructions on FPU0.
EuHybridFpu1Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing hybrid instructions on FPU1.
EuTernaryFpu0Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing ternary instructions on FPU0.
EuTernaryFpu1Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing ternary instructions on FPU1.
EuBinaryFpu0Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing binary instructions on FPU0.
EuBinaryFpu1Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing binary instructions on FPU1.
EuMoveFpu0Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing move instructions on FPU0.
EuMoveFpu1Instruction EU Array/Pipes/Instructions The percentage of time in which execution units were actively processing move instructions on FPU1.
SamplerAccesses Sampler The total number of messages sent to samplers.
L3Accesses L3 The total number of L3 accesses from all entities.
L3ShaderLookups L3/TAG The total number of L3 cache lookup accesses w/o IC.
L3TotalThroughput L3 The total number of GPU memory bytes transferred via L3.
L3Bank00Accesses L3 The total number of accesses to L3 Bank 00.
L3Bank01Accesses L3 The total number of accesses to L3 Bank 01.
L3Bank02Accesses L3 The total number of accesses to L3 Bank 02.
L3Bank03Accesses L3 The total number of accesses to L3 Bank 03.
L3Bank00IcAccesses L3/IC The total number of accesses to L3 Bank 00 from IC cache.
L3Bank00IcHits L3/IC The total number of hits in L3 Bank 00 from IC cache.
PolyDataReady GPU/3D Pipe/Strip-Fans The percentage of time in which geometry pipeline output is ready
NonSamplerShader00AccessStalledOnL3 GPU/Data Port The percentage of time when HDC has messages to L3 but is stalled due to lack of credits (s0.ss0)
NonSamplerShader01AccessStalledOnL3 GPU/Data Port The percentage of time when HDC has messages to L3 but is stalled due to lack of credits (s0.ss1)
NonSamplerShader02AccessStalledOnL3 GPU/Data Port The percentage of time when HDC has messages to L3 but is stalled due to lack of credits (s0.ss2)
GTRequestQueueFull GTI The percentage of time when SQ is filled above a threshold (usually 48 entries)
L30Bank0Stalled GTI/L3 The percentage of time in which slice0 L3 bank0 is stalled
L30Bank1Stalled GTI/L3 The percentage of time in which slice0 L3 bank1 is stalled
L30Bank1Active GTI/L3 The percentage of time in which slice0 L3 bank1 is active
L30Bank0Active GTI/L3 The percentage of time in which slice0 L3 bank0 is active
L30Bank2Stalled GTI/L3 The percentage of time in which slice0 L3 bank2 is stalled
L30Bank2Active GTI/L3 The percentage of time in which slice0 L3 bank2 is active
L30Bank3Stalled GTI/L3 The percentage of time in which slice0 L3 bank3 is stalled
L30Bank3Active GTI/L3 The percentage of time in which slice0 L3 bank3 is active
PixelData0Ready GPU/Rasterizer/Early Depth Test The percentage of time in which slice0 post-EarlyZ pixel data is ready (after early Z tests have been applied)
Rasterizer0InputAvailable GPU/Rasterizer The percentage of time in which slice0 rasterizer input is available
PSOutput0Available GPU/3D Pipe The percentage of time in which slice0 PS output is available
PixelValues0Ready GPU/3D Pipe The percentage of time in which slice0 pixel values are ready
Rasterizer0OutputReady GPU/Rasterizer The percentage of time in which slice0 rasterizer output is ready
Sampler01InputAvailable GPU/Sampler The percentage of time in which slice0 subslice1 sampler input is available
Sampler02InputAvailable GPU/Sampler The percentage of time in which slice0 subslice2 sampler input is available
Sampler00InputAvailable GPU/Sampler The percentage of time in which slice0 subslice0 sampler input is available
Sampler02OutputReady GPU/Sampler The percentage of time in which slice0 subslice2 sampler output is ready
Sampler00OutputReady GPU/Sampler The percentage of time in which slice0 subslice0 sampler output is ready
Sampler01OutputReady GPU/Sampler The percentage of time in which slice0 subslice1 sampler output is ready
NonPSThread01ReadyForDispatch GPU/Thread Dispatcher The percentage of time in which non-PS thread is ready for dispatch on slice0 subslice1 thread dispatcher
PSThread00ReadyForDispatch GPU/Thread Dispatcher The percentage of time in which PS thread is ready for dispatch on slice0 subslice0 thread dispatcher
NonPSThread00ReadyForDispatch GPU/Thread Dispatcher The percentage of time in which non-PS thread is ready for dispatch on slice0 subslice0 thread dispatcher
PSThread02ReadyForDispatch GPU/Thread Dispatcher The percentage of time in which PS thread is ready for dispatch on slice0 subslice2 thread dispatcher
NonPSThread02ReadyForDispatch GPU/Thread Dispatcher The percentage of time in which non-PS thread is ready for dispatch on slice0 subslice2 thread dispatcher
PSThread01ReadyForDispatch GPU/Thread Dispatcher The percentage of time in which PS thread is ready for dispatch on slice0 subslice1 thread dispatcher
ThreadHeader01ReadyPort0 GPU/Thread Dispatcher The percentage of time in which thread header is ready on slice0 subslice1 thread dispatcher port 0
ThreadHeader00ReadyPort1 GPU/Thread Dispatcher The percentage of time in which thread header is ready on slice0 subslice0 thread dispatcher port 1
ThreadHeader00ReadyPort0 GPU/Thread Dispatcher The percentage of time in which thread header is ready on slice0 subslice0 thread dispatcher port 0
ThreadHeader02ReadyPort1 GPU/Thread Dispatcher The percentage of time in which thread header is ready on slice0 subslice2 thread dispatcher port 1
ThreadHeader02ReadyPort0 GPU/Thread Dispatcher The percentage of time in which thread header is ready on slice0 subslice2 thread dispatcher port 0
ThreadHeader01ReadyPort1 GPU/Thread Dispatcher The percentage of time in which thread header is ready on slice0 subslice1 thread dispatcher port 1
Fpu1ActiveAdjusted EU Array/Pipes The percentage of time in which EU FPU1 pipeline was actively processing including Extended Math processing
RenderBusy GPU The percentage of time when render command streamer was busy.
Vdbox0Busy GPU The percentage of time when Vdbox0 command streamer was busy.
VeboxBusy GPU The percentage of time when vebox command streamer was busy.
BlitterBusy GPU The percentage of time when blitter command streamer was busy.
AnyRingBusy GPU The percentage of time when any command streamer was busy.
StcPMAStall GPU/Stencil Cache Percentage of time when stencil cache line and an overlapping pixel are causing stalls
CsFpu0Active EU Array The percentage of time in which EU FPU0 pipeline was actively processing a compute shader instruction.
CsFpu1Active EU Array The percentage of time in which EU FPU1 pipeline was actively processing a compute shader instruction.
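
Several of the counters above are raw totals that are easier to read as ratios. The following is a minimal sketch, assuming the counter values have already been read back and stored in a dictionary keyed by the counter names from the table. The helper names `safe_ratio` and `derive_ratios` and the sample numbers are hypothetical and only for illustration; they are not part of any driver or Vulkan API.

```python
from typing import Mapping, Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def derive_ratios(counters: Mapping[str, float]) -> dict:
    """Compute a few illustrative ratios from raw counter totals.

    Keys are counter names taken from the table above; which ratios are
    meaningful for a given workload is an assumption of this sketch.
    """
    return {
        # Share of L3 lookups that missed.
        "l3_miss_ratio": safe_ratio(counters["L3Misses"], counters["L3Lookups"]),
        # Share of sampler input texels that missed the L1 sampler cache.
        "sampler_texel_miss_ratio": safe_ratio(
            counters["SamplerTexelMisses"], counters["SamplerTexels"]
        ),
    }


# Made-up sample values, not measurements from a real UHD 620.
sample = {
    "L3Misses": 250,
    "L3Lookups": 1000,
    "SamplerTexelMisses": 50,
    "SamplerTexels": 400,
}
ratios = derive_ratios(sample)
print(ratios)
```

Returning `None` for a zero denominator keeps idle intervals (for example, a pass that never touches the sampler) from raising a division error.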