Add MI300 details to docs #446

peterjunpark · 2024-10-09T17:34:21Z

demo build: https://advanced-micro-devices-demo--446.com.readthedocs.build/projects/omniperf/en/446/

Performance model

Update Performance model to mention support for MI300
- Same in MFMA section
- May also want to adopt something like "MI2XX" for MI300A/300X here

Pipeline descriptions

VALU

Add MI300 to list of products with MFMA units here
Update note at bottom of section to include MI300 in list of accelerators with 8 waveslots / SIMD here

AGPRs

Add MI300 to MI200 list here

Pipeline metrics

Need to add new MFMA instruction metrics for MI300 here
- And FLOPs for the same here

L1

Need to update L1 cache-line size here to 128B for MI300+: here
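To illustrate why the line-size change matters for the L1 metrics, here is a minimal sketch that counts how many cache lines a contiguous access touches. The helper `cache_lines_touched` is hypothetical (not part of Omniperf), and the 64B value used for comparison is an assumed pre-MI300 line size; only the 128B figure comes from this PR.

```python
def cache_lines_touched(base_addr: int, num_bytes: int, line_size: int) -> int:
    """Count distinct cache lines covered by [base_addr, base_addr + num_bytes)."""
    if num_bytes <= 0:
        return 0
    first_line = base_addr // line_size
    last_line = (base_addr + num_bytes - 1) // line_size
    return last_line - first_line + 1


# A 64-lane wavefront reading one contiguous 4-byte value per lane: 256 bytes.
WAVE_BYTES = 64 * 4

# Assumed pre-MI300 line size (64B) versus the MI300+ line size (128B).
lines_64b = cache_lines_touched(0, WAVE_BYTES, 64)
lines_128b = cache_lines_touched(0, WAVE_BYTES, 128)

print(f"64B lines touched:  {lines_64b}")
print(f"128B lines touched: {lines_128b}")
```

Any per-cache-line metric in the docs that bakes in a fixed line size would shift by the same ratio, which is why the text needs the per-architecture value.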

UTCL1

MI300 fixes the bug where hit-on-miss isn't counted: update here

TA instruction counts

On MI300, we now theoretically use the scratch* instructions for stack/spill access, which ... invalidates a lot of this section. We need to figure out how to rework it.

Scalar / Instruction cache

Need to update size and how many CUs it's shared between here
- 64KB / shared between CUs on MI300

L2

L2 is no longer the coherence point for MI300+
- L2<->EA request flow diagram needs to be updated for MI300
  - Essentially, we need to add a 128B read request line and figure out how to represent this on the diagram
Update channel count in text for MI300 here
- 16 channels per XCC, still 256B interleaved
Update Streaming requests text to also include 300
Update probe requests text for MI300
- Likely more involved, need to write some tests to see what triggers these here
Update note at bottom of section to include MI300 here
- [ ] 128B cache-line there as well
L2-Fabric Write and Atomic Bandwidth
- All atomics are now counted as such on MI300, because they are not cached in L2 and must go to MALL
- Same with:
  - HBM Write and Atomic Traffic
  - Remote Write and Atomic Traffic
  - Atomic Traffic
  - Uncached Write and Atomic Traffic
Detailed transaction metrics: here
- Need to add 128B read request metric to table
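The channel note above (16 channels per XCC, 256B interleaved) can be sketched as a simple address-to-channel mapping. This is an illustration only: the function `l2_channel` is hypothetical, and the plain modulo mapping is an assumption, since real hardware may hash address bits differently.

```python
NUM_CHANNELS = 16       # L2 channels per XCC on MI300, per the note in this PR
INTERLEAVE_BYTES = 256  # interleave granularity, per the note in this PR


def l2_channel(addr: int,
               num_channels: int = NUM_CHANNELS,
               interleave: int = INTERLEAVE_BYTES) -> int:
    """Map a byte address to an L2 channel index.

    Assumption: consecutive 256B blocks rotate through the channels
    with a plain modulo; this is not taken from hardware documentation.
    """
    return (addr // interleave) % num_channels


# Consecutive 256B blocks land on consecutive channels and wrap after 16 blocks.
for addr in (0, 255, 256, 16 * 256):
    print(f"addr {addr:>5} -> channel {l2_channel(addr)}")
```

Under this assumed mapping, each 256B interleave block spans two 128B read requests, both of which go to the same channel.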

Memory type

Need to update the table for MI300. We may need a better way to represent this, as fine-grained/coarse-grained isn't very relevant there anymore.

New concepts

Need to discuss XCC / NPS / partitioning modes somewhere. There's no obviously logical place to do so, but we might do this in the definitions or as a separate part of the performance model.
The key points for Omniperf are that:
- [ ] Number of CUs depends on # of XCCs active in the current partitioning mode
- [ ] Number of HBM channels per partition (and thus: the achievable L2<->EA bandwidth) depends on the NPS mode
Need to discuss MALL as coherence point somewhere
Neither of the above needs to be covered in significant detail, IMO
Neither has specific metrics tied to it, but both are important for understanding how we're presenting data
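The two key points about partitioning could be shown in the docs with something like the sketch below. All names here (`PartitionConfig`, `cus_per_partition`, `hbm_channels_per_partition`) are hypothetical, and the numeric inputs are illustrative placeholders rather than values from a hardware specification. It also assumes XCCs and HBM channels divide evenly across partitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionConfig:
    total_xccs: int            # XCCs on the whole device
    cus_per_xcc: int           # active CUs per XCC
    total_hbm_channels: int    # HBM channels on the whole device
    compute_partitions: int    # e.g. 1 for a single partition, total_xccs for one per XCC
    nps: int                   # number of memory (NUMA) partitions, e.g. 1 or 4


def cus_per_partition(cfg: PartitionConfig) -> int:
    """CUs visible in one compute partition (assumes an even split of XCCs)."""
    if cfg.total_xccs % cfg.compute_partitions != 0:
        raise ValueError("XCCs must divide evenly across compute partitions")
    return (cfg.total_xccs // cfg.compute_partitions) * cfg.cus_per_xcc


def hbm_channels_per_partition(cfg: PartitionConfig) -> int:
    """HBM channels reachable per memory partition (assumes an even split)."""
    if cfg.total_hbm_channels % cfg.nps != 0:
        raise ValueError("HBM channels must divide evenly across NPS partitions")
    return cfg.total_hbm_channels // cfg.nps


# Illustrative placeholder values only.
single = PartitionConfig(total_xccs=8, cus_per_xcc=38, total_hbm_channels=128,
                         compute_partitions=1, nps=1)
split = PartitionConfig(total_xccs=8, cus_per_xcc=38, total_hbm_channels=128,
                        compute_partitions=8, nps=4)

for name, cfg in (("single", single), ("split", split)):
    print(name, cus_per_partition(cfg), hbm_channels_per_partition(cfg))
```

The point for readers is that both the CU count and the achievable L2<->EA bandwidth that Omniperf reports are per-partition quantities, so the same device can show different peaks depending on the mode.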

References

Should add MI300 / CDNA3 ISA Guide

start adding MI300 content Signed-off-by: Peter Park <[email protected]>

Signed-off-by: Peter Park <[email protected]>

peterjunpark added the documentation Improvements or additions to documentation label Oct 9, 2024

peterjunpark added 3 commits October 10, 2024 13:38

Change MI2XX to MI200. Add MI300 note

4be4c89

start adding MI300 content Signed-off-by: Peter Park <[email protected]>

Fix request flow image sizes

d66b595

Signed-off-by: Peter Park <[email protected]>

bump rocm-docs-core to 1.8.2

de0b4ca

Signed-off-by: Peter Park <[email protected]>

peterjunpark force-pushed the docs/mi300 branch from 2763296 to de0b4ca Compare October 10, 2024 17:38
