Vivado Benchmarks #90

Open
cknizek opened this issue Jul 22, 2024 · 6 comments

@cknizek

cknizek commented Jul 22, 2024

Multiply Benchmark

multiply.zip

*220 is the total number of DSP48E1 slices on the PYNQ-Z2, so a count of 220 means every DSP48E1 slice on the device is in use.

| Multiply Width | DSP48E1 Slices Used |
| --- | --- |
| 8x8 | 1 |
| 8x16 | 1 |
| 16x16 | 1 |
| 16x32 | 2 |
| 32x32 | 4 |
| 32x64 | 8 |
| 64x64 | 16 |
| 64x128 | 32 |
| 128x128 | 64 |
| 128x256 | 122 |
| 256x256 | 220* |
| 256x512 | 220* |
| 512x512 | 220* |
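
Up to 128x128, the measured counts match ceil(a/16) × ceil(b/16), as if each operand were split into 16-bit chunks with one DSP48E1 per chunk pair. This is only an empirical fit to the table above, not a documented Vivado rule (the DSP48E1 multiplier itself is 25x18 signed). The 128x256 row (122 measured vs. 128 predicted) already departs from it, and the 220-slice device limit caps everything beyond. A small sketch that checks the fit; `estimate_dsps` is a hypothetical helper:

```python
import math

# Hypothetical helper: an empirical fit to the Multiply Benchmark table, not a Vivado rule.
PYNQ_Z2_DSP48E1 = 220  # total DSP48E1 slices on the PYNQ-Z2

def estimate_dsps(a_bits: int, b_bits: int) -> int:
    """Estimate DSP48E1 count as ceil(a/16) * ceil(b/16), capped at the device total."""
    tiles = math.ceil(a_bits / 16) * math.ceil(b_bits / 16)
    return min(tiles, PYNQ_Z2_DSP48E1)

# Measured values copied from the Multiply Benchmark table.
measured = {
    (8, 8): 1, (8, 16): 1, (16, 16): 1, (16, 32): 2, (32, 32): 4,
    (32, 64): 8, (64, 64): 16, (64, 128): 32, (128, 128): 64,
    (128, 256): 122, (256, 256): 220, (256, 512): 220, (512, 512): 220,
}

for (a, b), used in measured.items():
    est = estimate_dsps(a, b)
    flag = "" if est == used else "  <-- mismatch"
    print(f"{a}x{b}: measured {used}, estimate {est}{flag}")
```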

MAC Benchmark

mac.zip

| MAC Width | DSP48E1 Slices Used |
| --- | --- |
| 16x16 | 1 |
| 16x24 | 1 |
| 24x24 | 3 |
| 32x32 | 6 |
| 64x64 | 19 |
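
One way to read the single-slice MAC rows: a DSP48E1 contains a 25x18 two's-complement multiplier feeding a 48-bit accumulator (the P register). A signed MAC whose operands fit in 25 and 18 bits can therefore live in one slice, which is consistent with the 16x16 and 16x24 rows. The sketch below is a simplified behavioral model under that assumption. It ignores pipeline registers and does not try to predict the multi-slice counts for 24x24 and wider. All function names are hypothetical.

```python
# Simplified behavioral model of a single-slice DSP48E1 MAC.
# Assumptions: signed operands, no pipeline registers; only the 25x18 multiplier
# size and the 48-bit P register width are modeled.
A_MULT_BITS = 25   # multiplier input width on the A/D side
B_MULT_BITS = 18   # multiplier input width on the B side
P_BITS = 48        # accumulator (P register) width

def fits_single_dsp48e1(a_bits: int, b_bits: int) -> bool:
    """True if a signed a_bits x b_bits product fits one 25x18 multiplier (either orientation)."""
    lo, hi = sorted((a_bits, b_bits))
    return lo <= B_MULT_BITS and hi <= A_MULT_BITS

def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mac(pairs, p_init: int = 0) -> int:
    """Accumulate a*b into a 48-bit register that wraps like the P register."""
    p = p_init
    for a, b in pairs:
        p = to_signed(p + a * b, P_BITS)
    return p

for a_bits, b_bits in [(16, 16), (16, 24), (24, 24), (32, 32), (64, 64)]:
    print(f"{a_bits}x{b_bits}: single slice? {fits_single_dsp48e1(a_bits, b_bits)}")
```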

BRAM Benchmarks (WIP)
bsg_mem_1r1w_sync_ultrascale.zip
bsg_mem_1rw_sync_ultrascale.zip

@cknizek

cknizek commented Jul 29, 2024

Multiplier: 16-bit x 32-bit inputs with 32-bit output
multiplier_16_32_32

@cknizek

cknizek commented Jul 29, 2024

ACASCREG

Set to 0 for both DSP48E1 slices. This means we aren't pipelining the ACOUT cascade datapath: a value of 0 implies there are zero pipeline register stages on the following datapath [1]:

[image: ACOUT cascade datapath]


ADREG

Set to 1 for both slices in the DSP48E1 tile, meaning each slice has one AD pipeline register (the register between the pre-adder and the multiplier):

[image]

The placement of the A/D pre-adder is shown in the following image:

[image: pre-adder placement]


ALUMODEREG: set to 0 for both slices. This makes sense: since we're not pipelining, the ALUMODE control input needs no register.


AREG: set to 0 for both slices. Again, we're not pipelining, so this result makes sense (no pipeline registers are needed on the A input).
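
For reference, the DSP48E1 attribute table in UG479 constrains ACASCREG relative to AREG: with AREG = 0 it must be 0, with AREG = 1 it must be 1, and with AREG = 2 it may be 1 or 2. The reported AREG = 0 / ACASCREG = 0 pair is therefore the only legal combination for an unregistered A input. A minimal check of that rule; the function and dictionary names are hypothetical:

```python
# Hypothetical checker for the AREG/ACASCREG pairing rule described in UG479.
LEGAL_ACASCREG = {0: {0}, 1: {1}, 2: {1, 2}}  # AREG value -> allowed ACASCREG values

def acascreg_is_legal(areg: int, acascreg: int) -> bool:
    """Return True if this ACASCREG value is allowed for the given AREG value."""
    return acascreg in LEGAL_ACASCREG.get(areg, set())

# Values reported in this comment for both DSP48E1 slices.
reported = {"AREG": 0, "ACASCREG": 0, "ADREG": 1, "ALUMODEREG": 0}
print(acascreg_is_legal(reported["AREG"], reported["ACASCREG"]))
```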


AUTORESET_PATDET:


Sources
[1] (image)

[2] (two images)

@cknizek

cknizek commented Aug 12, 2024

D$ Benchmark File

data_mem_test.sv.zip

@cknizek

cknizek commented Aug 12, 2024

I$ Benchmark (Default Parameters)

Default Parameters

| Parameter | Value |
| --- | --- |
| icache_assoc | 8 |
| icache_sets | 64 |
| icache_block_width | 512 |
| icache_fill_width | 128 |
| icache_sindex_width | `` `BSG_SAFE_CLOG2(sets_p) `` |
| icache_bank_width | `block_width / assoc` |
| icache_data_mem_mask_width | `bank_width >> 3` |
| icache_bindex_width | `` `BSG_SAFE_CLOG2(assoc_p) `` |
| icache_data_mem_addr_width | `(assoc_p > 1) ? (sindex_width_lp+bindex_width_lp) : sindex_width_lp` |
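
Plugging the defaults into the derived expressions gives the concrete widths. The sketch below assumes `BSG_SAFE_CLOG2 behaves like BaseJump STL's macro (ceil(log2(x)), but 1 for an input of 1). The Python names mirror the table rows and are illustrative only:

```python
def bsg_safe_clog2(x: int) -> int:
    """Stand-in for `BSG_SAFE_CLOG2 (assumed): ceil(log2(x)), but 1 when x == 1."""
    return 1 if x == 1 else (x - 1).bit_length()

def icache_widths(assoc: int = 8, sets: int = 64, block_width: int = 512) -> dict:
    """Evaluate the derived I$ parameters from the defaults listed in the table."""
    sindex_width = bsg_safe_clog2(sets)
    bank_width = block_width // assoc
    data_mem_mask_width = bank_width >> 3  # one mask bit per byte
    bindex_width = bsg_safe_clog2(assoc)
    data_mem_addr_width = (sindex_width + bindex_width) if assoc > 1 else sindex_width
    return {
        "sindex_width": sindex_width,
        "bank_width": bank_width,
        "data_mem_mask_width": data_mem_mask_width,
        "bindex_width": bindex_width,
        "data_mem_addr_width": data_mem_addr_width,
    }

print(icache_widths())
```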

Default Primitive Output (Synthesized)

| Primitive | Count | Description |
| --- | --- | --- |
| FDRE | 520 | Flop & Latch |
| LUT3 | 512 | LUT |
| LUT2 | 16 | LUT |
| RAMB36E1 | 8 | Block Memory |
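
A quick capacity check is consistent with the 8 RAMB36E1 count: a data array of sets × assoc × block_width bits, against 32 Kb of data per RAMB36E1 (36 Kb including parity). This is only a capacity lower bound for the data array. It ignores any other storage and says nothing about how Vivado actually shapes each block. `min_ramb36e1` is a hypothetical helper:

```python
import math

RAMB36E1_DATA_BITS = 32 * 1024  # 32 Kb of data per RAMB36E1, excluding the 4 Kb of parity bits

def min_ramb36e1(sets: int, assoc: int, block_width: int) -> int:
    """Capacity-only lower bound on RAMB36E1 blocks for a sets x assoc x block_width data array."""
    data_bits = sets * assoc * block_width
    return math.ceil(data_bits / RAMB36E1_DATA_BITS)

print(min_ramb36e1(sets=64, assoc=8, block_width=512))
```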

Benchmark Files

imem_benchmark_default.zip

@cknizek

cknizek commented Aug 12, 2024

D$ Benchmark (Default Parameters)

For clarity, the parameter names in the tables below differ from those in the source .sv file.

Default Parameters

| Parameter | Value |
| --- | --- |
| dcache_assoc | 8 |
| dcache_sets | 64 |
| dcache_block_width | 512 |
| dcache_fill_width | 128 |
| dcache_sindex_width | `` `BSG_SAFE_CLOG2(dcache_sets) `` |
| dcache_bank_width | `dcache_block_width / dcache_assoc` |
| dcache_data_mem_mask_width | `dcache_bank_width >> 3` |
| dcache_bindex_width | `` `BSG_SAFE_CLOG2(dcache_assoc) `` |
| dcache_data_mem_addr_width | `(dcache_assoc > 1) ? (dcache_sindex_width+dcache_bindex_width) : dcache_sindex_width` |

Default Primitive Output (Synthesized)

| Primitive | Count | Description |
| --- | --- | --- |
| LUT2 | 16 | LUT |
| RAMB36E1 | 8 | Block Memory |

Benchmark Files

dmem_benchmark_default.zip

@cknizek

cknizek commented Aug 12, 2024

32-bit x 32-bit multiplier with 32-bit output → 3 DSP48E1 slices

Setting the cascade_dsp synthesis option to force (synth_design -cascade_dsp force) in the compile.tcl script eliminates the LUT2 and CARRY4 primitives; the design uses 3 DSP48E1 slices instead.
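
A plausible reason three slices suffice (compared with four for the 32x32 row of the Multiply Benchmark, which presumably kept the full product) is that only the low 32 bits are kept. Split each operand into 16-bit halves, a = aH·2^16 + aL and b = bH·2^16 + bL. The aH·bH term is shifted left by 32 bits and vanishes modulo 2^32, leaving three partial products. The 16-bit split is an illustrative assumption; the benchmark files do not show the exact decomposition Vivado chooses. The function name is hypothetical:

```python
import random

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF

def mul_low32_three_products(a: int, b: int) -> int:
    """Low 32 bits of a*b for 32-bit a, b, using only three 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    p0 = a_lo * b_lo                    # full 32-bit partial product
    cross = a_hi * b_lo + a_lo * b_hi   # only its low 16 bits survive after the shift
    # a_hi * b_hi would be shifted left by 32, so it vanishes modulo 2**32
    return (p0 + (cross << 16)) & MASK32

# Compare against a direct multiply truncated to 32 bits.
rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    assert mul_low32_three_products(a, b) == (a * b) & MASK32
print("ok")
```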

Benchmark Files

multiplier_32x32_32_3DSP.zip

Relevant Documentation

https://docs.amd.com/r/en-US/ug835-vivado-tcl-commands
