A few questions about VexiiRiscv L1 caches and Tilelink #31
Hi, bootMemClear is the intended way to avoid those x-prop issues. All the tests I run VexiiRiscv on are run by Verilator, which instead of x-prop uses 1 and 0 and randomizes all the values at init. I kinda don't feel great about enabling bootMemClear by default, because it consumes hardware resources and hurts timing. Maybe instead I should put a big warning message at the very end of the VexiiRiscv hardware generation to warn about it?
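For reference, a minimal sketch of the workaround (bootMemClear and fetchL1Enable are the ParamSimple fields named in the original report at the end of this thread; the rest of the setup is omitted):
// Scala -- VexiiRiscv generation parameters, sketch only
val param = new ParamSimple()
param.fetchL1Enable = true  // keep the L1 instruction cache
param.bootMemClear = true   // clear the cache RAMs at boot, so 4-state
                            // simulators like iverilog never see 'xxxx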
Nice ^^ |
Thanks! Makes perfect sense then.
It might be enough to mention it in the documentation, somewhere near the "Run a simulation" section. That would have saved me some time.
Previously used Axi4, now trying Tilelink. I am trying to read this code
// from vexiiriscv/soc/litex/Soc.scala
vexii.lsuL1Bus.setDownConnection(
  a = withCoherency.mux(StreamPipe.HALF, StreamPipe.FULL),
  b = StreamPipe.HALF_KEEP,
  c = StreamPipe.FULL,
  d = StreamPipe.M2S_KEEP,
  e = StreamPipe.HALF
)
vexii.dBus.setDownConnection(a = StreamPipe.HALF, d = StreamPipe.M2S_KEEP)
and understand why it should be written exactly this way. By the way, what is the best place to ask questions? Here? Or maybe in the SpinalHDL google group? |
Added :)
Yes, I really need to provide a better tutorial on how to use the tilelink API. I got a bit of funding to do it and should start on that very soon.
So "up" means connections toward the masters, "down" means connections toward the slaves. setDownConnection is a way to ask the interconnect to add some pipelining stages between lsuL1Bus and the address decoder which serves the down connections. a, b, c, d, e refer to the 5 tilelink channels. StreamPipe.HALF is a lightweight pipeline stage which fully cuts all combinatorial paths. So all of this is about timing optimization / synthesis results.
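To make that concrete, a minimal sketch on a raw SpinalHDL Stream (mapping StreamPipe.HALF to halfPipe() is my reading of the description above, not taken from the VexiiRiscv sources):
import spinal.core._
import spinal.lib._

case class PipeDemo() extends Component {
  val input = slave(Stream(UInt(8 bits)))
  val output = master(Stream(UInt(8 bits)))
  // halfPipe() registers valid, ready and payload, so every combinatorial
  // path between input and output is cut; the cost is at most one
  // transaction every two cycles
  output << input.halfPipe()
}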
Github issues are good, this repo is good as well ^^ |
I have a few probably stupid questions.
Do I understand right that
In https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Libraries/stream.html it is written that after Also I can not figure out what
Do all the use cases require My current use case is the first one. I have a video controller with an Axi4ReadOnly interface and I am trying to figure out how to connect it to tilelink and make sure that the cpu will write back cached data when needed. |
I renamed the issue to better match the content :-) |
Hi,
Yes
Yes, right for coherent caches. They send "acquire" requests on channel A (single beat), receive "acquire" responses on channel D (multi beat), and "release" dirty/clean cache lines on channel C.
Right
Right
All cases. As soon as you have multiple masters (cpu, dma) and a given CPU has a data cache, then that CPU needs lsuL1Coherency (or it will need to do cache flushes/invalidates in software).
One reference for that is the litex SoC.scala. What matters is that the CPU / DMA busses are merged together and then go either into a CacheFiber (l2) or into the HubFiber (no l2). |
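A minimal sketch of that topology with the node / fiber API used later in this thread (node names are illustrative; I am assuming the hub's up node accepts several masters):
val cpu_bus = tilelink.fabric.Node()  // coherent CPU side (lsuL1Bus etc.)
val dma_bus = tilelink.fabric.Node()  // DMA master, e.g. a video controller
val hub = new tilelink.coherent.HubFiber()  // or a CacheFiber to get an l2
hub.up << cpu_bus   // masters merged together in front of the hub
hub.up << dma_bus
val mem_bus = tilelink.fabric.Node()
mem_bus << hub.down // memory side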
Thanks for the answers!
By the way, does VexiiRiscv support any cache flush instructions? I've seen only And how
Ah! That's what I missed! |
Currently, there is nothing to flush particular cache lines. Ideally, CMO would need to be implemented to allow VexiiRiscv usages without memory coherency.
Yes
Yes |
The HubFiber will generate flush / invalidate for every request, while the CacheFiber has a dictionary which tracks which cache has what, and so will only generate useful flush / invalidate requests. |
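Side by side, under the assumption that CacheFiber lives next to HubFiber in tilelink.coherent (constructor parameters, if any, omitted):
// Broadcast hub: no tracking state, probes the caches on every request
val hub = new tilelink.coherent.HubFiber()
// Directory-style l2: tracks which cache holds what, filters useless probes
val l2 = new tilelink.coherent.CacheFiber()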
CMO would be great! I hope it will also allow avoiding useless read-before-write requests to memory - for example when
Interesting. Does it mean that if I use HubFiber it can make sense to connect
Or |
Ahhh, that is because now the cache is implemented as "write-back". Write-through caches don't have that issue, but instead they spam the memory system with little write requests.
Hmmm, if you have multiple CPUs, you would then need the other CPUs to manually flush themselves if they modify code sections. fence.i always flushes the whole D$ + I$ in both cases. |
I mean that there could be a special instruction that makes the L1 cache think that the cache line is already loaded, without an actual read. I've looked through the CMO spec and see that it doesn't have it. But it could be a custom instruction:
char *src = ..., *dst = ...;
for (int i = 0; i < size / 64; ++i) {
  fakeReadCacheLine(dst); // custom instruction: allocate the line without fetching it from memory
  for (int j = 0; j < 64; j += 4) *(int*)(dst + j) = *(int*)(src + j); // fully overwrite the line
  src += 64, dst += 64;
} |
Ahhhhh lol, I didn't know about it. |
I am stuck with it. Could you please help me figure out why it doesn't work? I want to connect a blackbox to a tilelink bus. It will use only 256-byte read requests.
class VideoController extends BlackBox {
  val io = new Bundle {
    ...
    val tl_bus = master(tilelink.Bus(tilelink.BusParameter(
      addressWidth = 32,
      dataWidth = 64,
      sizeBytes = 256,
      sourceWidth = 0,
      sinkWidth = 0,
      withBCE = false,
      withDataA = false,
      withDataB = false,
      withDataC = false,
      withDataD = true,
      node = null
    )))
  }
  noIoPrefix()
  mapClockDomain(clock=io.clk, reset=io.reset)
}

val coherent_bus = tilelink.fabric.Node().forceDataWidth(64)
val mem_bus = tilelink.fabric.Node().forceDataWidth(64)
val tilelink_hub = new tilelink.coherent.HubFiber()
tilelink_hub.up << coherent_bus
mem_bus << tilelink_hub.down

val video_ctrl = new VideoController()
val video_bus = tilelink.fabric.Node.down()
val video_bus_fiber = fiber.Fiber build new Area {
  video_bus.m2s forceParameters tilelink.M2sParameters(
    addressWidth = 32,
    dataWidth = 64,
    masters = List(
      tilelink.M2sAgent(
        name = video_ctrl,
        mapping = List(
          tilelink.M2sSource(
            id = SizeMapping(0, 1),
            emits = tilelink.M2sTransfers(get = tilelink.SizeRange(256, 256))
          )
        )
      )
    )
  )
  video_bus.s2m.supported load tilelink.S2mSupport.none()
  video_bus.bus << video_ctrl.io.tl_bus
}

coherent_bus << video_bus // this line causes the error
The error (the first one; there are many of them) is:
I used https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Libraries/Bus/tilelink/tilelink_fabric.html#example-cpufiber as an example. |
Ahhhh, I think the reason is that the memory coherency hub / l2 can't handle memory requests bigger than one cache line. So no memory request of more than 64 bytes is allowed. So either reduce the requests to 64 bytes, or keep the big requests and don't route them through the hub. |
Yes, changed to 64 and now it successfully generates verilog! Thanks! By the way, is it possible to generate a more explanatory error message in such a situation? I have no idea how I would have debugged it without your help. |
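For reference, a sketch of the change that made it work, with the sizes taken from the exchange above:
// in the blackbox's BusParameter
sizeBytes = 64,
// and in the transfers advertised by the fiber
emits = tilelink.M2sTransfers(get = tilelink.SizeRange(64, 64))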
I will take a look tomorrow ^^ The situation right now is that the Hub filters memory accesses to only allow the compatible ones, which creates a video DMA with no memory access at all => no read / no write => withDataD = false |
The only reason to use big requests was that in this case the memory throughput is a bit higher. But it is not that significant. |
One thing: I recently updated the documentation, in particular around the MicroSoc; you may find interesting things there. In particular, I recently added a tilelink video DMA. Good luck :D |
Thanks! I used both MicroSoc and litex SoC as examples. Things like |
Yes, I agree. Error reporting isn't great when it goes into the Fiber / Tilelink stuff. For the raw SpinalHDL things using components and so on, it should be good :D |
Actually |
Ahhh right, I just checked the code again: when coherency is enabled, the LsuPlugin will instead just wait until the store buffer is drained out.
val coherentFlusher = l1.coherency.get generate new Area{
  // accept an invalidation request only once the store buffer is empty
  invalidationPorts.foreach(_.cmd.ready := withStoreBuffer.mux(storeBuffer.empty, True))
} |
Hi! Sometimes I have deadlocks somewhere in my SoC. I suppose they are caused by attempts to read or write invalid memory regions that are not connected to any slave interface. The problem is that I don't have a way to debug it: in my setup there is no integration with standard tools and no easy way to use things like a logic analyzer over jtag. And the scenarios are too complicated for simulation (e.g. boot linux and run gcc on the device). Is there a way to add to the tilelink hub a default slave interface that will be used if the address is not mapped to anything else? I want to add a module that catches all invalid requests and sends debug information via UART. |
Hi, yes, it is by design. The memory masters have ways to figure out the exact memory mapping at hardware elaboration. So, what is done for DMAs which don't integrate those checks is to directly add a spinal.lib.bus.tilelink.fabric.TransferFilter on their DMA bus.
Hmm, that TransferFilter behaves like a default slave. The difference is that it intercepts the traffic, like a bridge (instead of behaving like a regular peripheral). |
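A minimal sketch of where the filter sits, assuming TransferFilter exposes up/down nodes like the other fabric elements in this thread (node names are illustrative):
val dma_bus = tilelink.fabric.Node()              // the unchecked DMA master
val filter = new tilelink.fabric.TransferFilter() // intercepts illegal accesses
filter.up << dma_bus
coherent_bus << filter.down  // only supported, mapped traffic goes through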
It seems that this time I have faced an actual bug in VexiiRiscv.
Unfortunately I can't provide a simple reproducing example. The deadlock happens when I run VexiiParams: https://github.com/petrmikheev/endeavour/blob/master/rtl/spinal/src/main/scala/endeavour/VexiiCore.scala#L40 |
Ahhh. How does it freeze, I mean, can you reproduce it easily? Is it kinda always a deadlock on the same app? |
Another question is: are all the timings passing place / route with positive slack? |
Yes, it always deadlocks at the same place on every attempt. And it is definitely not just an infinite loop in software - it stopped reacting to timer interrupts.
The timing analyzer reports that it should work fine even at 85C temperature. |
In particular, which app is freezing ? |
Can you send me your linux / buildroot binaries ? |
I don't use buildroot... And an attempt to reproduce my setup in all details would require soldering. The goal of my project was to go the whole path on my own: design an fpga board, solder it, design a SoC capable of running linux, write linux drivers for my custom peripherals, implement sbi and a bootloader, manually bring together the linux kernel and all the software (to better understand how linux works), and finally to be able to write and compile code directly on the device. But reproducing the bug on your side can be a big problem. My only hope is that if you use the same VexiiRiscv params, run 32-bit debian, and try to compile a hello world with gcc, the freeze will happen too. |
I tried some configs running 64-bit debian:
So, mainly, dual issue, with late-alu, and a few other things.
One thing I can see on your side is that you only have 1 way of 4KB for each I$ / D$. |
Also, which version of linux do you use? |
Trying to migrate my hobby project SoC from VexRiscv to VexiiRiscv. I suspect there is a bug in FetchL1Plugin: after fetching the first few instructions, FetchL1Plugin produces unknown ('xxxx) output if the cache memory was not zero-initialized. Simulation in iverilog: after a few clock cycles 'xxxx propagates to registers in PcPlugin and the simulation gets stuck. But if I either set bootMemClear = true or disable the cache (fetchL1Enable = false) in ParamSimple, it works fine.