-
I really like the idea to shut down the CPU during sleep and I would like to add something like this to the default processor setup.
Where did you put the clock switch / what parts of the processor do you "cut off" during sleep? I suppose there is a clock gate basically right before

I would add something like this to the CPU to implement the clock gating:

    -- CPU Clock Gating -----------------------------------------------------------------------
    -- -------------------------------------------------------------------------------------------
    clock_gating_enable:
    if CLOCK_GATING_EN generate -- provide an option to remove the entire clock gating feature
      clock_switch: process(rstn_i, clk_i)
      begin
        if (rstn_i = '0') then
          clk_en <= '1';
        elsif falling_edge(clk_i) then -- update on falling edge to avoid glitches on 'clk'
          clk_en <= not ctrl.cpu_sleep; -- disable the main clock when in sleep mode
        end if;
      end process clock_switch;
      -- this is the main clock gate; for FPGAs better instantiate a dedicated global routing switch primitive here --
      clk <= clk_i and clk_en;
    end generate;

    clock_gating_disable:
    if not CLOCK_GATING_EN generate
      clk <= clk_i;
    end generate;

    -- always-on clock --
    clk2 <= clk_i;

This is what I prefer. The second option would require reworking the entire interrupt system including the software framework (i.e. the acknowledgement mechanism).

I think the always-on clock would only be required by the CPU's trap buffer logic: neorv32/rtl/core/neorv32_cpu_control.vhd, Line 1388 in 5281589

Obviously, the CPU SLEEP signal would need some modifications (e.g. to keep the on-chip debugger operational during sleep), but I think that would not be too hard. If we can agree on option 1 I will do some experiments over here and maybe propose a PR.
-
One more thing... Where should we add the clock gate? What would be the "cleaner" / more flexible approach?

Option 1
Right inside the CPU (

Option 2
Outside of the CPU (somewhere in

This option would also allow switching off / powering down other near-CPU modules (CPU bus switch, caches, maybe even the entire interconnect?!).
-
You will learn to love the dark side :P

In general what I have seen is option 2 from.. other commercial cores. This is because it helps manage the plurality of design methodologies that are used globally. The trend this last decade or so has been to centralize clocks and resets for a given device (FPGA/ASIC) in a top-level module. That means for using the core stand-alone it would be helpful to not have the clock gating circuit inside (option 1).

Sorry for my somewhat lacking response. There is only 1 signal for the clock gate circuit, which is just a combination of the signals provided.

I see the point with the debug request. I missed that it was only 1 cycle wide. Maybe it's the way I added the haltonreset debug function, which holds the debug request for a number of cycles to ensure the core gets through its reset sequence before doing a break.

I'm interested to see what you end up with. Will likely replace my somewhat janky implementation with yours :)
-
Here is the first proposal: #775

All tests are passing and even the on-chip debugger keeps working 🎉
-
Internally we have added a clock gating feature to the neorv32 core, using sleep_o to turn off the clock and a new wake_o to turn on the clock.
A detail here is that this only clock gates the core, as we need the peripherals like WDT, MTIME etc. to keep running while the core is asleep to avoid a deadlock-style situation.
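To make the sleep_o / wake_o arrangement above a bit more concrete, here is a minimal sketch of a core-only clock gate that sits outside the CPU. The entity name `core_clock_gate`, the port names and the "wake beats sleep" priority are assumptions for illustration, not the actual internal implementation. It also assumes wake_i is held as a level long enough to be sampled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper-level clock gate for the CPU core only.
-- Peripherals (WDT, MTIME, ...) are assumed to stay on the ungated clk_i.
entity core_clock_gate is
  port (
    clk_i     : in  std_ulogic; -- always-on input clock
    rstn_i    : in  std_ulogic; -- asynchronous reset, low-active
    sleep_i   : in  std_ulogic; -- driven by the core's sleep_o
    wake_i    : in  std_ulogic; -- driven by the core's wake_o (assumed level-held)
    clk_cpu_o : out std_ulogic  -- gated clock for the core
  );
end entity core_clock_gate;

architecture rtl of core_clock_gate is
  signal clk_en : std_ulogic;
begin

  -- The enable is updated on the falling edge, so it only changes while
  -- clk_i is low and the AND gate below cannot produce runt pulses.
  enable_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      clk_en <= '1'; -- core clock is running after reset
    elsif falling_edge(clk_i) then
      if (wake_i = '1') then -- assumption: wake has priority over sleep
        clk_en <= '1';
      elsif (sleep_i = '1') then
        clk_en <= '0';
      end if;
    end if;
  end process enable_reg;

  -- On an FPGA a dedicated clock buffer with enable would replace this gate.
  clk_cpu_o <= clk_i and clk_en;

end architecture rtl;
```

Only the core would be connected to `clk_cpu_o`; everything that has to stay alive during sleep keeps using `clk_i`.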
Right now wake_o is a simple combination of incoming IRQs and the MIE vector, which isn't 100% compliant with the RISC-V WFI spec, but close.
Note: the global MIE cannot be used, as WFI must wake up the core even when the global MIE is cleared, just without taking the trap.
To get to full compliance we would need to generate wake_o from MIP && MIE, as that would cause WFI to turn into a NOP if there are any pending interrupts. The good news is that this signal kinda exists already in the core; the bad news is that this would require the MIP register to be clocked when the core is off.
The solution we have would not require any flops in the core to be clocked, as it is generated combinatorially as: wake_o <= (FIRQ_i[31:0] || MIP[31:0]) && MIE[31:0]
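Written out as VHDL, the expression above could look roughly like the following sketch. The entity name `wake_logic` and its port names are made up for illustration, and the per-bit combination followed by an OR-reduction is my reading of the one-line formula, so treat it as an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical, purely combinational wake-up logic (no clocked flops).
entity wake_logic is
  port (
    firq_i : in  std_ulogic_vector(31 downto 0); -- raw incoming IRQ lines (assumed level-held)
    mip_i  : in  std_ulogic_vector(31 downto 0); -- pending bits already captured in the core
    mie_i  : in  std_ulogic_vector(31 downto 0); -- per-source enables (mie CSR, not the global MIE bit)
    wake_o : out std_ulogic                      -- high if any enabled source is active
  );
end entity wake_logic;

architecture rtl of wake_logic is
  constant ZERO_C : std_ulogic_vector(31 downto 0) := (others => '0');
  signal active   : std_ulogic_vector(31 downto 0);
begin

  -- a source counts if it is raw-active or already pending, and locally enabled
  active <= (firq_i or mip_i) and mie_i;

  -- OR-reduction over all sources
  wake_o <= '1' when (active /= ZERO_C) else '0';

end architecture rtl;
```

Because `firq_i` is used directly, the wake-up does not depend on the gated MIP register having captured the event, which is the point of the approach described above.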
The challenge here, as pointed out by @stnolting, is that the neorv32 spec requires any IRQ to be just 1 RISC-V core clock pulse wide. While nice, this requirement would obviously break the above, as the MIP will not capture a 1-cycle-wide IRQ while the core clock is off.
In our setup, though, we follow a different principle: we hold the IRQ for longer than 1 clock, sometimes a lot longer if it's a write-to-clear pending module. We can always argue which is normal or not. In our case we have multiple clock domains, where the core is in the fastest domain (fairly normal) and peripherals, buses etc. are at various reduced clock speeds to help lower power. For these types of setups, requiring a single RISC-V clock pulse for an IRQ would require a lot of retimer flops. So we just feed signals straight in, with the caveat that the firmware on the core needs to either 1) clear the pending IRQ at the source before clearing MIP, or 2) wait until the end of the ISR to clear MIP.
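For reference, this is roughly what one "retimer" per IRQ line would look like if a level IRQ from a slower domain had to be turned into a single core-clock pulse. It is a generic two-flop synchronizer plus edge detector, not code from either setup; the entity name `irq_retimer` and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical per-line retimer: level IRQ in, one-cycle pulse out.
entity irq_retimer is
  port (
    clk_i   : in  std_ulogic; -- core clock (fast domain)
    rstn_i  : in  std_ulogic; -- asynchronous reset, low-active
    irq_i   : in  std_ulogic; -- level IRQ from a slower clock domain
    pulse_o : out std_ulogic  -- single core-clock-cycle pulse
  );
end entity irq_retimer;

architecture rtl of irq_retimer is
  -- sync(0) and sync(1) form the synchronizer, sync(2) keeps the previous value
  signal sync : std_ulogic_vector(2 downto 0);
begin

  retime: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      sync <= (others => '0');
    elsif rising_edge(clk_i) then
      sync <= sync(1 downto 0) & irq_i; -- shift the new sample in at bit 0
    end if;
  end process retime;

  -- high for exactly one clk_i cycle after a rising edge on the synchronized IRQ
  pulse_o <= sync(1) and (not sync(2));

end architecture rtl;
```

That is three flops per IRQ line, and they would need a running clock to produce the pulse at all, which is part of why feeding the level signals in directly is attractive when the core clock is gated.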
The few outs I see are:
We opted for 2, as in most systems in which the core is used it's likely that the core itself would be running >2x faster than any peripheral. This is combined with "enhancements" to at-speed peripherals so they generate pulses longer than 1 clock cycle.
Anywho, thoughts?