RISCV compliant wake up signal for clock gating #772

mikaelsky (Collaborator) · 2024-01-24T13:47:10Z

Internally we have added a clock gating feature to the neorv32 core, using sleep_o to turn off the clock and a new wake_o to turn the clock back on.
A detail here is that this only clock gates the core, as we need peripherals like WDT, MTIME etc. to keep running while the core is asleep to avoid a deadlock-style situation.

Right now wake_o is a simple combination of the incoming IRQs and the MIE vector, which isn't 100% RISC-V WFI spec compliant, but close.
Note: the global MIE (mstatus.MIE) cannot be used, as WFI must wake up the core even when the global MIE is cleared; in that case the core resumes but does not take the trap.

To get to full compliance we would need to generate wake_o from MIP && MIE, as that would cause WFI to turn into a NOP if there are any pending interrupts. The good news is that this signal kind of exists already in the core; the bad news is that this would require the MIP register to be clocked while the core clock is off.

The solution we have does not require any flops in the core to be clocked, as wake_o is generated combinatorially as: wake_o <= (FIRQ_i[31:0] || MIP[31:0]) && MIE[31:0]
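
As a rough Verilog sketch of the expression above: the module and port names here are made up for illustration (they are not the actual neorv32 ports), and all three vectors are assumed to be 32 bits wide and laid out like MIP, as in the pseudocode.

```verilog
// Hypothetical sketch of the combinational wake term.
// Names and widths are assumptions, not the real neorv32 interface.
module wake_gen (
  input  wire [31:0] irq_i,  // raw incoming interrupt lines, laid out like MIP
  input  wire [31:0] mip_i,  // pending bits already latched in the MIP register
  input  wire [31:0] mie_i,  // per-source interrupt enables (MIE CSR)
  output wire        wake_o  // high while any enabled interrupt is raw or pending
);
  // Bitwise OR merges raw and latched requests, bitwise AND applies the enables,
  // and the reduction OR collapses the result into a single wake bit.
  // There are no flops in this path, so it needs no running clock.
  assign wake_o = |((irq_i | mip_i) & mie_i);
endmodule
```

Because the raw lines feed wake_o directly, the signal only stays high for as long as the source holds its request, or until MIP has captured it.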

The challenge here, as pointed out by @stnolting, is that the neorv32 spec requires any IRQ to be just 1 RISC-V core clock pulse wide. While nice, this requirement would obviously break the above, as the MIP will not capture a 1-cycle-wide IRQ while the core clock is gated.
In our setup, though, we follow a different principle: we hold the IRQ for longer than 1 clock, sometimes a lot longer if it's a write-to-clear pending module. We can always argue about which is normal. In our case we have multiple clock domains, where the core is in the fastest domain (fairly normal) and the peripherals, busses etc. run at various reduced clock speeds to help lower power. For these types of setups, requiring a single RISC-V clock pulse for an IRQ would require a lot of retimer flops. So we just feed the signals straight in, with the caveat that the firmware on the core needs to either 1) clear the pending IRQ at the source before clearing MIP, or 2) wait until the end of the ISR to clear MIP.

The few outs I see are:

1. Pipe down an always_on clock into cpu_control and use that to clock the MIP register. This would resolve the overall problem but increase the system power somewhat.
2. Require that IRQ pulses are a minimum of 2 CPU cycles wide, to allow the wake_o signal to turn on the clock (1st cycle), followed by the MIP register capturing any new IRQs (2nd cycle).

We opted for 2, as in most systems in which the core is used it's likely that the core itself runs >2x faster than any peripheral. This is combined with "enhancements" to at-speed peripherals so that they generate pulses longer than 1 clock cycle.
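
One way such an "enhancement" could look is a small pulse stretcher in the peripheral. This is only a sketch under assumptions: the module and signal names are hypothetical, the peripheral clock is assumed to keep running during sleep, and irq_i is assumed to be synchronous to that clock.

```verilog
// Hypothetical pulse stretcher: widens a 1-cycle IRQ pulse to 2 cycles.
module irq_stretch (
  input  wire clk_i,    // peripheral clock, assumed to stay on during sleep
  input  wire rst_n_i,  // asynchronous reset, active low
  input  wire irq_i,    // original interrupt pulse, may be only 1 cycle wide
  output wire irq_o     // stretched pulse: irq_i plus one extra cycle
);
  reg irq_q;  // irq_i delayed by one clock cycle

  always @(posedge clk_i or negedge rst_n_i) begin
    if (!rst_n_i)
      irq_q <= 1'b0;
    else
      irq_q <= irq_i;
  end

  // High in the cycle irq_i is asserted and in the cycle after it.
  assign irq_o = irq_i | irq_q;
endmodule
```

The first cycle of irq_o would turn the core clock back on through wake_o, and the second cycle gives the MIP register a clocked edge on which to capture the request.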

Anywho, thoughts?

stnolting (Maintainer) · 2024-01-25T19:06:11Z

I really like the idea to shut down the CPU during sleep and I would like to add something like this to the default processor setup.

> Internally we have added a clock gating feature to the neorv32 core, using sleep_o to turn off the clock and a new wake_o to turn the clock back on.

Where did you put the clock switch / what parts of the processor do you "cut off" during sleep? I suppose there is a clock gate basically right before neorv32_cpu.vhd?

I would add something like this to the CPU to implement the clock gating:

  -- CPU Clock Gating -----------------------------------------------------------------------
  -- -------------------------------------------------------------------------------------------
  clock_gating_enable:
  if CLOCK_GATING_EN generate -- provide an option to remove the entire clock gating feature
    clock_switch: process(rstn_i, clk_i)
    begin
      if (rstn_i = '0') then
        clk_en <= '1';
      elsif falling_edge(clk_i) then -- update on falling edge to avoid glitches on 'clk'
        clk_en <= not ctrl.cpu_sleep; -- disable the main clock when in sleep mode
      end if;
    end process clock_switch;

    -- this is the main clock gate; for FPGAs better instantiate a dedicated global routing switch primitive here --
    clk <= clk_i and clk_en;
  end generate;

  clock_gating_disable:
  if not CLOCK_GATING_EN generate
    clk <= clk_i;
  end generate;

  -- always-on clock --
  clk2 <= clk_i;

> Pipe down an always_on clock into cpu_control and use that to clock the MIP register, this would resolve the overall problem but increase the system power somewhat.

This is what I prefer. The second option would require reworking the entire interrupt system, including the software framework (i.e. the acknowledgement mechanism).

I think the always-on clock would only be required by the CPU's trap buffer logic:

neorv32/rtl/core/neorv32_cpu_control.vhd

Line 1388 in 5281589

trap_buffer: process(rstn_i, clk_i)

Obviously, the CPU SLEEP signal would need some modifications (e.g. keep the on-chip debugger operational during sleep), but I think that would not be too hard.

If we can agree on option 1 I will do some experiments over here and maybe propose a PR.

mikaelsky Jan 25, 2024
Collaborator Author

I get where you are coming from :) which is why I typed up option 1, while still describing where we are right now.

Yeah we clock gate only the clock going into cpu_vhd. This allows everything else, including debug, to still run around.

If you are okay with reading Verilog, this is my current top level gating code. This is independent of option 1/2.

// The content of the MIE vector: (csr.mie_firq,"0000",csr.mie_mei,"000",csr.mie_mti,"000",csr.mie_msi,"000")
  assign RISCVirqMasked  = {fast_irq, 4'b0000, mext_irq_i ,3'b000, mtime_irq, 3'b000, msw_irq_i , 3'b000} & mie;
  assign RISCVsyncEnable = (|RISCVirqMasked) || cpu_debug || !cpu_sleep;
----
  neorv32_cpu_top
  (
    // global control --
    .clk_i        (clk_RISCV_gated_i),
    .rst_n_i      (rst_n_i),
    .sleep_o      (cpu_sleep),
    .debug_o      (cpu_debug),
...

RISCVsyncEnable is the gate signal for the clock gate. Note there is a subtle bug in the mask signal, as I should OR it with MIP.
(Verilog note: (|RISCVirqMasked) is a reduction OR that collapses all bits of the vector into 1 bit, || is a logical OR of single-bit values, and & is a bitwise AND of the two vectors.)
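
A tiny standalone example of the three operators used in the gating code above. The module and signal names are purely illustrative.

```verilog
// Illustrative only: reduction OR, logical OR and bitwise AND side by side.
module or_demo;
  wire [3:0] v       = 4'b0100;
  wire [3:0] mask    = 4'b0011;

  wire [3:0] masked  = v & mask;       // bitwise AND per bit: 4'b0000
  wire       any_v   = |v;             // reduction OR over v: 1'b1
  wire       any_m   = |masked;        // reduction OR over masked: 1'b0
  wire       enable  = any_m || any_v; // logical OR of two 1-bit values: 1'b1
endmodule
```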

My one recommendation is to create a separate clock gating module (e.g. clkGate.vhd) that contains the "soft" clock gate that you describe.
This allows a few tweaks that will be beneficial in the long run. The prime one is that you can "replace" clkGate.vhd with a clkGateFPGA.vhd or a clkGateASIC.vhd in the compile list. This would replace the "soft" clock gate with an FPGA/ASIC clock gating instance. This is by far the preferred way in the industry. An alternative is to grab a define from the command line that indicates "FPGA/ASIC" and use that to inject an FPGA/ASIC hard macro.
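
For reference, a minimal sketch of what such a swappable "soft" gate could look like as a standalone module, written in Verilog to match the gating code above. The name clkGate comes from the suggestion; the port names and the latch-based structure are assumptions (this is the textbook glitch-free gate, not code from the neorv32 repository).

```verilog
// Hypothetical "soft" clock gate; port names are illustrative assumptions.
module clkGate (
  input  wire clk_i,  // free-running (always-on) clock
  input  wire en_i,   // 1 = let the clock through, 0 = hold clk_o low
  output wire clk_o   // gated clock for the CPU core
);
  reg en_latched;

  // Level-sensitive latch: transparent while clk_i is low, so en_latched can
  // only change during the low phase and clk_o never sees a truncated pulse.
  always @(clk_i or en_i) begin
    if (!clk_i)
      en_latched <= en_i;
  end

  assign clk_o = clk_i & en_latched;
endmodule
```

An FPGA or ASIC variant would keep the same ports and instantiate the vendor's clock gating primitive in the body, so only the file in the compile list changes.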

For the trap_buffer I would recommend splitting trap_ctrl.irq_pnd out of the trap_buffer process and keeping only that on the always-on clock, to minimize the number of flops running during the sleep state. At least that's what I would do.. granted I'm counting nW of power for my usage :)

The wake signal then becomes a reuse of an existing internal signal and would replace RISCVirqMasked in my code above.

But yeah sounds good. I'm excited :)

stnolting Jan 25, 2024
Maintainer

> I get where you are coming from :) hence why I typed up option 1.

😅

> Yeah we clock gate only the clock going into cpu_vhd. This allows everything else, including debug to still run around.

I think the external halt request (sent by the debugger to halt the processor and enter debug mode) won't work in your setup. The trigger signal coming from the debug module (DM) is high for just one cycle. If the entire CPU logic is frozen, this trigger will get lost during sleep.

neorv32/rtl/core/neorv32_debug_dm.vhd

Line 519 in 5281589

cpu_halt_req_o <= dm_reg.halt_req and dm_reg.dmcontrol_dmactive; -- single-shot

> If you are okay with reading verilog [...]

Actually, Verilog is my daily business. I like it, but I love VHDL 😉

> RISCVsyncEnable is the gate signal for the clock gate.

Looks good! I like the simplicity of your approach. However, FIRQs (fast_irq) being high for just one cycle is the big drawback here.

> My one recommendation is to create a separate clock gating module (e.g. clkGate.vhd) that contains the "soft" clock gate that you describe.

I fully agree. But let's focus on the conceptual functionality for the first version. If that works as expected we can start a PR to move the clock switch into its own design unit.

> For the trap_buffer I would recommend splitting trap_ctrl.irq_pnd out of the trap_buffer process and keeping only that on the always-on clock, to minimize the number of flops running during the sleep state. At least that's what I would do.. granted I'm counting nW of power for my usage :)

Good point! But I think we also need to keep trap_ctrl.irq_buf alive (or at least some bits of it) to buffer debug/halt requests:

neorv32/rtl/core/neorv32_cpu_control.vhd

Lines 1472 to 1479 in 5281589

    
  -- debug-mode entry --
  if (CPU_EXTENSION_RISCV_Sdext = true) then
    trap_ctrl.irq_buf(irq_db_halt_c) <= debug_ctrl.trig_halt or (trap_ctrl.env_pending and trap_ctrl.irq_buf(irq_db_halt_c));
    trap_ctrl.irq_buf(irq_db_step_c) <= debug_ctrl.trig_step or (trap_ctrl.env_pending and trap_ctrl.irq_buf(irq_db_step_c));
  else
    trap_ctrl.irq_buf(irq_db_halt_c) <= '0';
    trap_ctrl.irq_buf(irq_db_step_c) <= '0';
  end if;

> The wake signal then becomes a reuse of an existing internal signal and would replace RISCVirqMasked in my code above.

Do you propose two signals for the clock gating? One signal to turn off the clock and another one to turn it back on?

I think it might be possible just to use the current sleep signal - plus some modifications to that.

> But yeah sounds good. I'm excited :)

👍 🚀

stnolting (Maintainer) · 2024-01-25T20:52:23Z

One more thing... Where should we add the clock gate? What would be the "cleaner" / more flexible approach?

Option 1

Right inside the CPU (neorv32_cpu.vhd). Clock gating would then be an additional CPU "tuning option" and the interface of the CPU would not change at all. If this feature is disabled, there is no way (without changing the CPU's code base) to add a custom clock gating mechanism for the CPU.

Option 2

Outside of the CPU (somewhere in neorv32_top.vhd). Clock gating would be an additional SoC option. The interface of the CPU would require a second always-on clock input. A custom clock gating mechanism can be implemented without changing the code base of the CPU.

This option would also allow switching off / powering down other near-CPU modules (CPU bus switch, caches, maybe even the entire interconnect?!).

mikaelsky (Collaborator, Author) · 2024-01-25T21:58:40Z

You will learn to love the dark side :P

In general what I have seen from.. other commercial cores is option 2. This is because it helps manage the plurality of design methodologies that are used globally. The trend this last decade or so has been to centralize clocks and resets for a given device (FPGA/ASIC) in a top level module. This means that for using the core stand-alone it would be helpful not to have the clock gating circuit inside (option 1).

Sorry for my somewhat lacking response. There is only 1 signal for the clock gate circuit which is just a combination of the signals provided.

I see the point about the debug request. I missed that it was only 1 cycle wide. Maybe it's the way I added the haltonreset debug function, which holds the debug request for a number of cycles to ensure the core gets through its reset sequence before doing a break.

I'm interested to see what you end up with. Will likely replace my somewhat janky implementation with yours :)

stnolting (Maintainer) · 2024-01-26T18:45:33Z

Here is the first proposal: #775

All tests are passing and even the on-chip debugger keeps working 🎉
