Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rtl] Flush pipe on all CSR modifications #2214

Merged
merged 2 commits into from
Feb 17, 2025
Merged

Conversation

GregAC
Copy link
Contributor

@GregAC GregAC commented Sep 17, 2024

This fixes #2193, an issue that meant bit clears in PMP related CSRs didn't immediately apply to an instruction already in the fetch stage due to a lack of a pipeline flush.

With this change the pipeline will flush in that scenario, fixing the issue. It now flushes the pipeline on all CSR modifications as this makes the pipeline more resliant against similar issues in the future (where the list of CSRs to flush on should have been updated but wasn't).

@GregAC
Copy link
Contributor Author

GregAC commented Sep 17, 2024

Still need to run a full regression on this as well a specific directed test to ensure the particular bug has indeed been solved.

Copy link
Contributor

@marnovandermaas marnovandermaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the riscv_multiple_interrupt_test is failing on this commit.

// Flush pipe on any CSR modification. Some CSR modifications alter how instructions execute (e.g.
// the PMP CSRs) so this ensures all instructions always see the latest architectural state when
// entering the fetch stage. This causes some needless flushes but performance impact is
// limited. We have a single fetch stage to flush not many stages of a deep pipeline and CSR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar question: Do you mean something like "We only have a single fetch stage and the pipeline is not deep. Also, CSR instructions are normally rare and aren't part of performance critical parts of the code."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I do, I shall rewrite it to make it clearer

@GregAC
Copy link
Contributor Author

GregAC commented Sep 17, 2024

Looks like the riscv_multiple_interrupt_test is failing on this commit.

I think that's a known failure, though may be we're getting more of them now as it's to do with assertions around pipeline flushes and we're doing more pipeline flushes. Could be we should use a less flaky test in our smoke suite.

// the PMP CSRs) so this ensures all instructions always see the latest architectural state when
// entering the fetch stage. This causes some needless flushes but performance impact is
// limited. We have a single fetch stage to flush not many stages of a deep pipeline and CSR
// instructions are in general rare and not part of performance critical parts of the code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it's a common pattern to do a csrrw sp, mscratch, sp to switch to a dedicated stack for exception handling.

For exception handling it also performs quite a few other CSR reads, so I wouldn't call CSR instructions rare and not part of performance critical code.

Copy link
Contributor Author

@GregAC GregAC Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good points.

Though do note we don't flush on read only access. Whilst strictly all CSR instructions do perform modifications CSRRS and CSRRC with an x0 rs1 (the register that specifies which bits to set/clear) and the immediate versions with a 0 immediate do not alter the CSRs and will have csr_op_o in this RTL set to CSR_OP_READ so won't trigger the flush.

Do we expect any CSR writes other than to mscratch during normal exception handling code?

I'm wonder if we should operate a white-list system here where modifications to specific CSRs do not trigger a flush but everything else does. mscratch is a good candidate for that list, anything else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mepc would also be updated in case the exception needs to skip over the current instruction. mstatus may be updated but that actually require flushes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect mepc updates would be less common, not something you'd do in an interrupt handling routine for instance, maybe used in error handling cases but it's reasonable to expect those to be rare. Also used where you're doing software emulation of an instruction where speed would be desired but uncertain how much of that would practically be done on Ibex (the typical use case would be unaligned memory access which we already have hardware handling for).

Still putting mepc in the whitelist seems pretty safe, it's basically a special jump target that gets read out in the ID/EX stage and we'd probably need some radical change to make it have relevance from IF (and hence require flushing on modification).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised use of ecall will require mepc modification (otherwise you get an infinite loop of ecall). Performance degradation from a pipe flush is modest (a single cycle in cases where you hit in icache or have single cycle access to instruction memory I believe) and there'd only be one mepc write per ecall so I don't think it'd be a huge issue but as above it seems a safe one to whitelist so we can do that.

Copy link
Contributor

@nbdd0121 nbdd0121 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For exception handling I think it's not uncommon to save mepc & all registers to memory, call some routine, and unconditionally restore mepc. Also, for things like ecall, it's also needed to unconditionally increment by 4. But you're right they're not touched for interrupt handling, which matters more for than synchronous exceptions.

@marnovandermaas
Copy link
Contributor

Last force push is a rebase.

@GregAC
Copy link
Contributor Author

GregAC commented Feb 14, 2025

The prior CI failure here was a bad assertion. I've recommend we just remove that assertion and related ones as they've been a lot of trouble to get working and provide little utility (#2255).

The latest version of this PR introduces the whitelist described above where we don't flush on certain CSRs being written. It also adds a new commit that directly reads the CSR address from the relevant instruction bits rather than via the operand mux. This helps timing in some cases and was useful to use in ibex_id_stage to implement the whitelist system (rather than reading directly from the raw instruction as was done previously).

@marnovandermaas @vogelpi PTAL

@GregAC
Copy link
Contributor Author

GregAC commented Feb 14, 2025

I've run a full regression with limited iterations (10 max for each test) which didn't show any issues, doing a full regression now.

@Razer6
Copy link
Member

Razer6 commented Feb 14, 2025

I just wanted to comment on this PR to see its state. Glad there is some progress.

Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GregAC , this change makes sense to me. I like the idea of moving to a whitelist system for when to avoid flushes. If it turns out there are performance issues for certain CSRs, we can take a close look an reason specifically for those. Identifying and fixing issues with CSR modifications that erroneously don't lead to flushes is more painful.

Copy link
Contributor

@marnovandermaas marnovandermaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Greg, this looks good to me.

Previously the ibex_cs_registers module received the CSR address via the
operand muxes. This has been observed to cause timing issues in some
cases. The CSR address is always read from the same bits of the
instruction so there's no need to go via the operand muxes. With this
change the relevant instruction bits are fed straight out of the decoder
and into the ibex_cs_registers module.
This fixes lowRISC#2193, an issue that meant bit clears in PMP related CSRs
didn't immediately apply to an instruction already in the fetch stage
due to a lack of a pipeline flush.

With this change the pipeline will flush in that scenario, fixing the
issue. It now flushes the pipeline on all CSR modifications as this
makes the pipeline more resliant against similar issues in the future
(where the list of CSRs to flush on should have been updated but
wasn't).
@GregAC
Copy link
Contributor Author

GregAC commented Feb 17, 2025

Latest version just fixes a small lint issue (explicit cast to csr_num_e) which isn't an issue for our CI verilator version but does appear under verilator v5.

I've manually confirmed this fixes the specific PMP issue by observing waves of a suitable test program

@marnovandermaas the formal flow should be run on this as well to double check all is good. Can do that once this is merged into master.

@marnovandermaas
Copy link
Contributor

Sounds good, the formal flow has a counter-example which I still need to resolve even before this PR goes in. That work is captured in this PR: #2245

@GregAC GregAC added this pull request to the merge queue Feb 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 17, 2025
@GregAC GregAC added this pull request to the merge queue Feb 17, 2025
Merged via the queue into lowRISC:master with commit 0f27580 Feb 17, 2025
11 checks passed
@GregAC GregAC deleted the csr_flush_fix branch February 17, 2025 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline not being flushed on PMP register clears
6 participants