-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stm32h7-rng: rework driver, add seed error recovery #1945
Conversation
640dc4f
to
686a45b
Compare
drv/stm32h7-rng/src/main.rs
Outdated
fn attempt_recovery(&mut self) -> Result<(), RngError> { | ||
// The recovery procedure is in section 34.3.7 of the manual. | ||
// | ||
// First, we clear the SEIS flag. The CEIS and SEIS flags are both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this is done twice (both before returning SeedError
and here). I'd vote to only clear it here; the explanation of "Clear it so we don't repeat ourselves" doesn't make sense when we're about to return a SeedError
.
// We have no way to recover from the other cases. | ||
return Err(e.into()); | ||
} | ||
} | ||
} | ||
|
||
Ok(cnt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a little silly to return Ok(cnt)
when it's structurally guaranteed be the same as len
, but dunno if it's worth changing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree, the API is weird. I chose not to alter the API in this commit.
return Err(RngError::SeedError); | ||
} | ||
// If data is ready, we can yield it. | ||
if sr.drdy().bit_is_set() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the RNG is ready, we can theoretically read the DR
up to 4x. I'm not sure if we want to, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually clear on whether we can, if we (say) read 1 from it in a previous call. So I figured I'd just read one every time. It should leave DRDY set as I read repeatedly.
ClockError, | ||
SeedError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we count these?
// in this register are write-one-to-preserve, so weirdly we | ||
// want to write SEIS to 1 here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wacky!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of registers on ST peripherals like this, for better or worse.
return Err(RngError::SeedError); | ||
} | ||
// If data is ready, we can yield it. | ||
if sr.drdy().bit_is_set() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm choosing to pronounce this bit name as "dirdy"
// It may be worth attempting recovery. Recovery may not | ||
// succeed, in which case this'll just bomb right out of | ||
// here and send an error to the client. | ||
self.rng.attempt_recovery()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if it's worth including a ringbuf entry for successful recoveries, so that we can distinguish between two subsequent seed errors in service of the same request, which result in a failure observed by the client, and seed errors that we successfully recovered from and kept going?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it stands currently, i think the ringbuf will basically just say
SeedError, SeedError, SeedError, SeedError
which doesn't make it very obvious whether we've encountered seed errors and recovered from them, or whether the RNG is just totally wedged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, I think. Noting resumption of service is good. I specifically did not include a "successful transaction" ringbuf entry because that tends to cause errors to get overwritten (looking at you, i2c_driver) -- but noting a successful recovery seems wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I agree that we probably don't want to spam the ringbuf with every successful request!
Addressed the comments, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, CI seems sad for some reason: https://github.com/oxidecomputer/hubris/actions/runs/12323310764/job/34398699576?pr=1945
Once that's resolved, this looks good to me!
for _ in 0..12 { | ||
let _ = self.regs.dr.read(); | ||
} | ||
|
||
// Finally, we check whether SEIS is clear. | ||
if self.regs.sr.read().seis().bit_is_clear() { | ||
// Yay! | ||
ringbuf_entry!(Trace::Recovered); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay!
|
||
#[derive(Copy, Clone, Debug, Eq, PartialEq, Count)] | ||
enum Trace { | ||
Blank, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a counted_ringbuf
now, the Blank
variant ought to have
Blank, | |
#[count(skip)] | |
Blank, |
We have a machine in manufacturing that gets periodic seed errors. The manual specifies a recovery procedure for seed errors... which we haven't implemented. This commit makes us more tolerant of seed errors by attempting recovery, but not twice in a row. Two seed errors in a row will still return an error. This also adds a ringbuf so we can monitor frequency. The ringbuf increases RAM usage enough that it bumps past the 512-byte boundary, so I removed all the lines that limit it to 512 (life is too short).
4ca2577
to
6bd5475
Compare
We have a machine in manufacturing that gets periodic seed errors. The manual specifies a recovery procedure for seed errors... which we haven't implemented.
This commit makes us more tolerant of seed errors by attempting recovery, but not twice in a row. Two seed errors in a row will still return an error.
This also adds a ringbuf so we can monitor frequency.