Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stm32h7-rng: rework driver, add seed error recovery #1945

Merged
merged 1 commit into from
Dec 13, 2024
Merged

Conversation

cbiffle
Copy link
Collaborator

@cbiffle cbiffle commented Dec 12, 2024

We have a machine in manufacturing that gets periodic seed errors. The manual specifies a recovery procedure for seed errors... which we haven't implemented.

This commit makes us more tolerant of seed errors by attempting recovery, but not twice in a row. Two seed errors in a row will still return an error.

This also adds a ringbuf so we can monitor frequency.

@cbiffle cbiffle force-pushed the cbiffle/h7rng branch 2 times, most recently from 640dc4f to 686a45b Compare December 12, 2024 23:34
@cbiffle cbiffle requested a review from mkeeter December 13, 2024 02:43
fn attempt_recovery(&mut self) -> Result<(), RngError> {
// The recovery procedure is in section 34.3.7 of the manual.
//
// First, we clear the SEIS flag. The CEIS and SEIS flags are both
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is done twice (both before returning SeedError and here). I'd vote to only clear it here; the explanation of "Clear it so we don't repeat ourselves" doesn't make sense when we're about to return a SeedError.

// We have no way to recover from the other cases.
return Err(e.into());
}
}
}

Ok(cnt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little silly to return Ok(cnt) when it's structurally guaranteed be the same as len, but dunno if it's worth changing here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, the API is weird. I chose not to alter the API in this commit.

return Err(RngError::SeedError);
}
// If data is ready, we can yield it.
if sr.drdy().bit_is_set() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the RNG is ready, we can theoretically read the DR up to 4x. I'm not sure if we want to, though!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually clear on whether we can, if we (say) read 1 from it in a previous call. So I figured I'd just read one every time. It should leave DRDY set as I read repeatedly.

Comment on lines +31 to +35
ClockError,
SeedError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we count these?

Comment on lines +78 to +83
// in this register are write-one-to-preserve, so weirdly we
// want to write SEIS to 1 here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wacky!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of registers on ST peripherals like this, for better or worse.

return Err(RngError::SeedError);
}
// If data is ready, we can yield it.
if sr.drdy().bit_is_set() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm choosing to pronounce this bit name as "dirdy"

drv/stm32h7-rng/src/main.rs Show resolved Hide resolved
drv/stm32h7-rng/src/main.rs Outdated Show resolved Hide resolved
// It may be worth attempting recovery. Recovery may not
// succeed, in which case this'll just bomb right out of
// here and send an error to the client.
self.rng.attempt_recovery()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if it's worth including a ringbuf entry for successful recoveries, so that we can distinguish between two subsequent seed errors in service of the same request, which result in a failure observed by the client, and seed errors that we successfully recovered from and kept going?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it stands currently, i think the ringbuf will basically just say

SeedError, SeedError, SeedError, SeedError

which doesn't make it very obvious whether we've encountered seed errors and recovered from them, or whether the RNG is just totally wedged

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, I think. Noting resumption of service is good. I specifically did not include a "successful transaction" ringbuf entry because that tends to cause errors to get overwritten (looking at you, i2c_driver) -- but noting a successful recovery seems wise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree that we probably don't want to spam the ringbuf with every successful request!

@cbiffle
Copy link
Collaborator Author

cbiffle commented Dec 13, 2024

Addressed the comments, PTAL.

@cbiffle cbiffle requested review from mkeeter and hawkw December 13, 2024 21:29
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, CI seems sad for some reason: https://github.com/oxidecomputer/hubris/actions/runs/12323310764/job/34398699576?pr=1945

Once that's resolved, this looks good to me!

for _ in 0..12 {
let _ = self.regs.dr.read();
}

// Finally, we check whether SEIS is clear.
if self.regs.sr.read().seis().bit_is_clear() {
// Yay!
ringbuf_entry!(Trace::Recovered);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay!


#[derive(Copy, Clone, Debug, Eq, PartialEq, Count)]
enum Trace {
Blank,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a counted_ringbuf now, the Blank variant ought to have

Suggested change
Blank,
#[count(skip)]
Blank,

We have a machine in manufacturing that gets periodic seed errors. The
manual specifies a recovery procedure for seed errors... which we
haven't implemented.

This commit makes us more tolerant of seed errors by attempting
recovery, but not twice in a row. Two seed errors in a row will still
return an error.

This also adds a ringbuf so we can monitor frequency. The ringbuf
increases RAM usage enough that it bumps past the 512-byte boundary, so
I removed all the lines that limit it to 512 (life is too short).
@cbiffle cbiffle enabled auto-merge (rebase) December 13, 2024 21:57
@cbiffle cbiffle merged commit 6270d4c into master Dec 13, 2024
125 checks passed
@cbiffle cbiffle deleted the cbiffle/h7rng branch December 13, 2024 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants