-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lib lpc55 rng seed #1820
base: master
Are you sure you want to change the base?
Lib lpc55 rng seed #1820
Conversation
I've marked this as a draft and it will remain a draft till:
The bulk of work here will remain unchanged as I address these last few points so reading over this while still in draft won't be wasted time. |
current draft LGTM at a high level, I'll take another pass after testing |
a527d0e
to
509e53d
Compare
I've updated the two commits that pull data from the |
I understand why you did this. However, this strategy makes me somewhat nervous that we'll accidentally squash an error and leave ourselves vulnerable in production. Is DICE always going to sometimes be disabled on the lpc55s for dev use? Or is this a temporary state of affairs? If the latter, we should open an issue to revert this code when DICE is no longer ever disabled. If the former, we still need to prevent the crash looping. In that case, can we add an extra check where we return an option that matches on any specific error due to DICE being disabled and then confirms that DICE is actually disabled. If DICE is not disabled then we should panic rather than return |
Yes, that's what I meant when I said
I spent some time these last few days trying to come to a decision on this point. My current thinking is that we can't assume that DICE is enabled at all. Most of this is because it can be turned on / off at will which (I think) means we have to make the strongest RNG seed the default and produce errors / panic unless the So my current thinking is that I'll introduce a new feature(s) to the RNG task that controls whether or not missing / unavailable components of the seed are hard errors. This will have to deal with conditional availability of the stage0-handoff region in |
These days it's less likely as everyone is probably working on a sled which has been setup or a rot-carrier which also usually gets embootleby'd . The only people who are likely to be running with DICE off are those of us who are working off of boards that have not been setup or testing something very specific. |
509e53d
to
6542743
Compare
I've added a feature to gate inclusion of these components in the various RNG seeds. If this feature is enabled and the task can't access one of these memory regions then we record the failure in the ringbuf and panic. This isn't what we want, but it at least "fails closed" as they say. I'd prefer this feature were enabled by default so as to require the I plan to take some time over the next day or so to get a rough idea of the level of effort associated with supporting default features for task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from the peanut gallery. Really pleasant code to read <3
6542743
to
9f7600a
Compare
This has been successfully tested on a gimlet, psc & sidecar. To do so required adding the The remaining work to unblock this PR is in #1830 |
e36e985
to
46caef9
Compare
aafbf22
to
5c28b5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More peanuts.
This is intended to enable testing the HRNG on the LPC55 independet of the rng_driver task.
Implementing `BlockRngCore` for `Lpc55Core` got us an implementation of RngCore for free. Unfortunately it cost us the ability to return errors. This wasn't a good tradeoff. Better to implement `RngCore` directly.
Not sure what I was thinking here.
The `ReseedingRng` can handle us pulling more than 4 bytes out at a time.
where: - N > 0 - `HRNG_N(count)` represents count bytes taken from the hardware RNG - `PRNG_N(count)` represents count bytes taken from the Nth generation of the PRNG This commit changes our algorithm for constructing the seed `SEED_N` for the PRNG instance `PRNG_N` from: ``` SEED_N = HRNG(32) ``` to: ``` SEED_N = sha3_256(PRNG_N-1(32) | HRNG(32)) ``` We use `sha3_256` as a mixing function to combine these two components of the seed though the implementation is generic over the digest w/ constraints on the length.
5c28b5f
to
6507771
Compare
Previously we constructed the initial seed as 32 bytes from the hardware RNG: ``` SEED_0 = HRNG(32) ``` This commit includes a seed value constructed by the measured boot implementation from the DICE CDI. This is passed through to the RNG task using the `stage0-handoff` mechanism. This 32 byte value is now extracted and mixed with 32 bytes from the HRNG to construct SEED_0: ``` SEED_0 = sha3_256(DICE_SEED | HRNG(32)) ``` Use of this feature is gated by the `dice-seed` feature.
Platforms assigned a unique serial number can include this string in the initial seed to ensure uniqueness in the bit stream produced by the RNG. We now construct the intial seed as: ``` SEED_0 = sha3_256(DICE_SEED | SN | HRNG(32)) ``` Extracting the Platform Id / serial number from the platform identity cert required exposing the relevant module from the lib-dice crate. We also add additional constants to the template module that are required to know the length of the platform id string at compile time. Finally this feature is gated by the same `dice-seed` feature used for the seed derived by measured boot for simplicity.
6507771
to
6036d0e
Compare
app/lpc55xpresso/app.toml
Outdated
uses = ["rng", "pmc"] | ||
start = true | ||
stacksize = 2200 | ||
stacksize = 4200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this'd make it the highest stack usage in the system, above control-plane-agent. Do we have a sense for what caused this? (Asking because I've been winding up playing stack size whack-a-mole lately.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having read more of the code, I can answer this question at least partially: the Server
type used in the RNG server is large, and it's being created on the stack and passed around.
It's large primarily due to the Sha3
embedded within it, which I recently noticed in an unrelated case was about 2 kiB.
The easiest way to fix this would be to allocate a static Sha3 instance using ClaimOnceCell
(from lib/static-cell
) and then pass it into the server constructor to take ownership of (via a &'static mut
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to use mutable-statics
because Sha3
doesn't have a const constructor (per discussion in chat): b43295c
lib/lpc55-rng/src/lib.rs
Outdated
/// Create a new Lpc55Rng instance after powering on, enabling the clocks | ||
/// and reseting the underlying HRNG. | ||
pub fn new(syscon: &Syscon) -> Self { | ||
let pmc = unsafe { &*PMC::ptr() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest having the PMC passed in (like the Syscon is) to make this dependency more obvious (and lose the unsafe!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the ptr()
function on PMC & RNG definitely qualifies as holding the lpc55-pac crate wrong. Should be addressed by bce3889.
lib/lpc55-rng/src/lib.rs
Outdated
|
||
Lpc55Rng { | ||
pmc, | ||
rng: unsafe { &*RNG::ptr() }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same as on line 25)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in bce3889 as well.
lib/lpc55-rng/src/lib.rs
Outdated
let mut filled = 0; | ||
while filled < dst.len() { | ||
if self.pmc.pdruncfg0.read().pden_rng().bits() { | ||
return Err(RngError::PoweredOff.into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this looks like the only way this operation can fail. Given that the RNG is powered on in new
, finding it powered off would be a surprising program bug. Perhaps this should be a panic? Otherwise I expect all callers to wind up unwrapping this at some point... could be wrong though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on whether to panic or return the error here a few times. Given the change requested in #1820 (comment) new
will now take ownership of instances of the PMC & RNG types which makes it clear that we don't intend for them to change behind our backs. Returning the error was part of an attempt to be resilient to this, but it's so far from the common case that it's not worth making things less ergonomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed by: c29dffa
drv/lpc55-rng/src/main.rs
Outdated
let mut mixer = H::default(); | ||
if let Some(seed) = seed { | ||
// mix platform unique seed derived by measured boot | ||
Digest::update(&mut mixer, seed.as_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about the explicit qualification on Digest::
-- does mixer.update(seed.as_bytes())
not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type we're using for the digest implements multiple update
functions. One's for the Digest
trait like we're expecting here but the other is for the Update
trait. So I think this is just a disambiguation.
drv/lpc55-rng/src/main.rs
Outdated
) -> Option<T> { | ||
use core::slice; | ||
|
||
// Safety: This memory is setup by code executed before hubris and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This safety comment isn't actually correct, it assumes that region
is valid (but the caller can pass any region it likes -- the fields are public). This function should probably be unsafe in its current form, since it's basically equivalent to ptr::read
.
You could make it safe if the caller couldn't pass any arbitrary region -- maybe hardcode which region, or identify it with an enum, or....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's bad. Thanks for the recommendations. I went w/ hiding the details of the memory regions behind an enum. Seems to do what you suggest: cf2704f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize how badly this change affected stack usage ... working on that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digest::update(&mut mixer, buf.as_ref()); | ||
|
||
// create initial instance of the SeedableRng from the seed | ||
let inner = T::from_seed(mixer.finalize_fixed_reset().into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, you might see lighter stack usage here by using one of the finalize_into
operations, instead of returning the array. It might also not change anything. Hard to tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems to be thwarted by T::from_seed
taking ownership of the seed. If we keep a GenericArray
in the ReseedingRng
to hold the digest we end up having to clone it in try_fill_bytes
. If we keep a [u8; 32]
in the ReseedingRng
the compiler will do the copy for us. This ends up taking up more stack not less. I didn't commit this experiment to this PR branch but put a wip here if you think it could be made to work.
Not sure if having the server take ownership of the PMC & RNG instances is optimal but it seems to most accurately reflect our intent. This resolves: - oxidecomputer#1820 (comment) - oxidecomputer#1820 (comment)
This prevents the caller from attempting to deserialize data from arbitrary memory regions.
Decorating `HandoffDataRegion::load_data` with the `#[inline(always])` attribute mitigates the increase in stack usage from cf2704f
This reduces stack usage by ~74 bytes.
We do this to move the parameterization of the ReseedingRng to the top level / `main` and for simplification. Our stack usage after this commit is unchanged as we explicitly create the inputs to the `ReseedingRng` in a dedicated scope.
This requires restructuring the ReseedingRng constructor a bit but it saves us 400 bytes of stack.
@@ -283,13 +284,22 @@ fn main() -> ! { | |||
) | |||
}; | |||
|
|||
let mixer = mutable_statics! { | |||
static mut MIXER: [Sha3_256; 1] = [Sha3_256::new; _]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the change I had in mind -- I'm surprised it didn't reduce stack usage more! Oh well. It's still an improvement.
This PR implements a new strategy to improve the quality of the initial seeding and reseeding of the PRNG hosted in the
rng_driver
task.