Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measure the SP on SP_RESET signal interrupt #1946

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Measure the SP on SP_RESET signal interrupt #1946

wants to merge 9 commits into from

Conversation

lzrd
Copy link
Contributor

@lzrd lzrd commented Dec 13, 2024

This is a work in progress. Measurement currently takes 12.8s about 0.482 and we'd like to improve that or accommodate that delay before merging.
That's in addition to any feedback that people may have.

On SP_RESET asserted, the RoT will measure the entire active SP Hubris bank (Hubris plus 0xff padding from 0x800_0000 to 0x80f_ffff).
The measurement reliably takes about 0.482 seconds since neither the SP nor the RoT are doing anything else during this time. This low measurement time and the existing timeouts and retries used by the control plane mitigate concerns that were present when the measurement time was measured in multiple seconds.

This time could be further reduced. It is currently taking 0.245 seconds to inject the measurement program into the SP. Each u32 is being sent and read back separately and retried if there is any error (no errors have been seen as of yet).

If all is well with the SP, the measurement will match the Sha3_256 value represented in the SP Hubris archive in the file named "img/final.fwid".

Testing SP measurement

Checkout hubris branch attest-sp

APP=$PWD/app/oxide-rot-1/app-dev.toml
ARCHIVE=$(cargo -q xtask print --archive --image-name a $APP)
cargo xtask flash $APP

Measure on SP Reset:

Hit the reset button on the SP or use the SpCtrl.db_reset_sp function:

humility --archive=$ARCHIVE hiffy -c SpCtrl.db_reset_sp -a delay_ms=10
sleep 13

See traces

The ringbuf trace in swd and attest tasks have the interesting information including the time expended:

humility --archive=$ARCHIVE ringbuf

Dump the attestation log:

humility --archive=$ARCHIVE hiffy -c Attest.log -a offset=0 -n 256 -o out
hexdump -C out

The hexdump is not hard to read directly. The first four bytes are a LE u32 giving
the number of valid entries. Each entry is a 1-byte discriminator that is 0_u8 for Sha3_256 followed by the 32-byte measurement (FWID). There is space for 16 measurements.

Notes

  • The RoT only attempts a measurement if it detects an SP reset.
  • Existing measurements are cleared if an STLINK power-on is detected or if an STLINK is on during a measurement attempt. It is observed that powered-off STLINKs have no effect.
  • No SP reset can be captured until the swd task has initialized. If needed, the control plane can request the SP to reset itself which will result in the RoT measuring the SP.
  • The attest task needs has a reset_log_and_record function usable only by the swd task (and hiffy for debug builds).

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 554 files.

Valid Invalid Ignored Fixed
552 2 0 0
Click to see the invalid file list
  • lib/endoscope/src/main.rs
  • lib/endoscope/src/shared.rs

@lzrd lzrd force-pushed the attest-sp branch 3 times, most recently from bb29475 to 64c50cc Compare February 9, 2025 22:59
@lzrd lzrd force-pushed the attest-sp branch 2 times, most recently from 1dcbf79 to 2f49e0c Compare February 20, 2025 19:26
@lzrd lzrd marked this pull request as ready for review February 21, 2025 18:16
@lzrd lzrd changed the title WIP - Measure the SP on SP_RESET signal interrupt Measure the SP on SP_RESET signal interrupt Feb 21, 2025
Copy link
Collaborator

@labbott labbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice. Thanks for putting in the work to finally make the RoT a reality!

Either before we merge or right afterwards I think we should makes sure this gets run through a sample manufacturing flow to make sure we don't have any surprises there.

Comment on lines 1398 to 1407
// It takes about 0.25 seconds (236 RoT systicks) for `endoscope` to run.
// Allow about twice that time for the measurement to complete.
// endoscope executes a BKPT instruction on completion.
// We observe an S_HALT state if all goes well.
if self.halt_wait(512).is_err() {
ringbuf_entry!(Trace::DidNotHalt);
return Err(());
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we observe if endoscope either loops forever or hits a fault?

self.halt_wait(5000)
}

fn halt_wait(&mut self, timeout: u64) -> Result<(), SpCtrlError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I confused myself because halt_wait means wait for the system to halt, not attempt to halt and then wait. Can you either add a comment to this effect or change the name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_sp_halt would be better.

value: u32,
) -> Result<(), SpCtrlError> {
// C1.6 Debug system registers
let r = match register {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vibes: should we make the enum Reg include all of these values? Then, we could make this function take a register: Reg instead of having to check here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. I found it difficult to reason about the switch statement.

.map_err(|_| SpCtrlError::Fault)?;

let cnt = buf.len();
if cnt % 4 != 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we do this check before starting a read transaction, to avoid leaving the SP in a weird state?

let gpio = Pins::from(self.gpio);
setup_rot_to_sp_reset_l_in(self.gpio);
gpio.set_val(ROT_TO_SP_RESET_L_IN, Value::One); // should be a no-op
if squelch_notify {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by squelch_notify here; wouldn't the IRQ have already fired in sp_reset_enter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're re-asserting SP_RESET and don't want to be notified about the reset that we caused. This removes the interrupt that we would have gotten on return from the handler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the kernel queue the notification as soon as the pin changes state? I would expect that we need a call to sys_irq_control_clear_pending here as well.

Comment on lines +1704 to +1685
self.init = true; // we don't want GPIO pin reconfiguration.
if self.do_setup_swd().is_ok() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines make me nervous: do_setup_swd itself sets self.init = true on success, but here we unconditionally set it. I feel like init: bool is too ambiguous; we're using it both for "are pins configured" and "are we controlling the SP over SWD". Should it instead be an enum type?

// Setting up to inject the measurement program into the SP
// has several potential failures. Use this `prep` closure
// and `need_undo` state to keep from indenting too much.
let mut prep = |undo: &mut Undo| -> Result<(), ()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can capture need_undo automatically and don't need to pass it as an argument (so *undo in the function body would just become need_undo)

Comment on lines +1747 to +1735
if self
.dp_write_bitflags::<Demcr>(Demcr::VC_CORERESET)
.is_err()
{
ringbuf_entry!(Trace::DemcrWriteError);
*undo |= Undo::VC_CORERESET;
return Err(());
}
*undo |= Undo::VC_CORERESET;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can remove duplicate code by capturing the result, then setting undo, then checking it.

Suggested change
if self
.dp_write_bitflags::<Demcr>(Demcr::VC_CORERESET)
.is_err()
{
ringbuf_entry!(Trace::DemcrWriteError);
*undo |= Undo::VC_CORERESET;
return Err(());
}
*undo |= Undo::VC_CORERESET;
let r = self.dp_write_bitflags::<Demcr>(Demcr::VC_CORERESET);
*undo |= Undo::VC_CORERESET;
if r.is_err() {
ringbuf_entry!(Trace::DemcrWriteError);
return Err(());
}

// Asserting SP_RESET for >1ms here works.
hl::sleep_for(1);

if self.dp_write_bitflags::<Dhcsr>(Dhcsr::resume()).is_err() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resume() is a confusing name for the function which sets debug enable

Comment on lines +1802 to +1786
let digest = if prep(&mut need_undo).is_ok() {
self.do_measure_sp()
} else {
Err(())
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let digest = if prep(&mut need_undo).is_ok() {
self.do_measure_sp()
} else {
Err(())
};
let digest = prep().and_then(|()| self.do_measure_sp());

(this assumes you take my suggestion above of capturing need_undo instead of passing it in)


// This read of DHCSR should never be useful given the code above.
// Read back to make sure that VC_CORERESET is clear.
// XXX remove
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it time to remove this?

}

// The SP is still halted.
// Get it running again by toggling its RESET pin.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be nice to leave a comment here explaining that the caller is responsible for clearing the pending IRQ, so we don't measure it again.

lzrd added 8 commits February 25, 2025 14:13
Add interrupt-related API calls to the LPC55 `gpio_driver`.

A task on an LPC55 can now configue and use GPIO interrupts.

app.toml example shows Pin Interrupt configuration:

[tasks.foo]
...
interrupts = { "pint.irq0" = "button-irq" }
...
task-slots = ["gpio_driver", ...]

[tasks.foo.config]
pins = [
  { name="BUTTON', pin={ port=1, pin=9}, alt=0, pint=0, direction="input", opendrain="normal" }
]
swd:
  - detects JTAG/dongle and SP_RESET
  - can drive SP_RESET to stop SP and initiate measurement procedure.
  - on JTAG dongle present or other failure, don't record a bogus FWID, just reset the log.
  - on SP reset detection, injects a blob to measure the SP.
  - on powered JTAG dongle detection, resets attestation log.
  - swd logs the result to attest task.
  - the hiffy command `db_reset_sp` is a conditional feature for debugging and testing.

endoscope: An injectable code blob covered by the RoT signature.
  - configure STM32H7 clocks in endoscope for best performance (thanks Cliff).
  - Use instruction/data tightly coupled memory (ITCM/DTCM) for better performance.
  - Build the injectable SP measurement blob as a cargo `bindeps` artifact.

attest:
  - Add `reset`, and `reset_and_record` to the attest task.
  - Only tasks listed in `[tasks.attest.config]` are allowed to reset the log.
  - Unauthorized tasks get Err(ClientError::AccessViolation.

build/lpc55pins:
  - Modify build/lpc55pins to generate separate GPIO pin setup functions for named pins.
    This allows easier switching between pin configurations at runtime.
Also add safety command in endoscope.
The swd task doesn't need to perform an SP Hubris image sanity check.
It measures the whole active image bank no matter what the contents.

Remove some unwrap_lite() calls.
  - Remove use of unwrap_lite.
  - Fix write/read-back/retry logic for injecting endoscope.
  - Add comments documenting STM32H753 vector table.
if self
.do_write_core_register(Reg::Dr.into(), sp_reset_vector | 1)
.is_err()
if let Some(sp_reset_vector) = slice_to_le_u32(&ENDOSCOPE_BYTES[4..=7])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change? I found the previous version to be more clear:

let sp_reset_vector =
    u32::from_le_bytes(ENDOSCOPE_BYTES[4..=7].try_into().unwrap_lite());

versus the new code, which hides the unreachable! in the else of a conditional which is always taken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants