Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a software rollback protection policy in RoT update_server. #1809

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

lzrd
Copy link
Contributor

@lzrd lzrd commented Jun 10, 2024

Issue #1570 software anti-rollback

Implement a software rollback protection policy in RoT update_server using:

  • the epoch value in the optional ImageHeader for early rejection (1st block presented before any flash erase/write) or
  • the EPOC tag in optional caboose for rejecting activation of an image.

"optional" because there are existent Bootleby images without ImageHeader or Caboose and Hubris images images in the future that may eliminate ImageHeader. In cases where an epoch value is not present, the image epoch is treated as equal to zero.

Related PRs:
https://github.com/oxidecomputer/sprot-release/pull/11
oxidecomputer/hubtools#32
https://github.com/oxidecomputer/permission-slip/pull/205
oxidecomputer/management-gateway-service#240

@lzrd lzrd force-pushed the epoch branch 5 times, most recently from 6ae7cb6 to 8e820c1 Compare August 16, 2024 17:30
@lzrd lzrd marked this pull request as ready for review August 20, 2024 20:02
@lzrd lzrd requested review from cbiffle and flihp as code owners August 20, 2024 20:02
@lzrd lzrd force-pushed the epoch branch 3 times, most recently from 9030646 to cb0b519 Compare September 17, 2024 22:10
@lzrd lzrd force-pushed the epoch branch 4 times, most recently from db63fff to 4b27b65 Compare October 10, 2024 22:46
lzrd added 5 commits October 14, 2024 14:58
Remove last of the saturated math calls.
Rename FlashRange::{store,exec} to {stored,at_runtime} for clarity.
Remove unnecessary image validation logic in fn padded_image_len.
Don't require updating to hubtools 0.4.7. There doesn't need to be
an 'EPOC' in the caboose as it is always "0" at this step in
the release engineering flow.

Fix complementary swapped args and corresponding swapped comparison to
read correctly.

Remove vestigial epoch header check. With the decision to not use
ImageHeader.epoch, a TBD prep_image_update variant will contain
the epoch information.
@@ -416,13 +416,14 @@ impl idl::InOrderUpdateImpl for ServerImpl<'_> {
Ok(BLOCK_SIZE_BYTES)
}

/// Deprecated. The version and epoch values are in the Caboose.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this entirely? I don't see anyone calling it anymore.

@@ -461,7 +461,7 @@ SpRot.status() => Status {
}
```

Update API, retrieve current version.
Update API, retrieve current version (deprecated)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may be able to remove SpRot.current_version entirely; I don't see anyone calling it anymore.

@@ -49,7 +49,7 @@ task-slots = ["sys", "user_leds"]

[tasks.net]
name = "task-net"
stacksize = 3000
stacksize = 8000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is an unrelated change which you discovered when building every app-*.toml?

@@ -1567,7 +1567,7 @@ fn build_kernel(
/// Returns true if the header was found and updated,
/// false otherwise.
fn update_image_header(
cfg: &PackageConfig,
_cfg: &PackageConfig,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just remove this argument, since it's now unused?

Comment on lines +247 to +251
if *self == SlotId::A {
SlotId::B
} else {
SlotId::A
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if *self == SlotId::A {
SlotId::B
} else {
SlotId::A
}
match self {
SlotId::A => SlotId::B,
SlotId::B => SlotId::A,
}

slightly shorter, and will break loudly if someone adds a third slot

// An image may have an ImageHeader located after the
// LPC55's mixed header/vector table.
pub const IMAGE_HEADER_OFFSET: u32 = 0x130;
pub const CABOOSE_TAG_EPOC: [u8; 4] = [b'E', b'P', b'O', b'C'];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this could be written as

Suggested change
pub const CABOOSE_TAG_EPOC: [u8; 4] = [b'E', b'P', b'O', b'C'];
pub const CABOOSE_TAG_EPOC: [u8; 4] = *b"EPOC";


/// Does (component, slot) refer to the currently running Hubris image?
pub fn is_current_hubris_image(component: RotComponent, slot: SlotId) -> bool {
// Safety: extern statics aren't controlled by Rust so poking them can
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a Safety comment here? There's no unsafe.

nxp_image_executation_address: u32, // 0x32
// Additional trailing vectors are not
// interesting here.
// _vector_table_2[u32; 2]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented out _vector_table_2?

return Ok(len as usize);
fn cached_stage0_image_len(&self) -> Result<usize, UpdateError> {
if let Ok(vectors) = ImageVectorsLpc55::try_from(
self.fw_cache[0..BLOCK_SIZE_BYTES].as_bytes(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slicing a BLOCK_SIZE_BYTES chunk then passing it into read_from_prefix doesn't make sense to me.

We should

  1. Add a const assertion that fw_cache is long enough to fit core::mem::size_of::<ImageVectorsLpc55>
  2. do something like
let vectors = ImageVectorsLpc55::read_from(
    &self.fw_cache[..size_of::<ImageVectorsLpc55>()]
).unwrap_lite()
  1. Remove ImageVectorsLpc55's TryFrom implementation, since this is the only place it's used

) -> Result<Option<Epoch>, UpdateError> {
if let Ok(span) = caboose_slice(image) {
let mut block = [0u8; BLOCK_SIZE_BYTES];
let caboose = block[0..span.len()].as_bytes_mut();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Can this line panic if the caboose is larger than BLOCK_SIZE_BYTES? It seems like we should instead have a custom type implementing TlvcRead that uses image to read caboose bytes.

}

pub fn new_hybrid<'a>(
flash: &'a drv_lpc55_flash::Flash<'a>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: if the order in memory is [buffer, flash], then that should also be the argument order her (and constructor order below)

return Err(UpdateError::InvalidHeaderBlock);
}

// We don't rely on the ImageHeader, but if it is there, it needs to be valid.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't remove the ImageHeader, right?

I'm wondering why the logic change here; previously, we required the ImageHeader if the component is RotComponent::Hubris (but allowed for bootloaders that have been released without an ImageHeader).

let ram_end_offset = ram.len().min(end_offset);
// Transfer starts within the RAM part of this image.
let data = ram
.get((start_offset)..ram_end_offset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.get((start_offset)..ram_end_offset)
.get(start_offset..ram_end_offset)

Comment on lines +546 to +550
let buffer =
&mut [0u8; core::mem::size_of::<ImageVectorsLpc55>()];
self.read_bytes(0u32, buffer.as_bytes_mut())?;
ImageVectorsLpc55::read_from_prefix(&buffer[..])
.ok_or(UpdateError::OutOfBounds)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let buffer =
&mut [0u8; core::mem::size_of::<ImageVectorsLpc55>()];
self.read_bytes(0u32, buffer.as_bytes_mut())?;
ImageVectorsLpc55::read_from_prefix(&buffer[..])
.ok_or(UpdateError::OutOfBounds)
let mut v = ImageVectorsLpc55::new_zeroed();
self.read_bytes(0u32, v.as_bytes_mut())?;
v

// - values greater than u32::MAX
//
// Hand coding reduces size by about 950 bytes.
impl From<&[u8]> for Epoch {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote to return an Option<Epoch> instead, or Result<Epoch, ()> if you want to use TryFrom. Snapping invalid epoch tags to 0 seems like a recipe for confusion, because it will allow updates to an invalid epoch if the initial epoch is 0.

/// This test is also used when the header block first arrives
/// so that images can be rejected early, i.e. before any flash has
/// been altered.
pub fn check_rollback_policy(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this function is only ever useful when complete is true; otherwise, we can't read the epoch tag from the new image's caboose (so we have to keep loading more of the image). Could we remove the complete flag and only call the function on complete images?

(Some(active_epoch), None) => {
// If next_image is partial and HEADER_BLOCK has no ImageHeader,
// then there is no early rejection, proceed.
if !complete || active_epoch.can_update_to(Epoch::from(0u32)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same vague concerns as before: being able to update to an invalid / missing epoch (but only from epoch 0) seems odd to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants