-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create backward sentries for function returns and add more checks in cjalr #54
Conversation
Background: A typical (non-malicious) function call is as follows: - caller saves callee-saved registers (tmp registers) in the caller stack - caller saves arguments in arg registers, and overflow arguments in the callee stack - caller stores the return address (i.e. address of the instruction next to the call) in the return-address register (or stack in x86) - caller jumps to the entry point of the callee function (stored in a register). (RISC-V stores return address and jumps to the address in the register via a single jalr instruction) - callee executes the function; it might spill registers in the stack as needed - callee jumps back to the caller using the return-address register (jr) Even if the callee code is bug-free, the caller can violate callee’s CFI in two ways: - Jump to an address different from the entry point of the callee: + For instance, if the callee’s code internally performs operations in a specific branch which the caller did not have access to originally (because of various checks by the callee), the caller can instead directly jump to that branch of the callee (bypassing the checks) and execute that privileged code. (Note that PMP or MMU will not protect against this attack as a caller that can execute a function will have access to execute any part of the function; RISC-V has, for instance, proposed a “landing pad” to protect against this) - Setting a return address different from the correct return address + The caller can make a callee jump to an arbitrary location in some other library that the caller has access to. (This is the basis for Return Oriented Programming (ROP) attack, the returned location called a "gadget") + The return address that is spilled in the stack in overwritten (either with a buffer overflow attack or directly by feeding a wrong return address), and such gadgets chained. How CHERI prevents CFI: - Cross-compartment and Library functions can be jumped only through sealed capabilities (sentries) + jalr instruction to sentry ensures that the offset is 0, so the caller can only jump to published entry point sentries that are accessible by the caller, not arbitrary code inside another library + This prevents jumping to a privileged portion of the callee code bypassing the callee’s checks + This also prevents most of the ROP attacks using gadgets, as gadgets must be published entry points - Return address can only be manipulated by the caller to contain one of the following: + Sentry representing the correct return address + Sentry that the caller has access to already (either a function call entry point that the caller has access to, or the return address sentry of its caller, or some other return address sentry the caller previously had access to) + Non-sealed capability to its own code (Note that in all the 3 cases above, the manipulated return address has no effect, because the caller could have just jumped to that return address in the code) CHERIoT inherits all the above benefits. However, because CHERIoT allows manipulating the status of the interrupt through a function call (and function return), by encoding the interrupt type of the callee in the otype of the calling sentry and the interrupt type of the caller in the otype of the return-to-caller sentry, the following attack can occur: A caller calling an interrupt-disabling callee X can set the return address of the callee to be the same interrupt-disabling sentry to X. This means, the callee, on return will call itself all the while operating with interrupts disabled. This will lead to infinite repeated calls to X with interrupts disabled, violating availability. This attack can be prevented in CHERIoT by adding two new "backwards-edge" sentries and adding more checks on cjalr: otype 4: Backwards-edge, interrupts-disabled sentry otype 5: Backwards-edge, interrupts-enabled sentry. cjalr with a non-cnull link register always creates otype 4 or 5, i.e. backward sentries. (Currently it always creates otype 2 or 3 (forward explicit interrupt control sentries), so this is just a slightly different constant, but in the same place). - cjr $cra (a.k.a. cjalr $cra, $cnull, 0) must take a backwards-edge sentry. Sealing violation exception for other otypes. - cjr with a non-$cra target may be used only with otypes 0 (unsealed) or 1 (inheriting sentry). Sealing violation exception for other otypes. - cjalr with a non-zero return link register will generate a sealing violation for otypes 4 or 5 (backward sentry). - cjalr with a target that is otype 2 or 3 (forward, explicit interrupt control) must use $cra as the link register. Any other link register generates a sealing violation. Note that this disallows tail calls optimization of functions that change interrupt state. (They’re currently disabled and there’s a FIXME in the compiler to enable them. The compiler change is now to remove the FIXME, and then document this for programmers). Note that cjal currently sets a forward edge sentry on the link register. This change makes cjal set an unsealed cap in the link register. This is okay because cjal is not used in cross-compartment calls, and hence the caller and callee are ostensibly in the same security domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Please could you also add something to archdoc/chap-changes.tex
and update the 'Sealed Capabilities' section of archdoc/chap-cheri-riscv.tex
?
It would be useful to add the following table to document valid arguments to
Unsealed destinations are not subject to these restrictions. Writing this out has made me realise that at the moment your code permits use of |
Very nice!
Hmm, if |
I don't have very strong opinions here, but I'd lean towards disallowing it. The only use case for cjal[r] with a non-zero and non-link-register cd is in the code-compression things (e.g. spill lots of registers with a function call that uses ct0 as the link register). We don't currently use them, but we potentially could. If we did, we'd want to restrict everything that uses any link register to forward sentries or not sentries. At some point, it would be good to have variants that must use sentries. For CHERIoT, we could use the unused register bit for this, but official RVI extensions cannot. Given that sentries cannot use the offset field, it would be nice to use a 16-bit encoding for this. |
Yes, I just realised that and updated the comment but there is still one case that slipped through: interrupt inheriting sentry with not null, not ra cd. Not sure whether this should be allowed or not. |
Could require that |
I don't believe there are any use cases that require cret to work with an unsealed thing, so I would be happy with that requirement. It prevents substituting an on-stack return address with an unsealed thing. |
This cjalr cra is in some code that is first hit after a cjal and so the cjalr cra returns. The caller then discards the value written back to cra as the link operation in the jump-and-link instruction. It is then used as a call, with cra initialised to a jump address. This was done as a code-size optimisation, but there's actually no need to zero the registers at the start. The code that runs with the (possibly uninitialised) values in registers (most implementations will zero in boot ROM or hardware anyway) all runs with access to the primordial capabilities and so there is no way that this can possibly leak information. The first untrusted code runs after the *second* pass through this block. Simply removing this should shave 10 bytes off the loader without affecting security. More importantly, this is the *only* place where we abuse cjalr in this way. This abuse is incompatible with separating sentries into forward and backward control-flow arcs (see CHERIoT-Platform/cheriot-sail#54), which has a much bigger impact on security.
Hmm, interrupt inheriting sentry should be like any forward sentry (i.e. has to have ra) or it can be null. We can force that. |
👍 |
I want to say this discussion is moot because of an error in Robert's table. It's been fixed to say backward sentry requires $cnull link and $cra target. |
I really like the table that @rmn30 made; I will change the checks to match the table exactly (with appropriate negations in the beginning) and fix the other typos/issues. |
I think it is easier (and safer) to think about the allowed cases. Summarising the above conversation and extending the table to include unsealed we have:
Note that the first row does not allow unsealed. I am not sure whether we should allow unsealed in the last row? Probably not.
|
This is exactly how I implemented what I just pushed. I didn't allow unsealed in the last row. |
@rmn30 I added explanations in the spec PDF |
This prevents us from doing the spill / reload optimisations as library functions with a custom link register. We aren’t currently doing that, but:
It would be nice if we could have a way of requiring that the target is a valid forward sentry (no unsealed). Currently, you can do auipcc and then arbitrary arithmetic to generate a thing that can be used as a function pointer and points to any code (or read-only data that happens to be valid code) in your compartment. I’m not sure how much this matters since you can get arbitrary code execution as long as you start with arbitrary code execution, but this may let you go from a constrained set of gadgets to a rich set. I suspect anyone who can do this can also build a Turing-complete weird machine out of real function pointers, but I’m not sure. Having a way of encoding this would let us move to an ABI where all function pointers came from the import table, without changing the ISA. |
@davidchisnall are you suggesting that rows 3 and 4 of the table should be the same, permitting |
David's suggestion will not merge rows 3 and 4. It will require that Any forward sentry should have some non-null link address, rather than just $cra. Row 2 is necessary to allow null link addresses generated only through inherited sentries, otherwise the semantics of structured interrupts becomes messy (the caller and return-target could be in different interrupt states). |
It doesn't fully guarantee structured programming but it does at least mean any interrupt-disabled function can trust its immediate return, which I agree is a very valuable property. Interrupt-disabled leaf functions are guarantee to return to their caller, which means you can then reason about call sites one layer up, inductively. |
That makes sense. In that case we can't merge rows 3 and 4. Maybe we should make row 4 the same as 2: unsealed or interrupt inheriting only? |
On Wed, May 22, 2024, 10:20 Robert Norton ***@***.***> wrote:
it does at least mean any interrupt-disabled function can trust its
immediate return
That makes sense. In that case we can't merge rows 3 and 4. Maybe we
should make row 4 the same as 2: unsealed or interrupt inheriting only?
My suggestion is to change CD of row 3 to non cnull and remove row 4
(anything not in the first three rows are invalid).
—
… Reply to this email directly, view it on GitHub
<#54 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZURN6KUTTBP3E4UU3E73ZDTHV3AVCNFSM6AAAAABIAEOC7CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRVGM3DSOJSGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Why allow interrupt inheriting (or unsealed) for Row 4? I would prefer to fail for any case other than when the first three rows are not met. Is there a use case for row 4 with unsealed or interrupt inheriting? |
The kind of code size reduction things I mentioned above. |
Sorry I don't get it. The new table I am proposing is as follows:
Any other case is invalid. |
Currently, RISC-V uses JALR with a non-ra link register to do things like outlined prolog sequences. We don’t do this currently, but it would be nice to not prevent it. For this to work, we’d need cjalr with a non-cra link register to work with forward sentries. |
My new table allows that @davidchisnall |
I think that change loses us the guarantee that interrupt-disabled functions have a valid car to return to. I think we want only interrupt-inheriting sentries for link to not-cra. |
It still requires some register with a backwards sentry to return from. If you think there's no use case for a non cra returning function to change interrupt status, we can add that constraint |
Done. Updated the tables appropriately @davidchisnall @rmn30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/cheri_insts.sail
Outdated
* - *cs1* is sealed and is not a sentry. | ||
* - *cs1* is a sentry and *imm* $\ne$ 0. | ||
* - *cs1* is sealed and *imm* $\ne$ 0 | ||
* - Not all of the following combinations hold: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be if none of the following hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But what I wrote is also true (distribution of not, or)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am screwing up the English language. Let me fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This cjalr cra is in some code that is first hit after a cjal and so the cjalr cra returns. The caller then discards the value written back to cra as the link operation in the jump-and-link instruction. It is then used as a call, with cra initialised to a jump address. This was done as a code-size optimisation, but there's actually no need to zero the registers at the start. The code that runs with the (possibly uninitialised) values in registers (most implementations will zero in boot ROM or hardware anyway) all runs with access to the primordial capabilities and so there is no way that this can possibly leak information. The first untrusted code runs after the *second* pass through this block. Simply removing this should shave 10 bytes off the loader without affecting security. More importantly, this is the *only* place where we abuse cjalr in this way. This abuse is incompatible with separating sentries into forward and backward control-flow arcs (see CHERIoT-Platform/cheriot-sail#54), which has a much bigger impact on security.
This cjalr cra is in some code that is first hit after a cjal and so the cjalr cra returns. The caller then discards the value written back to cra as the link operation in the jump-and-link instruction. It is then used as a call, with cra initialised to a jump address. This was done as a code-size optimisation, but there's actually no need to zero the registers at the start. The code that runs with the (possibly uninitialised) values in registers (most implementations will zero in boot ROM or hardware anyway) all runs with access to the primordial capabilities and so there is no way that this can possibly leak information. The first untrusted code runs after the *second* pass through this block. Simply removing this should shave 10 bytes off the loader without affecting security. More importantly, this is the *only* place where we abuse cjalr in this way. This abuse is incompatible with separating sentries into forward and backward control-flow arcs (see CHERIoT-Platform/cheriot-sail#54), which has a much bigger impact on security.
This cjalr cra is in some code that is first hit after a cjal and so the cjalr cra returns. The caller then discards the value written back to cra as the link operation in the jump-and-link instruction. It is then used as a call, with cra initialised to a jump address. This was done as a code-size optimisation, but there's actually no need to zero the registers at the start. The code that runs with the (possibly uninitialised) values in registers (most implementations will zero in boot ROM or hardware anyway) all runs with access to the primordial capabilities and so there is no way that this can possibly leak information. The first untrusted code runs after the *second* pass through this block. Simply removing this should shave 10 bytes off the loader without affecting security. More importantly, this is the *only* place where we abuse cjalr in this way. This abuse is incompatible with separating sentries into forward and backward control-flow arcs (see CHERIoT-Platform/cheriot-sail#54), which has a much bigger impact on security.
This cjalr cra is in some code that is first hit after a cjal and so the cjalr cra returns. The caller then discards the value written back to cra as the link operation in the jump-and-link instruction. It is then used as a call, with cra initialised to a jump address. This was done as a code-size optimisation, but there's actually no need to zero the registers at the start. The code that runs with the (possibly uninitialised) values in registers (most implementations will zero in boot ROM or hardware anyway) all runs with access to the primordial capabilities and so there is no way that this can possibly leak information. The first untrusted code runs after the *second* pass through this block. Simply removing this should shave 10 bytes off the loader without affecting security. More importantly, this is the *only* place where we abuse cjalr in this way. This abuse is incompatible with separating sentries into forward and backward control-flow arcs (see CHERIoT-Platform/cheriot-sail#54), which has a much bigger impact on security.
This cjalr cra is in some code that is first hit after a cjal and so the cjalr cra returns. The caller then discards the value written back to cra as the link operation in the jump-and-link instruction. It is then used as a call, with cra initialised to a jump address. This was done as a code-size optimisation, but there's actually no need to zero the registers at the start. The code that runs with the (possibly uninitialised) values in registers (most implementations will zero in boot ROM or hardware anyway) all runs with access to the primordial capabilities and so there is no way that this can possibly leak information. The first untrusted code runs after the *second* pass through this block. Simply removing this should shave 10 bytes off the loader without affecting security. More importantly, this is the *only* place where we abuse cjalr in this way. This abuse is incompatible with separating sentries into forward and backward control-flow arcs (see CHERIoT-Platform/cheriot-sail#54), which has a much bigger impact on security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@microsoft-github-policy-service agree company="Google"
|
I see that you did a merge commit. Isn't it better to do a squash merge. This gives a not helpful trail of commit messages |
Sorry, I should have checked the commit history. |
* Belatedly fix StoreCapImmediate docs We no longer throw exceptions for !SL authorities storing !G capabilities, just clear the tag. * Require store local when storing return sentries Backwards control-flow arcs should ideally be confined to the stack and register save areas. Conveniently, we have mechanism in the RTOS to identify exactly those areas of memory, with capabilities bearing `SL` (store local) permission. And, after #54, the ISA has mechanism for identifying backwards control-flow arcs, with the two return sentry types. We should have capability store impose the requirement for `SL` in the authorizing cap if the cap being stored is a return sentry. Credit where it's due: this is Robert's idea, originally suggested in the obviously-wrong-in-retrospect #63 ("Have CJALR create !G sentries?"). Co-authored-by: Robert Norton <[email protected]> * Review feedback Co-authored-by: Robert Norton <[email protected]> --------- Co-authored-by: Robert Norton <[email protected]> Co-authored-by: Robert Norton <[email protected]>
The following is based on discussions with @davidchisnall (portions of our email exchange have been included as is with perhaps slight modifications).
Background:
A typical (non-malicious) function call is as follows:
Even if the callee code is bug-free, the caller can violate callee’s CFI in two ways:
How CHERI prevents CFI:
Cross-compartment and Library functions can be jumped only through sealed capabilities (sentries)
Return address can only be manipulated by the caller to contain one of the following:
(Note that in all the 3 cases above, the manipulated return address has no effect, because the caller could have just jumped to that return address in the code)
CHERIoT inherits all the above benefits. However, because CHERIoT allows manipulating the status of the interrupt through a function call (and function return), by encoding the interrupt type of the callee in the otype of the calling sentry and the interrupt type of the caller in the otype of the return-to-caller sentry, the following attack can occur: A caller calling an interrupt-disabling callee X can set the return address of the callee to be the same interrupt-disabling sentry to X. This means, the callee, on return will call itself all the while operating with interrupts disabled. This will lead to infinite repeated calls to X with interrupts disabled, violating availability.
This attack can be prevented in CHERIoT by adding two new "backwards-edge" sentries and adding more checks on cjalr:
otype 4: Backwards-edge, interrupts-disabled sentry
otype 5: Backwards-edge, interrupts-enabled sentry.
cjalr with a non-cnull link register always creates otype 4 or 5, i.e. backward sentries. (Currently it always creates otype 2 or 3 (forward explicit interrupt control sentries), so this is just a slightly different constant, but in the same place).
(Table derived from @rmn30 's)
Note that this disallows tail calls optimization of functions that change interrupt state. (They’re currently disabled and there’s a FIXME in the compiler to enable them. The compiler change is now to remove the FIXME, and then document this for programmers.)