---
SPDX-License-Identifier: CC-BY-4.0
copyright:
- SPDX-FileCopyrightText: Copyright 2021-2024 Arm Limited [email protected]
- SPDX-FileCopyrightText: Copyright 2023 Bill Wendling [email protected]
- SPDX-FileCopyrightText: Copyright 2023 Lucian Popescu [email protected]
- SPDX-FileCopyrightText: Copyright 2024 Anders Waldenborg [email protected]
title: 'Low-Level Software Security for Compiler Developers'
documentclass: report
papersize: A4
colorlinks: true
link-citations: true
github-repo: https://github.com/llsoftsec/llsoftsecbook
github-edit-link: https://github.com/llsoftsec/llsoftsecbook/edit/main/book.md
discord-server: https://discord.gg/Bm55Z9Ppgn
bibliography:
- book.bib
header-includes:
- |
  \usepackage{makeidx}
  \makeindex
  \newcounter{TodoCounter}
  \usepackage[backgroundcolor=white,linecolor=black]{todonotes}
  \let\oldtodo\todo
  \usepackage{bclogo}% \bcpanchant
  \newcommand{\todospan}[1]{
    \stepcounter{TodoCounter}
    \oldtodo[caption={\arabic{TodoCounter}. #1}]{\bcpanchant #1}
  }
  \newcommand{\tododiv}[1]{
    \stepcounter{TodoCounter}
    \oldtodo[inline,caption={\arabic{TodoCounter}. #1}]{\bcpanchant \textit{#1}}
  }
  \usepackage{caption}
  \captionsetup[figure]{labelformat=empty}
...
Compilers, assemblers and similar tools generate all the binary code that processors execute. It is no surprise then that these tools play a major role in security analysis and hardening of relevant binary code.
Often the only practical way to protect all binaries with a particular security hardening method is to have the compiler do it. And, with software security becoming more and more important in recent years, it is no surprise to see an ever increasing variety of security hardening features and mitigations against vulnerabilities implemented in compilers. Indeed, compared to a few decades ago, today's compiler developer is much more likely to implement security features than not.
Furthermore, with the ever-expanding range of techniques implemented, it's very hard to gain a basic understanding of all security features implemented in typical compilers.
This poses a practical problem: compiler developers must be able to work on security hardening features, yet it's hard to gain a good, basic understanding of such compiler features.
There are a lot of materials that explain individual vulnerabilities or attack vectors. There are also lots of presentations explaining specific exploits. But there seems to be a limited set of materials that give a structured overview of all vulnerabilities and exploits against which a code generator plays a role in protecting.
This book aims to provide such a structured, broad overview. It does not necessarily go into full details, instead aiming to give a thorough description of all relevant high-level aspects of attacks, vulnerabilities, mitigations, and hardening techniques. For further details, this book provides pointers to materials with more details on specific techniques.
The purpose of this book is to serve as a guide to every compiler developer that needs to learn about software security relevant to compilers. Even though the focus is on compiler developers, we expect that this book will also be useful to people working on other low-level software.
The idea for this book emerged out of a frustration of not finding a good overview on this topic. Kristof Beyls and Georgia Kouveli, both compiler engineers working on security features, wished a book like this would exist. After not finding such a book, they decided to try and write one themselves. They immediately realized that they do not have all necessary expertise themselves to complete such a daunting task. So they decided to try and create this book in an open source style, seeking contributions from many experts.
As you read this, the book remains unfinished. This book may well never be finished, as new vulnerabilities continue to be discovered regularly. Our hope is that developing the book as an open source project will allow for it to continue to evolve and improve. The open source development process of this book increases the likelihood that it remains relevant as new vulnerabilities and mitigations emerge.
Kristof and Georgia, the initial authors, are far from experts on all possible vulnerabilities. So what is the plan to get high quality content to cover all relevant topics? It is two-fold.
First, by studying specific topics, they hope to gain enough knowledge to write up a good summary for this book.
Second, they very much invite and welcome contributions. If you're interested in potentially contributing content, please go to the home location for the open source project at https://github.com/llsoftsec/llsoftsecbook.
As this book is very much a work in progress, you will notice plenty of "TODO" items throughout this book. [This is an example "TODO" item]{.todo}
We think that keeping the TODOs clearly visible helps both readers and potential contributors. Readers get hints of what kind of further content is still missing. Potential contributors get hints for where we need help to further improve this book.
As a reader, you can also contribute to making this book better. We highly encourage feedback, both positive and constructive criticisms. We prefer feedback through either https://github.com/llsoftsec/llsoftsecbook, for example by raising an issue there, or through our Discord server. Interested potential contributors can reach the LLSoftSec community through the same channels.
::: TODO
Add section describing the structure of the rest of the book.
:::
Memory access errors describe memory accesses that, although permitted by a program, were not intended by the programmer. These types of errors are usually defined [@Hicks2014] by explicitly listing their types, which include:
- buffer overflow
- null pointer dereference
- use after free
- use of uninitialized memory
- illegal free
Memory vulnerabilities are an important class of vulnerabilities that arise due to these types of errors, and they most commonly occur due to programming mistakes when using languages such as C/C++ \index{C}\index{C++}. These languages do not provide mechanisms to protect against memory access errors by default. An attacker can exploit such vulnerabilities to leak sensitive data or overwrite critical memory locations and gain control of the vulnerable program.
Memory vulnerabilities have a long history. The Morris worm in 1988 was the first widely publicized attack exploiting a buffer overflow. Later, in the mid-90s, a few famous write-ups describing buffer overflows appeared [@AlephOne1996]. Stack buffer overflows were mitigated with stack canaries and non-executable stacks. The answer was more ingenious ways to bypass these mitigations: code reuse attacks, starting with attacks like return-into-libc [@Solar1997]. Code reuse attacks later evolved to Return-Oriented Programming (ROP) [@Shacham2007] and even more complex techniques.
To defend against code reuse attacks, the Address Space Layout Randomization (ASLR) and Control-Flow Integrity (CFI) measures were introduced. This interaction between offensive and defensive security research has been essential to improving security, and continues to this day. Each newly deployed mitigation results in attempts, often successful, to bypass it, or in alternative, more complex exploitation techniques, and even tools to automate them.
Memory safe [@Hicks2014] languages are designed with prevention of such vulnerabilities in mind and use techniques such as bounds checking and automatic memory management. If these languages promise to eliminate memory vulnerabilities, why are we still discussing this topic?
On the one hand, C and C++ \index{C}\index{C++} remain very popular languages, particularly in the implementation of low-level software. On the other hand, programs written in memory safe languages can themselves be vulnerable to memory errors as a result of bugs in how they are implemented, e.g. a bug in their compiler. Can we fix the problem by also using memory safe languages for the compiler and runtime implementation? Even if that were as simple as it sounds, unfortunately there are types of programming errors that these languages cannot protect against. For example, a logical error in the implementation of a compiler or runtime for a memory safe language can lead to a memory access error not being detected. We will see examples of such logic errors in compiler optimizations in a later section.
Given the rich history of memory vulnerabilities and mitigations and the active developments in this area, compiler developers are likely to encounter some of these issues over the course of their careers. This chapter aims to serve as an introduction to this area. We start with a discussion of exploitation primitives, which can be useful when analyzing threat models [Discuss threat models elsewhere in book and refer to that section here [161]{.issue}]{.todo}. We then continue with a more detailed discussion of the various types of vulnerabilities, along with their mitigations, presented in a rough chronological order of their appearance, and, therefore, complexity.
Newcomers to the area of software security may find themselves lost in many blog posts and other publications describing specific memory vulnerabilities and how to exploit them. Two very common, yet unfamiliar to a newcomer, terms that appear in such publications are read primitive and write primitive. In order to understand memory vulnerabilities and be able to design effective mitigations, it's important to understand what these terms mean, how these primitives could be obtained by an attacker, and how they can be used.
An exploit primitive\index{exploit primitive} is a mechanism that allows an attacker to perform a specific operation in the memory space of the victim program. This is done by providing specially crafted input to the victim program.
A write primitive\index{write primitive} gives the attacker some level of write access to the victim's memory space. The value written and the address written to may be controlled by the attacker to various degrees. The primitive, for example, may allow:
- writing a fixed value to an attacker-controlled address, or
- writing to an address consisting of a fixed base and an attacker-controlled offset limited to a specific range (e.g. a 32-bit offset)[Consider describing in more detail why the range limitation matters[162]{.issue}]{.todo}, or
- writing to an attacker-controlled base address with a fixed offset.
Primitives can be further classified according to more detailed properties. See slide 11 of [@Miller2012] for an example.
The most powerful version of a write primitive is an arbitrary write primitive, where both the address and the value are fully controlled by the attacker.
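To make the second flavor in the list above concrete, here is a minimal, hypothetical C sketch (the function and field names are made up for illustration): because the index is never validated, the attacker gets a write primitive with a fixed base and an attacker-controlled 32-bit offset.

``` c
#include <stdint.h>

struct record {
    uint32_t values[16];
};

// Hypothetical message handler: 'index' and 'value' come straight from
// attacker-controlled input. Since 'index' is never checked against the array
// bound, this is a write primitive with a fixed base (r->values) and an
// attacker-controlled 32-bit offset.
void handle_update(struct record *r, uint32_t index, uint32_t value) {
    r->values[index] = value;   // out-of-bounds write when index >= 16
}
```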
A read primitive\index{read primitive}, respectively, gives the attacker read access to the victim's memory space. The address of the memory location accessed will be controlled by the attacker to some degree, as for the write primitive. A particularly useful primitive is an arbitrary read primitive, in which the address is fully controlled by the attacker.
The effects of a write primitive are perhaps easier to understand, as it has obvious side-effects: a value is written to the victim program's memory. But how can an attacker observe the result of a read primitive?
This depends on whether the attack is interactive or non-interactive [@Hu2016].
- In an interactive attack\index{interactive attack}, the attacker gives malicious input to the victim program. The malicious input causes the victim program to perform the read the attacker instructed it to, and to output the results of that read. This output could be any kind of output, for example a network packet that the victim transmits. The attacker can observe the result of the read primitive by looking at this output, for example parsing this network packet. This process then repeats: the attacker sends more malicious input to the victim, observes the output and prepares the next input. You can see an example of this type of attack in [@Beer2020], which describes a zero-click radio proximity exploit.
- In a non-interactive (one-shot) attack\index{non-interactive (one-shot) attack}, the attacker provides all malicious input to the victim program at once. The malicious input triggers multiple primitives one after the other, and the primitives are able to observe the effects of the preceding operations through the victim program's state. The input could be, for example, in the form of a JavaScript program [@Groß2020], or a PDF file pretending to be a GIF [@Beer2021].
::: TODO
The references in this section describe complicated modern exploits. Consider linking to simpler exploits, as well as some tutorial-level material. [163]{.issue}
:::
How does an attacker obtain these kinds of primitives in the first place? The details vary, and in some cases it takes a combination of many techniques, some of which are out of scope for this book. But we will be describing a few of them in this chapter. For example, a stack buffer overflow results in a (restricted) write primitive when the input size exceeds what the program expected.
As part of an attack, the attacker will want to execute each primitive more than once, since a single read or write operation will rarely be enough to achieve their end goal (more on this later). How can primitives be combined to perform multiple reads/writes?
In the case of an interactive attack, preparing and sending input to the victim program and parsing the output of the victim program are usually done in an external program that drives the exploit. The attacker is free to use a programming language of their choice, as long as they can interact with the victim program in it. Let's assume, for example, an exploit program in C, communicating with the victim program over TCP. In this case, the primitives are abstracted into C functions, which prepare and send packets to the victim, and parse the victim's responses. Using the primitives is then as simple as calling these functions. These calls can be easily combined with arbitrary computations, all written in C, to form the exploit.
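For illustration only, such wrappers might look like the hypothetical sketch below; `send_request` and `recv_reply` stand in for whatever transport and packet format the victim actually uses, and the bytes that trigger the primitives are elided.

``` c
#include <stdint.h>
#include <string.h>

// Hypothetical helpers implemented elsewhere in the exploit program.
void send_request(const void *buf, size_t len);
void recv_reply(void *buf, size_t len);

// The read primitive wrapped as an ordinary C function: craft the malicious
// input for one read, send it to the victim, and parse the value out of its reply.
uint64_t remote_read64(uint64_t addr) {
    unsigned char request[64] = {0}, reply[8] = {0};
    memcpy(request, &addr, sizeof addr);   // placeholder for the real crafted input
    send_request(request, sizeof request);
    recv_reply(reply, sizeof reply);
    uint64_t value;
    memcpy(&value, reply, sizeof value);
    return value;
}

// The write primitive wrapped the same way.
void remote_write64(uint64_t addr, uint64_t value) {
    unsigned char request[64] = {0};
    memcpy(request, &addr, sizeof addr);         // placeholder
    memcpy(request + 8, &value, sizeof value);   // placeholder
    send_request(request, sizeof request);
}
```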
For this cycle of repeated input/output interactions to work, the state of the victim program must not be lost between the different iterations of providing input and observing output. In other words, the victim process must not be restarted.
It's interesting to note that while the read/write primitives consist of carefully constructed inputs to the victim program, the attacker can view these inputs as instructions to the victim program. The victim program effectively implements an interpreter unintentionally, and the attacker can send instructions to this interpreter. This is explored further in [@Dullien2020].
In the case of a non-interactive attack, all computation happens within the victim program. The duality of input data and code is even more obvious in this case, as the malicious input to the victim can be viewed as the exploit code. There are cases for which the input is obviously interpreted as code by the victim application as well, as in the case of a JavaScript program given as input to a JavaScript engine. In this case, the read/write primitives would be written as JavaScript functions, which when called have the unintended side-effect of accessing arbitrary memory that a JavaScript program is not supposed to have access to. The primitives can be chained together with arbitrary computations, also expressed in JavaScript.
There are, however, cases where the correspondence between data and code isn't as obvious. For example, in [@Beer2021], the malicious input consists of a PDF file, masquerading as a GIF. Due to an integer overflow bug in the PDF decoder, the malicious input leads to an unbounded buffer access, therefore to an arbitrary read/write primitive. In the case of JavaScript engine exploitation, the attacker would normally be able to use JavaScript operations and perform arbitrary computations, making exploitation more straightforward. In this case, there are no scripting capabilities officially supported. The attackers, however, take advantage of the compression format intricacies to implement a small computer architecture, in thousands of simple commands to the decoder. In this way, they effectively introduce scripting capabilities and are able to express their exploit as a program to this architecture.
So far, we have described read/write primitives. We have also discussed how an attacker might perform arbitrary computations:
- in an external program in the case of interactive attacks, or
- by using scripting capabilities (whether originally supported or introduced by the attacker) in non-interactive attacks.
Assuming an attacker has gained these capabilities, how can they use them to achieve their goals?
The ultimate goal of an attacker may vary: it may be, among other things, getting access to a system, leaking sensitive information or bringing down a service. Frequently, a first step towards these wider goals is arbitrary code execution\index{arbitrary code execution} within the victim process. We have already mentioned that the attacker will typically have arbitrary computation capabilities at this point, but arbitrary code execution also involves things like calling arbitrary library functions and performing system calls.
Some examples of how the attacker may use the obtained primitives:
- Leak information, such as pointers to specific data structures or code, or the stack pointer.
- Overwrite the stack contents, e.g. to perform a ROP attack.
- Overwrite non-control data, e.g. authorization state. Sometimes this step is sufficient to achieve the attacker's goal, bypassing the need for arbitrary code execution.
Once arbitrary code execution is achieved, the attacker may need to exploit additional vulnerabilities in order to escape a process sandbox, escalate privilege, etc. Such vulnerability chaining is common, but for the purposes of this chapter we will focus on:
- Preventing memory vulnerabilities in the first place, thus stopping the attacker from obtaining powerful read/write primitives.
- Mitigating the effects of read/write primitives, e.g. with mechanisms to maintain Control-Flow Integrity (CFI).
A buffer overflow occurs when a read from or write to a data buffer exceeds its boundaries. This typically results in adjacent data structures being accessed, which has the potential of leaking or compromising the integrity of this adjacent data.
When the buffer is allocated on the stack, we refer to this as a stack buffer overflow. In this section we focus on stack buffer overflows since, in the absence of any mitigations, they are some of the simplest buffer overflows to exploit.
The stack frame of a function includes important control information, such as the saved return address and the saved frame pointer. Overwriting these values unintentionally will typically result in a crash, but the overflowing values can be carefully chosen by an attacker to gain control of the program's execution.
Here is a simple example of a program vulnerable to a stack buffer overflow:
``` c
#include <stdio.h>
#include <string.h>

void copy_and_print(char* src) {
  char dst[16];
  for (int i = 0; i < strlen(src) + 1; ++i)
    dst[i] = src[i];
  printf("%s\n", dst);
}

int main(int argc, char* argv[]) {
  if (argc > 1) {
    copy_and_print(argv[1]);
  }
}
```
In the code above, since the length of the argument is not checked before copying it into `dst`, we have a potential for a buffer overflow.
When looking at code generated for AArch64 with GCC 11.2, the stack layout looks like this:
The exact details of the stack frame layout, including the ordering of variables and the exact control information stored, will depend on the specific compiler version you use and the architecture you compile for.
As can be seen in the stack diagram, an overflowing write in function `copy_and_print` can overwrite the saved frame pointer (FP) and link register (LR) in `main`'s frame. When `copy_and_print` returns, execution continues in `main`. When `main` returns, however, execution continues from the address stored in the saved LR, which has been overwritten. Therefore, when an attacker can choose the value that overwrites the saved LR, it's possible to control where the program resumes execution after returning from `main`.
Before non-executable stacks were mainstream, a common way to exploit these vulnerabilities would be to use the overflow to simultaneously write shellcode\index{shellcode} to the stack and overwrite the return address so that it points to the shellcode. [@AlephOne1996] is a classic example of this technique.
The obvious solution to this issue is to use memory protection features of the processor in order to mark the stack (along with other data sections) as non-executable. However, even when the stack is not executable, more advanced techniques can be used to exploit an overflow that overwrites the return address. These take advantage of code that already exists in the executable or in library code, and will be described in the next section.
Stack canaries are an alternative mitigation for stack buffer overflows. The general idea is to store a known value, called the stack canary, between the buffer and the control information (in the example, the saved FP and LR), and to check this value before leaving the function. Since an overflow that would overwrite the return address is going to overwrite the canary first, a corruption of the return address through a stack buffer overflow will be detected.
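Conceptually, the check the compiler inserts behaves like the hand-written sketch below. `__stack_chk_guard` and `__stack_chk_fail` are the symbols glibc-based toolchains use for this purpose; treat the placement of the canary variable as illustrative, since in reality the compiler controls the frame layout.

``` c
#include <string.h>

extern unsigned long __stack_chk_guard;   // per-process random value, set at program start
extern void __stack_chk_fail(void);       // reports the corruption and aborts

// Hand-written equivalent of what -fstack-protector arranges automatically.
void copy_with_canary(const char *src) {
    unsigned long canary = __stack_chk_guard;   // conceptually sits between dst and the saved FP/LR
    char dst[16];

    strcpy(dst, src);                           // potential stack buffer overflow

    if (canary != __stack_chk_guard)            // an overflow that reached the saved FP/LR
        __stack_chk_fail();                     //   must have changed the canary first
}
```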
This technique has a few limitations: first of all, it specifically aims to protect against stack buffer overflows, and does nothing to protect against stronger primitives (e.g. arbitrary write primitives). Control-flow integrity techniques, which are described in the next section, aim to protect the integrity of stored code pointers against any modification.
Secondly, since a compiler needs to generate additional instructions for ensuring the canary's integrity, heuristics are usually employed to determine which functions are considered vulnerable. The additional instructions are then generated only for the functions that are considered vulnerable. Since heuristics aren't always perfect, this poses another potential limitation of the technique. To address this, compilers offer various levels of heuristics, ranging from applying the mitigation only to a small proportion of functions to applying it universally. See, for example, the `-fstack-protector`, `-fstack-protector-strong` and `-fstack-protector-all` options offered by both GCC and Clang.
Another limitation is the possibility of leaks of the canary value. The canary value is often randomized at program start but remains the same during the program's execution. An attacker who manages to obtain the canary value at some point might, therefore, be able to reuse the leaked canary value and corrupt control information while avoiding detection. Choosing a canary value that includes a null byte (the C-style string terminator) might help in limiting the damage of overflows coming from string manipulation functions, even when the value is leaked.
Many buffer overflow vulnerabilities result from the use of unsafe library functions, such as `gets`, or from the unsafe use of library functions such as `strcpy`. There is extensive literature on writing secure C/C++\index{C}\index{C++} code, for example [@Seacord2013] and [@Dowd2006]. A different approach to limiting the effects of overflows is library function hardening, which aims to detect buffer overflows and terminate the program gracefully. This involves the introduction of feature macros like `_FORTIFY_SOURCE` [@Sharma2014].
Finally, it's important to mention that not all buffer overflows aim to overwrite a saved return address. There are many cases where a buffer overflow can overwrite other data adjacent to the buffer, for example an adjacent variable that determines whether authorization was successful, or a function pointer that, when modified, can modify the program's control flow according to the attacker's wishes.
Some of these vulnerabilities can be mitigated with the measures described in this section, but often more general measures to ensure memory safety or Control-Flow Integrity are necessary. For example, in addition to the hardening of specific library functions, compilers can also implement automatic bounds checking for arrays where the array bound can be statically determined (`-fsanitize=bounds`), as well as various other "sanitizers". We will describe these measures in the following sections.
In the early days of memory vulnerability exploitation, attackers could simply place shellcode\index{shellcode} of their choice in executable memory and jump to it. As non-executable stack and heap became mainstream, attackers started to reuse code already present in an application's binary and linked libraries instead. A variety of different techniques to this effect came to light.
The simplest of these techniques is return-to-libc [@Solar1997]. Instead of returning to shellcode that the attacker has injected, the return address is modified to return into a library function, such as `system` or `exec`. This technique is simpler to use when arguments are also passed on the stack and can therefore be controlled with the same stack buffer overflow that is used to modify the address.
Return-to-libc attacks restrict an attacker to whole library functions. While this can lead to powerful attacks, it has also been demonstrated that it is possible to achieve arbitrary computation by combining a number of short instruction sequences ending in indirect control transfer instructions, known as gadgets\index{gadget}. The indirect control transfer instructions make it easy for an attacker to execute gadgets one after another, by controlling the memory or register that provides each control transfer instruction's target.
In return-oriented programming (ROP)\index{return-oriented programming (ROP)} [@Shacham2007], each gadget performs a simple operation, for example setting a register, then pops a return address from the stack and returns to it. The attacker constructs a fake call stack (often called a ROP chain\index{ROP chain}) which ensures a number of gadgets are executed one after another, in order to perform a more complex operation.
This will hopefully become more clear with an example: a ROP chain for AArch64 Linux that starts a shell, by calling `execve` with `"/bin/sh"` as an argument.
The prototype of the `execve` library function, which wraps the exec system call, is:

``` c
int execve(const char *pathname, char *const argv[],
           char *const envp[]);
```
For AArch64, `pathname` will be passed in the `x0` register, `argv` will be passed in `x1`, and `envp` in `x2`. For starting a shell, it is sufficient to:
- Make `x0` contain a pointer to `"/bin/sh"`.
- Make `x1` contain a pointer to an array of pointers with two elements:
  - The first element is a pointer to `"/bin/sh"`.
  - The second element is zero (`NULL`).
- Make `x2` contain zero (`NULL`).
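In plain C, the end result the ROP chain is working towards is equivalent to the following call:

``` c
#include <unistd.h>

int main(void) {
    char *argv[] = { "/bin/sh", NULL };
    execve("/bin/sh", argv, NULL);   // x0 = pathname, x1 = argv, x2 = envp (NULL)
    return 1;                        // only reached if execve fails
}
```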
This can be achieved by chaining gadgets to set the registers `x0`, `x1`, `x2`, and then returning to `execve` in the C library.
Let's assume we have the following gadgets:
- A gadget that loads `x0` and `x1` from the stack:

  ```
  gadget_x0_x1:
      ldp   x0, x1, [sp]
      ldp   x20, x19, [sp, #64]
      ldp   x29, x30, [sp, #32]
      ldr   x21, [sp, #48]
      add   sp, sp, #0x50
      ret
  ```
- A gadget that sets `x2` to zero, but also clears `x0` as a side-effect:

  ```
  gadget_x2:
      mov   x2, xzr
      mov   x0, x2
      ldp   x20, x19, [sp, #32]
      ldp   x29, x30, [sp]
      ldr   x21, [sp, #16]
      add   sp, sp, #0x30
      ret
  ```
::: TODO
Explain how these gadgets could result from C/C++ code. The current versions are slightly tweaked by hand to have more manageable offsets. [164]{.issue}
:::
Both gadgets also clobber several uninteresting registers, but since `gadget_x2` also clears `x0`, it becomes clear that we should use a ROP chain that:
- Returns to `gadget_x2`, which sets `x2` to zero.
- Returns to `gadget_x0_x1`, which sets `x0` and `x1` to the desired values.
- Returns to `execve`.
Figure @fig:rop-control-flow shows this control flow.
[Missing figure: control flow through the ROP chain (gadget_x2, then gadget_x0_x1, then execve).]{#fig:rop-control-flow}

[Missing figure: the fake call stack that drives the ROP chain.]{#fig:rop-call-stack}
We can achieve this by constructing the fake call stack shown in figure @fig:rop-call-stack, where "Original frame" marks the frame in which the address of `gadget_x2` has replaced a saved return address that will be loaded and returned to in the future. As an alternative, an attacker could place this fake call stack somewhere else, for example on the heap, and use a primitive that changes the stack pointer's value instead. This is known as stack pivoting\index{stack pivoting}.
Note that this fake call stack contains NULL bytes, even without considering the exact values of the various return addresses included. An overflow bug that is based on a C-style string operation would not allow an attacker to replace the stack contents with this fake call stack in one go, since C-style strings are null-terminated and copying the fake stack contents would stop once the first NULL byte is encountered. The ROP chain would therefore need to be adjusted so that it doesn't contain NULL bytes, for example by initially replacing the NULL bytes with a different byte and adding some more gadgets to the ROP chain that write zero to those stack locations.
A question that comes up when looking at the stack diagram is "how do we know the addresses of these gadgets"? We will talk a bit more about this in the next section.
ROP gadgets like the ones used here may be easy to identify by visual inspection of a disassembled binary, but it's common for attackers to use "gadget scanner"\index{gadget scanner} tools in order to discover large numbers of gadgets automatically. Such tools can also be useful to a compiler engineer working on a code reuse attack mitigation, as they can point out code sequences that should be protected and have been missed.
Anything in executable memory can potentially be used as a ROP gadget, even if the compiler has not intended it to be code. This includes literal pools which are intermingled with code, and, on architectures with variable-length instruction encoding, returning to the middle of an instruction. In a JIT compiler where the attacker might influence what literals are generated, this can be particularly powerful. For example, on x86, the compiler might have emitted the instruction `mov $0xc35f, %ax`, which is encoded as the four bytes `66 b8 5f c3`. If the attacker can divert execution two bytes into that 4-byte instruction, it will execute `5f c3`. Those bytes correspond to the two single-byte instructions `pop %rdi; ret`, which form a useful ROP gadget.
Jump-oriented programming (JOP)\index{jump-oriented programming (JOP)} [@Bletsch2011] is a variation on ROP, where gadgets can also end in indirect branch instructions instead of return instructions. The attacker chains a number of such gadgets through a dispatcher gadget\index{dispatcher gadget}, which loads pointers one after another from an array of pointers, and branches to each one in return. The gadgets used must be set up so that they branch or return back to the dispatcher after they're done. This is demonstrated in figure @fig:jop.
::: TODO
The gadgets in the figure are made up, chosen to highlight that each gadget can end in a different type of indirect control flow transfer instruction. Consider replacing them with more realistic ones. [165]{.issue}
:::
In figure @fig:jop, `x4` initially points to the "dispatch table", which has been modified by the attacker to contain the addresses of the three gadgets they want to execute. The dispatcher gadget loads each address in the dispatch table one by one and branches to them. The first gadget loads `x0` and `x1` from the stack, where the attacker has placed the inputs of their choice. It then loads its return address, also modified by the attacker so that it points back to the dispatcher gadget, and returns to it. The dispatcher branches to the next gadget, which adds `x0` and `x1` and leaves the result in `x0`, branching back to the dispatcher through another value loaded from the stack into `x2`. The final gadget stores the result of the addition, which remains in `x0`, to the stack, before branching to `x2`, which still points to the dispatcher gadget.
Counterfeit Object-oriented programming (COOP)\index{counterfeit object-oriented programming (COOP)} [@Schuster2015] is a code reuse technique that takes advantage of C++ \index{C++} virtual function calls. A COOP attack makes use of existing virtual functions and vtables, and creates fake objects pointing to these existing vtables. The virtual functions used as gadgets in the attack are called vfgadgets. To chain vfgadgets together, the attacker uses a "main loop gadget", similar to JOP's dispatcher gadget, which is itself a virtual function that loops over a container of pointers to C++ objects and invokes a virtual function on these objects. [@Schuster2015] describes the attack in more detail. It is specifically mentioned here as an example of an attack that doesn't depend on directly replacing return addresses and code pointers, like ROP and JOP do. Such language-specific attacks are important to keep in mind when considering mitigations against code reuse attacks, which will be the topic of the next section.
One last example of a code reuse attack that is worth mentioning here is sigreturn-oriented programming (SROP)\index{sigreturn-oriented programming (SROP)} [@Bosman2014]. It is a special case of ROP where the attacker creates a fake signal handler frame and calls `sigreturn`. `sigreturn` is a system call on many UNIX-type systems which is normally called upon return from a signal handler, and restores the state of the process based on the state that has been saved on the signal handler's stack by the kernel previously, on entry to the signal handler. The ability to fake a signal handler frame and call `sigreturn` gives an attacker a simple way to control the state of the program.
When discussing mitigations against code reuse attacks, it is important to keep in mind that there are two capabilities the attacker must have for such attacks to work:
- the ability to overwrite return addresses or function pointers
- knowledge of the target addresses to overwrite them with (e.g. libc function entry points).
When code reuse attacks were first described, programs used to contain absolute code pointers, and needed to be loaded at fixed addresses. The stack base was predictable, and libraries were loaded in predictable memory locations. This made code reuse attacks simple, as all of the addresses needed for a successful exploit were easy to discover.
Address space layout randomization (ASLR)\index{ASLR} makes this more difficult by randomizing the positions of the memory areas containing the executable, the loaded libraries, the stack and the heap. ASLR requires code to be position-independent. Given enough entropy, the chance that an attacker would successfully guess one or more addresses in order to mount a successful attack will be greatly reduced.
Does this mean that code reuse attacks have been made redundant by ASLR? Unfortunately, this is not the case. There are various ways in which an attacker can discover the memory layout of the victim program. This is often referred to as an "info leak"\index{info leak} [@Serna2012].
Since we can not exclude code reuse attacks solely by making addresses hard to guess, we need to also consider mitigations that prevent attackers from overwriting return addresses and other code pointers. Some of the mitigations described earlier, like stack canaries and library function hardening, can help in specific situations, but for the more general case where an attacker has obtained arbitrary read and write primitives, we need something more.
Control-flow integrity (CFI)\index{CFI} is a family of mitigations that aim to preserve the intended control flow of a program. This is done by restricting the possible targets of indirect branches and returns. A scheme that protects indirect jumps and calls is referred to as forward-edge CFI\index{forward-edge CFI}, whereas a scheme that protects returns is said to implement backward-edge CFI\index{backward-edge CFI}. Ideally, a CFI scheme would not allow any control flow transfers that don't occur in a correct program execution; however, different schemes have varying granularities. They often rely on function type checks or use static analysis (points-to analysis) to identify potential control flow transfer targets. [@Burow2017] compares a number of available CFI schemes based on their precision. Forward-edge CFI schemes, for example, are classified based on whether or not they perform, among others, flow-sensitive analysis, context-sensitive analysis and class-hierarchy analysis.
Clang's CFI includes a variety of forward-edge control-flow integrity checks. These include checking that the target of an indirect function call is an address-taken function of the correct type and checking that a C++ \index{C++} virtual call happens on an object of the correct dynamic type.
For example, assume we have a class `A` with a virtual function `foo` and a class `B` deriving from `A`, and that these classes are not exported to other compilation modules:
``` cpp
class A {
public:
  virtual void foo() {}
};

class B : public A {
public:
  virtual void foo() {}
};

void call_foo(A* a) {
  a->foo();
}
```
When compiling with `-fsanitize=cfi -flto -fvisibility=hidden`, the code for `call_foo` would look something like this:
```
00000000004006b4 <call_foo(A*)>:
  4006b4: a9bf7bfd   stp   x29, x30, [sp, #-16]!
  4006b8: 910003fd   mov   x29, sp
  4006bc: f9400008   ldr   x8, [x0]
  4006c0: 90000009   adrp  x9, 400000 <_init-0x558>
  4006c4: 91216129   add   x9, x9, #0x858
  4006c8: cb090109   sub   x9, x8, x9
  4006cc: d1004129   sub   x9, x9, #0x10
  4006d0: 93c91529   ror   x9, x9, #5
  4006d4: f100093f   cmp   x9, #0x2
  4006d8: 540000a2   b.cs  4006ec <call_foo(A*)+0x38>
  4006dc: f9400108   ldr   x8, [x8]
  4006e0: d63f0100   blr   x8
  4006e4: a8c17bfd   ldp   x29, x30, [sp], #16
  4006e8: d65f03c0   ret
  4006ec: d4200020   brk   #0x1
```
This code looks complicated, but what it does is check that the virtual table pointer (vptr) of the argument points to the vtable of `A` or of `B`, which are stored consecutively and are the only allowed possibilities. The checks generated for different types of control-flow transfers are similar.
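A simplified C sketch of the idea behind this check follows. It is not Clang's actual implementation (which uses the aligned range check visible in the disassembly rather than two explicit comparisons), and `vtable_of_A`/`vtable_of_B` merely stand in for the vtable addresses the compiler knows when it emits the code:

``` c
#include <stdint.h>

typedef void (*foo_fn)(void *self);

extern const void *vtable_of_A[];   // stand-ins for the consecutively stored vtables
extern const void *vtable_of_B[];

void call_foo_checked(void *a) {
    uintptr_t vptr = *(uintptr_t *)a;          // load the object's vtable pointer (ldr x8, [x0])
    if (vptr != (uintptr_t)vtable_of_A &&
        vptr != (uintptr_t)vtable_of_B)        // only A's or B's vtable is an allowed target
        __builtin_trap();                      // corresponds to the brk #0x1
    foo_fn foo = *(foo_fn *)vptr;              // fetch foo's slot from the vtable (ldr x8, [x8])
    foo(a);                                    // the checked indirect call (blr x8)
}
```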
Another implementation of forward-edge CFI is Windows Control Flow Guard, which only allows indirect calls to functions that are marked as valid indirect control flow targets.
Clang also implements a backward-edge CFI scheme known as Shadow Stack\index{shadow stack}. In Clang's implementation, a separate stack is used for return addresses, which means that stack-based buffer overflows cannot be used to overwrite return addresses. The address of the shadow stack is randomized and kept in a dedicated register, with care taken so that it is never leaked, which means that an arbitrary write primitive cannot be used against the shadow stack unless its location is discovered through some other means.
As an example, when compiling with `-fsanitize=shadow-call-stack -ffixed-x18`, the code generated for the `main` function from the earlier stack buffer overflow example will look something like:
```
main:
      cmp   w0, #2
      b.lt  .LBB1_2
      str   x30, [x18], #8
      stp   x29, x30, [sp, #-16]!
      mov   x29, sp
      ldr   x0, [x1, #8]
      bl    copy_and_print
      ldp   x29, x30, [sp], #16
      ldr   x30, [x18, #-8]!
.LBB1_2:
      mov   w0, wzr
      ret
```
You can see that the shadow stack address is kept in `x18`. The return address is also saved on the "normal" stack for compatibility with unwinders, but it's not actually used for the function return.
In addition to software implementations, there are a number of hardware-based CFI implementations. A hardware-based implementation has the potential to offer improved protection and performance compared to an equivalent software-only CFI scheme.
One such example is Pointer Authentication\index{Pointer Authentication} [@Rutland2017], an Armv8.3 feature, supported only in AArch64 state, that can be used to mitigate code reuse attacks. Pointer Authentication introduces instructions that generate a pointer signature, called a Pointer Authentication Code (PAC), based on a key and a modifier. It also introduces matching instructions to authenticate this signature. Incorrect authentication leads to an unusable pointer that will cause a fault when used. The key is not directly accessible by user space software.
Pointers are stored as 64-bit values, but they don't need all of these bits to describe the available address space, so a number of bits in the top of each pointer are unused. The unused bits must be all ones or all zeros, so we refer to them as extension bits\index{pointer extension bits}. Pointer Authentication Codes are stored in those unused extension bits of a pointer. The exact number of PAC bits depends on the number of unused pointer bits, which varies based on the configuration of the virtual address space size.
Clang and GCC both use Pointer Authentication for return address signing, when compiling with the `-mbranch-protection=pac-ret` flag. When compiling with Clang using this flag, the `main` function from the earlier stack buffer overflow example looks like:
```
main:
      cmp   w0, #2
      b.lt  .LBB1_2
      paciasp
      stp   x29, x30, [sp, #-16]!
      ldr   x0, [x1, #8]
      mov   x29, sp
      bl    copy_and_print
      ldp   x29, x30, [sp], #16
      autiasp
.LBB1_2:
      mov   w0, wzr
      ret
```
Notice the `paciasp` and `autiasp` instructions: `paciasp` computes a PAC for the return address in the link register (`x30`), based on the current value of the stack pointer (`sp`) and a key. This PAC is inserted in the extension bits of the pointer. We then store this signed version of the link register on the stack. Before returning, we load the signed return address from the stack and execute `autiasp`, which verifies the PAC stored in the return address, again based on the value of the key and the value of the stack pointer (which at this point will be the same as when we signed the return address). If the PAC is correct, which will be the case in normal execution, the extension bits of the address are restored, so that the address can be used in the `ret` instruction. However, if the stored return address has been overwritten with an address with an incorrect PAC, the upper bits will be corrupted so that subsequent uses of the address (such as in the `ret` instruction) will result in a fault.
By making sure we don't store any return addresses without a PAC, we can significantly reduce the effectiveness of ROP attacks: since the secret key is not retrievable by an attacker, an attacker cannot calculate the correct PAC for a given address and modifier, and is restricted to guessing it. The probability of success when guessing a PAC depends on the exact number of PAC bits available in a given system configuration. However, authenticated pointers are vulnerable to pointer substitution attacks\index{pointer substitution attack}, where a pointer that has been signed with a given modifier is replaced with a different pointer that has also been signed with the same modifier.
Another backward-edge CFI scheme that uses Pointer Authentication instructions is PACStack [@Liljestrand2021], which chains together PACs in order to include the full context (all of the previous return addresses in the call stack) when signing a return address. [Add more references to relevant research [166]{.issue}]{.todo}
Pointer Authentication can also be used more widely, for example to implement a forward-edge CFI scheme, as is done in the arm64e ABI [@McCall2019]. The Pointer Authentication instructions, however, are generic enough to also be useful in implementing more general memory safety measures, beyond CFI. [Mention more Pointer Authentication uses in later section, and add link here [167]{.issue}]{.todo}
Branch Target Identification (BTI)\index{BTI}, introduced in Armv8.5, offers coarse-grained forward-edge protection. With BTI, the locations that are targets of indirect branches have to be marked with a new instruction, `BTI`. There are four different types of BTI instructions that permit different types of indirect branches (indirect jump, indirect call, both, or none). An indirect branch to a non-BTI instruction or the wrong type of BTI instruction will raise a Branch Target Exception.
Both Clang and GCC support generating BTI instructions with the `-mbranch-protection=bti` flag, or, to enable both BTI and return address signing with Pointer Authentication, with `-mbranch-protection=standard`.
Two aspects of BTI can simplify its deployment: individual pages can be marked as guarded or unguarded, with BTI checks as described above only applying to indirect branches targeting guarded pages. In addition to this, the BTI instruction has been assigned to the hint space; therefore, it will be executed as a no-op on cores that do not support BTI, aiding its adoption.
When implementing CFI measures like the ones described here, it is important to be aware of known weaknesses that affect similar schemes. [@Conti2015] describes how CFI implementations can suffer when certain registers are spilled on the stack, where they could be controlled by an attacker. For example, if a register that contains a function pointer that has just been validated gets spilled, the check can effectively be bypassed by overwriting the spilled pointer.
Having discussed various mitigations against code reuse attacks, it's time to turn our attention to a different type of attacks, which do not try to overwrite code pointers: attacks against non-control data, which will be the topic of the next section.
In the previous sections, we have focused on subverting control flow by overwriting control data\index{control data}, which are used to change the value of the program counter, such as return addresses and function pointers. Since these types of attacks are prominent, many mitigations have been designed with the goal of maintaining control-flow integrity. Non-control data attacks\index{non-control data attacks}, also known as data-only attacks\index{data-only attacks}, can completely bypass these mitigations, since the data they modify is not the control data that these mitigations protect.
Non-control data attacks can range from very simple attacks targeting a single piece of data to very elaborate attacks with very high expressiveness [@Beer2021]. A very simple example may look something like this:
``` c
// Returns zero for failure, non-zero for success.
int authenticate() {
  int authenticated = 0;
  char passphrase[10];
  if (fgets(passphrase, 20, stdin)) { // buffer overflow
    if (!strcmp(passphrase, "secret\n")) {
      authenticated = 1;
    }
  }
  return authenticated;
}
```
The example shows a simplified function that reads a passphrase from a user, compares it with a known value and sets an integer stack variable to indicate whether "authentication" was successful or not. The function contains a very obvious buffer overflow, as the string length limit passed to `fgets` does not match the buffer size.
Figure @fig:non-control-data-attack shows the stack frame layout for this function when the code is compiled for AArch64 with Clang 10.0. As the figure shows, an overflow of `passphrase` will overwrite `authenticated`, setting it to a non-zero value, even though the passphrase was incorrect. The `authenticate` function will then return a non-zero value, incorrectly indicating authentication success.
[Missing figure: stack frame layout of the authenticate function, with authenticated adjacent to passphrase.]{#fig:non-control-data-attack}
For many more simple examples of data-only attacks that can occur in real applications, see [@Chen2005]. Although this makes it clear that data-only attacks are a real issue, it leaves open a very important question: what are the limits of such attacks? It is tempting to assume that data-only attacks are somehow inherently limited; however, it has been demonstrated in [@Hu2016] that they can, in fact, be very expressive. [@Hu2016] describes Data-Oriented Programming (DOP), a general method for building data-only attacks against a vulnerable program, starting from a known memory error in the program.
The authors of [@Hu2016] describe a small language called MINDOP, with a virtual instruction set and virtual registers. The virtual registers of MINDOP correspond to memory locations. The MINDOP instructions correspond to operations on these virtual registers, for example loading a value into a virtual register, storing a value from a virtual register, arithmetic operations and even conditional and unconditional jumps. The authors show how to identify gadgets in the code that implement the various MINDOP instructions and are reachable from memory errors, and how those gadgets can be stitched together with the help of dispatcher gadgets, the role of which is specifically to chain gadgets together.
Stitching gadgets together is simpler for interactive attacks, where the attacker can keep providing malicious input to trigger the initial memory error and a certain chain of gadgets, as many times as needed. For non-interactive attacks, the MINDOP jump operations are required as well, used in conjunction with a memory location that provides a virtual program counter.
The process of creating a DOP attack is not so simple and not fully automated. Related literature [@Ispoglou2018] focuses on automating data-only attacks.
When reading write-ups on recent security issues, instead of terminology related to data-oriented gadgets, you are more likely to encounter the term "primitive", which has been described in an earlier section. These concepts are related: an arbitrary read primitive, for example, can be produced by chaining a (possibly large) number of DOP gadgets. Talking about primitives offers a nicer level of abstraction, as it tends to be simpler to reason in terms of higher-level operations instead of many small pieces of code that need to be stitched together to perform the operations.
To summarize, data-only attacks are a significant concern. As most of the mitigation techniques we have seen so far are control-flow oriented, they are by design inadequate to protect against this different type of attacks. In the next section, we will look at what we can do to address them at their source: memory errors.
We have so far discussed how languages that are not memory safe, like C and C++, are vulnerable to memory errors and therefore exploitation. In this section, we will discuss tools that are available to C/C++ \index{C}\index{C++} programmers to help them detect vulnerabilities that can lead to memory errors.
Sanitizers\index{sanitizers} are tools that detect bugs during program execution. Sanitizers usually have two components: a compiler instrumentation part that introduces the new checks, and a runtime library part. They are often too expensive to run in production mode, as they tend to increase execution time and memory usage. They are commonly used during testing of an application, frequently in combination with fuzzers\index{fuzzing}.
A very popular sanitizer is Address Sanitizer (ASan)\index{AddressSanitizer (ASan)}. It aims to detect various memory errors. These include out-of-bounds accesses, use-after-free, double-free and invalid free. There are Address Sanitizer implementations for both GCC\index{GCC} and Clang\index{Clang}, but we will focus on the Clang implementation here.
ASan uses shadow memory\index{shadow memory} to keep track of the state of the application's memory. Each byte of shadow memory records information on 8 bytes of the application's memory. It represents how many of the 8 bytes are addressable. When none of the bytes are addressable, it encodes additional details (whether the 8 bytes are out-of-bounds stack, out-of-bounds heap, freed memory, and so on). Requiring one byte of shadow memory for every 8 bytes of application memory means that ASan needs to reserve one-eighth of the application's virtual address space [@Serebryany2012]. Shadow memory is allocated in one contiguous chunk, which keeps mapping application memory to shadow memory simple.
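A hedged sketch of this mapping and of the check that the instrumentation performs, following [@Serebryany2012]; the shadow offset is a platform-specific constant, declared but not defined here:

``` c
#include <stdint.h>

// Shadow byte values: 0 means all 8 bytes of the granule are addressable,
// 1-7 means only the first k bytes are, and negative values encode why the
// whole granule is unaddressable (heap redzone, freed memory, and so on).
extern const uintptr_t kShadowOffset;   // platform-specific constant in the real runtime

static inline int8_t *shadow_for(uintptr_t addr) {
    return (int8_t *)((addr >> 3) + kShadowOffset);   // one shadow byte per 8 application bytes
}

// Roughly what gets inserted before an n-byte load or store (n <= 8).
static inline void check_access(uintptr_t addr, unsigned n) {
    int8_t s = *shadow_for(addr);
    if (s != 0 && (int8_t)((addr & 7) + n - 1) >= s)
        __builtin_trap();   // the real runtime calls an __asan_report_* function instead
}
```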
ASan's runtime library replaces memory allocation functions like `malloc`\index{malloc} and `free`\index{free} with its own specialized versions. `malloc` introduces redzones\index{redzone} before and after each allocation, which are marked as unaddressable. `free` marks the entire allocation as unaddressable and places it in quarantine, so that it doesn't get reallocated for a while (on a FIFO basis). This allows for detecting use-after-free. The runtime library also handles management of the shadow memory.
ASan's code instrumentation in the compiler introduces redzones around each stack array allocation, and around globals. It then instruments loads and stores to check whether the accessed memory is addressable, based on the information stored in the shadow memory, and reports an error if unaddressable memory is accessed.
ASan doesn't produce false positives and is easy to use. It requires compiling and linking a program with the `-fsanitize=address` option. It is used in practice for testing large projects.
There is a similar tool for dynamic memory error detection in the Linux kernel, KASAN\index{KASAN}.
ASan's biggest drawback is its high runtime overhead and memory usage, due to the quarantine, redzones and shadow memory. Hardware-assisted AddressSanitizer (HWASAN) \index{HWASAN} works similarly to ASan, but with partial hardware assistance can result in lower memory overheads, at the cost of being less portable.
On AArch64\index{AArch64}, HWASAN uses Top-Byte Ignore (TBI)\index{Top-Byte Ignore (TBI)}. When TBI is enabled, the top byte of a pointer is ignored when performing a memory access, allowing software to use that top byte to store metadata, without affecting execution. Each allocation is aligned to 16 bytes and each 16-byte chunk of memory (called "granule") is randomly assigned an 8-bit tag. The tag is stored in shadow memory and is also placed in the top byte of the pointer to the object. Memory loads and stores are then instrumented to check that the tag stored in the pointer matches the tag stored in memory, and report an error when a mismatch happens. [Add diagram to demonstrate how HWASAN works [168]{.issue}]{.todo}
For granules shorter than 16 bytes, the value stored in shadow memory is not the actual tag, but the length of the granule. The actual tag is stored at the last byte of the granule itself. For tags in shadow memory with values between 1 and 15, HWASAN checks that the access is within the bounds of the granule and the pointer tag matches the tag stored at the last byte of the granule.
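Putting the pieces together, a sketch of the check for an access covering a full 16-byte granule might look as follows (the shadow array name is made up; the real instrumentation is emitted inline and calls into the HWASAN runtime on a mismatch):

``` c
#include <stdint.h>

extern uint8_t *hwasan_shadow;   // illustrative: one tag byte per 16-byte granule

void check_tagged_access(uintptr_t tagged_ptr) {
    uint8_t   ptr_tag = (uint8_t)(tagged_ptr >> 56);         // tag lives in the TBI byte
    uintptr_t addr    = tagged_ptr & 0x00ffffffffffffffULL;  // address bits actually used
    uint8_t   mem_tag = hwasan_shadow[addr >> 4];            // tag assigned to this granule
    if (ptr_tag != mem_tag)
        __builtin_trap();   // the real runtime reports a tag-mismatch error here
}
```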
HWASAN is also easy to use, and simply requires compiling and linking an application with the `-fsanitize=hwaddress` flag.
MemTagSanitizer\index{MemTagSanitizer} goes one step further and uses the Armv8.5-A Memory Tagging Extension (MTE)\index{Memory Tagging Extension (MTE)}. With MTE, the tag checking is done automatically by hardware, and an exception is raised on mismatch. MTE's granule size is 16 bytes, whereas tags are 4-bit. [Consider adding a whole section on MTE and its applications [169]{.issue}]{.todo}
UndefinedBehaviorSanitizer (UBSan)\index{UBSan} detects undefined behavior during program execution, for example array out-of-bounds accesses for statically determined array bounds, null pointer dereference, signed integer overflow and various kinds of integer conversions that result in data loss. Although some of these checks are not directly related to memory errors, these kinds of errors can lead to incorrect pointer arithmetic, incorrect allocation sizes, and other issues that lead to memory errors, so it is important to detect them and address them.
UBSan's documentation describes the full list of available checks. The majority of these checks are enabled with the `-fsanitize=undefined` flag, but there are also other useful groupings of checks, for example `-fsanitize=integer` for checks related to integer conversions and arithmetic.
There are many other sanitizers, more than can reasonably be covered in this section. For the interested reader, we list a few more:
- MemorySanitizer\index{MemorySanitizer}: detects uninitialized reads.
- ThreadSanitizer\index{ThreadSanitizer}: detects data races.
- GWP-ASan\index{GWP-ASan}: detects use-after-free and heap buffer overflows, with low overhead that makes it suitable for production environments. It performs checks only on a sample of allocations.
::: TODO
Describe other mechanisms for detecting memory errors, both software-based (static analysis, library and buffer hardening) and hardware-based, e.g. PAuth-based pointer integrity schemes, MTE etc [170]{.issue}
:::
Making sure that memory accesses happen within the bounds of each object's allocation is a very important part of memory safety. This is usually described with the term "spatial memory safety"\index{spatial memory safety}. Out-of-bounds accesses result in restricted read/write primitives\index{read primitive}\index{write primitive}. An attacker can often easily convert these into arbitrary read/write primitives. For example, this can be achieved by overwriting pointer fields in allocations following the object that was the target of the problematic memory access.
The C and C++ languages do not, as a general rule, perform bounds checking. This is one of the sources of memory errors in C/C++ programs. However, compilers have a history of introducing bounds checks, even though the language does not require them, in an effort to improve the security of existing C/C++ codebases.
One of the simplest compiler options is `-Warray-bounds`, which warns when an array access is always out of bounds. It is therefore restricted to arrays with statically known size. This option is supported by both GCC and Clang.
Another option supported by both compilers is `-fsanitize=bounds`, included in UBSan, which checks the bounds for accesses to statically sized arrays at runtime. This handles more cases than `-Warray-bounds`, as it can also check accesses with dynamic indices. However, it's still limited, as it cannot perform bounds checks on dynamically sized arrays, and it is still restricted to array bounds checking. A more comprehensive solution would also cover pointers in general, especially if pointer arithmetic is performed.
You may notice that there is a bit of overlap between the bounds checks introduced by `-fsanitize=bounds` and the Address Sanitizer. Although the scope of `-fsanitize=bounds` is restricted to statically sized arrays, it's interesting to note that it can still catch intra-object overflows\index{intra-object overflow} on array member accesses that the Address Sanitizer would not, because the access is still technically within the allocation. For example, given the following code:
```c
struct foo {
  int a[6];
  int b;
};

int get(struct foo *x, int i) {
  return x->a[i];
}
```
a call to `get(f, 6)` will give an error with `-fsanitize=bounds`, but not with `-fsanitize=address`.
Clang and GCC also support two builtin functions that return information on the size of a variable. `__builtin_object_size` can be used for objects of statically known size and always evaluates at compile time, whereas `__builtin_dynamic_object_size` can also propagate dynamic information from allocation functions that have been marked with the `alloc_size` function attribute.
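A minimal sketch of how these builtins can be combined with the `alloc_size` attribute is shown below. The allocator and the checking wrapper are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator: alloc_size tells the compiler that the first
   argument is the size of the returned allocation. */
void *my_alloc(size_t n) __attribute__((alloc_size(1)));

/* A bounds-checked fill routine: __builtin_dynamic_object_size returns
   (size_t)-1 when the compiler cannot determine an object size. */
void checked_fill(unsigned char *buf, size_t n) {
  size_t bound = __builtin_dynamic_object_size(buf, 0);
  if (bound != (size_t)-1 && n > bound)
    abort();                 /* the write would overflow the allocation */
  memset(buf, 0xff, n);
}
```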
These builtins can then be used to introduce bounds checks in user or library code. For example, the `_FORTIFY_SOURCE` macro instructs glibc to introduce bounds checks in various string and memory manipulation functions, such as `memcpy`. The number of checks increases with the value of the macro (the values currently used are 1-3). For example, the lower two levels won't use the `__builtin_dynamic_object_size` builtin, as it has a runtime overhead additional to that of the checks themselves.
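Conceptually, a fortified call then behaves roughly like the sketch below, where the compiler-determined object size is passed to a checking variant of the function. The wrapper name is illustrative; glibc uses internal `__*_chk` functions rather than this exact code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static void *checked_memcpy(void *dst, const void *src, size_t n, size_t dst_size) {
  if (n > dst_size)
    abort();                                   /* glibc would call its __chk_fail() here */
  return memcpy(dst, src, n);
}

/* A fortified call site is then roughly equivalent to:
     checked_memcpy(dst, src, n, __builtin_object_size(dst, 0));
   with __builtin_dynamic_object_size used at the highest fortification level. */
```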
In order to support bounds checking for dynamically sized arrays, a recent proposal for GCC and Clang adds a struct member attribute, `element_count`. This attribute applies to flexible array members in structs, indicating another member of the struct that expresses the array's length.
The `-fbounds-safety` proposal goes a bit further, introducing a similar annotation that can be applied to pointers more generally. The proposal also aims to reduce the annotation burden placed on programmers by only requiring the annotations at ABI\index{Application binary interface (ABI)} boundaries. Local variables which do not cross ABI boundaries are implicitly converted to use wide pointers. These wide pointers store bounds information alongside the original pointer.
There are also hardening efforts focusing on C++ codebases. For example, the libc++ hardening modes enable a number of assertions that aim to catch undefined behaviour in the library. The C++ Buffer Hardening proposal aims to extend this library hardening. The proposal will also introduce a programming model in which all pointer arithmetic is considered unsafe. Pointer arithmetic will have to be replaced with alternatives from the C++ library, for example `std::array`. The implementation of these alternatives in the hardened library will include bounds checks.
Successfully using bounds checking compiler features for a large codebase requires substantial effort. An example of this is refactoring the Linux kernel to use bounds checks for flexible arrays, as described in [@Cook2023].
There are also hardware-based mitigations for violations of spatial memory safety. For example, CHERI introduces capabilities\index{capability} to conventional Instruction Set Architectures. Capabilities combine a virtual address with metadata that describes its corresponding bounds and permissions. Capabilities cannot be forged, and can thus provide very strong guarantees. Arm has developed a prototype architecture that adapts CHERI, as well as a prototype SoC and development board, as part of the Arm Morello Program.
Of course, another approach to mitigating spatial memory safety vulnerabilities is using a language that has been designed with spatial memory safety in mind. Such languages make sure that all memory accesses are checked, either at compile-time or at runtime. For example, the Rust programming language introduces bounds checks whenever the compiler cannot prove that an access is within bounds. There are many other memory safe languages, with different characteristics. One example is JavaScript, a dynamically typed, usually JIT-compiled language. We'll discuss some of the issues that arise when implementing support for such a language in the next section.
Compiler correctness is obviously very important, as miscompilation creates buggy programs even when the source code has no bugs. What might be less obvious is that these bugs can have security implications. For example, they can introduce memory safety errors in languages that are otherwise memory safe. In some cases, a bug might leave most programs unaffected and not cause security issues in practice before it is detected and fixed. This is, of course, assuming that the bug has not been intentionally injected in the compiler.
Compiler bugs are an interesting source of security issues for just-in-time (JIT) compilers\index{JIT compilers}. JIT compilation is often used in programs that receive source code as input during program execution, for example in web browsers, for executing JavaScript code included in web pages. In this context, the input to the JIT compiler comes from arbitrary websites and is therefore untrusted. Bugs in such JIT compilers can lead to compromise of the whole program (here, the browser) if a malicious input (e.g. coming from a malicious website) deliberately triggers miscompilation in order to break memory safety of the language being implemented.
For this section, we focus on JavaScript, which is a dynamically typed, memory safe language, but the concerns we discuss also apply to other languages that are compiled dynamically.
Without statically known types, in order to optimize JavaScript code, JavaScript engines resort to type profiling [@Pizlo2020], recording the types encountered while executing code. These types are then used during optimization, which speculates that the same types will be encountered in future runs of the code, and inserts checks to validate that these assumptions about types still hold. When a check fails, the optimized code is replaced by unoptimized code that can handle all types, a process known as deoptimization\index{deoptimization} or on-stack replacement (OSR)\index{on-stack replacement (OSR)}. Deoptimization makes sure that the state of the deoptimized function is recreated correctly for the point of execution where the type check failed.
For example, a function such as:
```javascript
function foo(x, y) {
  return x + y;
}
```
will return a number when `x` and `y` are numbers, but a string when either is a string. An optimizing compiler can use the results of profiling to generate optimized code. For example, when both arguments are integers during profiling, it can generate code that looks like this in pseudocode:
```
foo:
  if x not integer, deoptimize
  if y not integer, deoptimize
  result = x + y
  if overflowed, deoptimize
  return result
```
You may be wondering how the type checks are implemented, and this is closely related to the representation of values in a JavaScript engine [@Wingo2011]. In short, JavaScript engines use specific bit patterns to indicate whether a value should be interpreted as a pointer, or as an integer or floating-point value. For example, the V8 JavaScript engine uses the least significant bit to denote that a value is a pointer, otherwise it is a small integer (which needs to be shifted down to access its value). Pointers then point to objects that contain a hidden class member which is used for type checking.
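A minimal sketch of such a tagged representation, loosely modelled on the small-integer scheme described above, might look as follows in C; the type and helper names are illustrative and not V8's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t js_value;   /* one machine word per value */

/* Low bit clear: a small integer, stored shifted left by one.
   Low bit set: a pointer to a heap object carrying a hidden class used for type checks. */
static bool     is_small_int(js_value v)    { return (v & 1) == 0; }
static intptr_t small_int_value(js_value v) { return (intptr_t)v >> 1; }
static bool     is_heap_object(js_value v)  { return (v & 1) == 1; }
static void    *heap_object_ptr(js_value v) { return (void *)(v & ~(uintptr_t)1); }
```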
In addition to the values for which typing information is gathered during profiling, optimizing JavaScript compilers propagate the profiled types to dependent values. For example, if a value `x` is expected to be a string, and we check this assumption, then `x + 1` will also be a string (and no additional check is needed in this case). In addition to simple type propagation, they usually perform range analysis\index{range analysis} to determine as precise a range for a value as possible, which is useful for bounds check elimination\index{bounds check elimination}.
Bounds check elimination (BCE)\index{bounds check elimination} is a common optimization in languages that perform bounds checks on array accesses to ensure every accessed index is within the bounds of the array. BCE gets rid of bounds checks when they are proven to be redundant, e.g. when the array access uses a constant index that's known to be smaller than the length of the array. See here for details on how out-of-bounds array accesses behave in JavaScript.
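The reasoning behind bounds check elimination can be illustrated with a small C sketch: when the compiler's range analysis has proven that `0 <= i < len` always holds, the explicit check below is redundant and can be removed. If the range result is wrong, removing the check silently turns the access into an out-of-bounds one.

```c
/* Sketch of an array access guarded by a bounds check that a compiler may
   eliminate once range analysis proves the index is always in bounds. */
int load_element(const int *arr, int len, int i) {
  if (i < 0 || i >= len)
    return 0;                /* a JavaScript engine would return undefined instead */
  return arr[i];
}
```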
Range analysis is a good example of an analysis where a JIT compiler bug can introduce a vulnerability. Incorrect range analysis results can be used by bounds check elimination to incorrectly eliminate bounds checks that should actually have been maintained in the optimized code. For example, for the following function:
```javascript
function foo(x) {
  y = bar(x);
  var a = [0, 1, 2];
  return a[y];
}
```
If range analysis decides that the value of `y` is in the range `[0, 2]`, but in reality the value is in the range `[0, 3]`, the bounds check for the access `a[y]` can be eliminated incorrectly, assuming the access is in-bounds.
[@Glazunov2021] lists a few examples of similar hypothetical vulnerabilities,
along with examples of vulnerabilities of this type that affected widely-used
JavaScript engines.
The type of bug described above provides an attacker with a limited read or write primitive, as a linear overflow of the array allocation occurs. The attacker can then build on this primitive to get to an arbitrary read/write primitive. As JIT compilers generate executable code at runtime, they often use memory that is writable and executable at the same time. Such memory is very useful to attackers, who can use an arbitrary write primitive to copy their payload into this code memory, and then jump to it. Writable and executable memory, therefore, makes JITs lucrative targets for attackers.
Bugs related to range analysis are just one of the common types of bugs encountered in a JavaScript engine. [@Groß2022] lists some other common types of bugs that result in violations of temporal and spatial memory safety, as well as type safety, in JavaScript engines.
How can we defend against such vulnerabilities? There are several complementary approaches, for example:
- Use fuzzing to discover compiler bugs. For JavaScript, a useful fuzzing tool is Fuzzilli.
- Be more conservative when it comes to error-prone compiler optimizations such as bounds check elimination. For example, the V8 JavaScript engine has introduced hardening of bounds checks against typer bugs.
- Instead of trying to prevent compiler (and other) bugs, assume they will be present and introduce mitigations that prevent attackers from building arbitrary read/write primitives on top of the initial limited primitives that bugs provide. For example, for 64-bit architectures, V8 implements a sandbox, built on top of pointer compression\index{pointer compression}. With pointer compression, pointers are represented by 32-bit indices off a base pointer instead of as full 64-bit values. By making sure that all pointers inside the sandbox (where the JavaScript heap is located) are compressed, and that compressed pointers always point inside the sandbox, a limited primitive that allows overwriting memory within the sandbox cannot be used to build an arbitrary read/write primitive by overwriting pointer values. A sketch of this pointer compression scheme is shown after this list.
- Preventing code memory from being executable and writable at the same time is also desirable. This is known as W^X \index{W^{}X}. A naive implementation of W^X that simply switches page-based memory permissions temporarily is not enough to prevent attackers from writing to code memory when multiple threads are involved [@Song2015]. A more effective solution would use a separate compilation process, which is the only process that has write access to the JIT's code memory. Alternatively, some architectures provide special features that can restrict page-based memory permissions from userspace, effectively allowing permissions to be different for different threads. Such features can also be of use in implementing W^X. For AArch64, this feature is called permission overlays.
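Returning to the pointer compression scheme mentioned in the list above, the following is a minimal sketch of the idea: heap references are stored as 32-bit offsets from a sandbox base, so even a corrupted stored reference can only point inside the sandbox region. The names are illustrative and the real V8 implementation has many more details (guard regions, separate cages, and so on).

```c
#include <stdint.h>

extern uint8_t *sandbox_base;        /* illustrative: start of the sandboxed heap region */

typedef uint32_t compressed_ptr;     /* what is actually stored in heap objects */

static compressed_ptr compress(void *p) {
  return (compressed_ptr)((uint8_t *)p - sandbox_base);
}

static void *decompress(compressed_ptr c) {
  /* Any 32-bit value decompresses to an address within the sandbox region,
     so corrupting a stored compressed pointer cannot escape the sandbox. */
  return sandbox_base + c;
}
```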
In this section, we have discussed JIT compiler security and described JavaScript compiler bugs that lead to vulnerabilities. Although we haven't focused on the details of JavaScript exploitation, an interested reader could take a look at [@saelo2021a] and [@saelo2021b].
Side-channels and covert channels are communication channels between two entities in a system, where the entities should not be able to communicate that way.
A covert channel\index{covert channel} is a channel where both entities intend to communicate. A side-channel\index{side-channel} is a channel where one entity is the victim of an attack using the channel.
The difference between a covert channel and a side-channel is whether both entities intend to communicate. In a side-channel attack, the entity not intending to communicate is called the victim\index{victim}. The other entity is sometimes called the spy\index{spy}.
As we focus on attacks in this book, we'll mostly use the term side-channels in the rest of this chapter.
The next few sections describe a variety of side-channels. Each section focusses on leakage through a specific so-called micro-architectural aspect\index{micro-architectural}, such as execution time, cache state or branch predictor state.
An implementation of a cryptographic algorithm can leak information about the data it processes if its run time is influenced by the value of the processed data. Attacks making use of this are called timing attacks\index{timing attacks}.
The main mitigation against such attacks consists of carefully implementing the algorithm such that the execution time remains independent of the processed data. This can be done by making sure that both:
a) The control flow, i.e. the trace of instructions executed, does not change depending on the processed data. This guarantees that every time the algorithm runs, exactly the same sequence of instructions is executed, independent of the processed data.
b) The instructions used to implement the algorithm are from the subset of instructions for which the execution time is known to not depend on the data values it processes.
For example, in the Arm architecture, the Armv8.4-A DIT extension guarantees that execution time is data-independent for a subset of the AArch64 instructions.
By ensuring that the extension is enabled and only instructions in the subset are used, data-independent execution time is guaranteed.
At the moment, we do not know of a compiler implementation that actively helps to guarantee both (a) and (b).
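To make property (a) concrete, here is a small hand-written example of the kind of transformation such a compiler pass would have to perform: a branchless, constant-time selection that executes the same instruction sequence regardless of the secret condition. This is only a sketch; real cryptographic code needs more care (and property (b)) to be robust.

```c
#include <stdint.h>

/* Select a or b depending on cond (which must be 0 or 1), without branching
   on the secret condition. */
static uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
  uint32_t mask = (uint32_t)0 - cond;   /* all ones if cond == 1, all zeros if cond == 0 */
  return (a & mask) | (b & ~mask);
}
```

Note that an optimizing compiler is in principle free to turn such code back into a branch, which is exactly why compiler support for these guarantees would be valuable.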
Using compiler techniques to transform a function such that it respects property (a) is an active research area. [@Wu2018] provides a method to convert a program such that it respects property (a), albeit by potentially introducing unsafe memory accesses. [@Soares2021] improves on that result by not introducing unsafe memory accesses, albeit by potentially needing to change the interface of the transformed function. [Also discuss the techniques implemented in the Constantine compiler [172]{.issue}]{.todo} [Also discuss the Jasmin language and compiler [213]{.issue}]{.todo}
A great reference giving practical advice on how to achieve (a), (b) and more security hardening properties specific for cryptographic kernels is found in [@Pornin2018].
As discussed in [@Pornin2018], when implementing cryptographic algorithms, you also need to keep cache side-channel attacks in mind, which are discussed in the section on cache side-channel attacks.
Caches\index{cache} are used in almost every computing system. They are small memories that are much faster than the main memory. They automatically keep the most frequently used data, so that the average memory access time improves.
When processes share a cache, various techniques exist to establish a covert communication channel. These let the processes communicate through memory accesses even when they do not share any memory location. We first describe how caches work before exploring these techniques.
There is a wide variety in CPU cache micro-architecture details, but the main characteristics that are important to set up a covert channel tend to be similar across most popular implementations.
Caches are small and much faster memories than the main memory that aim to keep a copy of the data at the most frequently accessed main memory addresses. The set of addresses that are used most frequently changes quickly over time as a program executes. Therefore, the addresses that are present in CPU caches also evolve quickly over time. The content of the cache may change with every executed read or write instruction.
On every read and write instruction, the cache micro-architecture looks up if the data for the requested address happens to be present in the cache. If it is, the CPU can continue executing quickly; if not, dependent operations will have to wait until the data returns from the slower main memory. A typical access time is 3 to 5 CPU cycles for the fastest cache on a CPU versus hundreds of cycles for a main memory access.\index{memory access time} When data is present in the cache for a read or write, it is said to be a cache hit\index{cache hit}. Otherwise, it's called a cache miss\index{cache miss}.
Most systems have multiple levels of cache\index{multi-level cache}, each with a different trade-off between cache size\index{cache size} and access time\index{cache access time}. Some typical characteristics might be:
- L1 (level 1) cache, 32KB in size, with an access time of 4 cycles.
- L2 cache, 256KB in size, with an access time of 10 cycles.
- L3 cache, 16MB in size, with an access time of 40 cycles.
- Main memory, gigabytes in size, with an access time of more than 100 cycles.
If data is not already present in a cache layer, it is typically stored there after it has been fetched from a slower cache level or main memory. This is often a good decision to make as there's a high likelihood the same address will be accessed by the program soon after. This high likelihood is known as the principle of locality\index{principle of locality}\index{locality of reference}.
Data is stored and transferred between cache levels in blocks of aligned memory. Such a block is called a cache block\index{cache block} or cache line\index{cache line}. Typical sizes are 32, 64 or 128 bytes per cache line.
When data that wasn't previously in the cache needs to be stored in the cache, room has to be made for it by removing, or evicting\index{cache eviction}, some other address/data from it. How that choice gets made is decided by the cache replacement policy\index{cache replacement policy}. Popular replacement algorithms are Least Recently Used (LRU)\index{LRU replacement policy}, Random\index{random replacement policy} and pseudo-LRU\index{pseudo-LRU replacement policy}. As the names suggest, LRU evicts the cache line that is least recently used; random picks a random cache line; and pseudo-LRU approximates choosing the least recently used line.
If a cache line can be stored in all locations available in the cache, the cache is fully-associative\index{fully-associative cache}. Most caches are however not fully-associative, as it's too costly to implement. Instead, most caches are set-associative\index{set-associative cache}. In an N-way set-associative cache, a specific line can only be stored in one of N cache locations. For example, if a line can potentially be stored in one of 2 locations, the cache is said to be 2-way set-associative. If it can be stored in one of 4 locations, it's called 4-way set-associative, and so on. When an address can only be stored in one location in the cache, it is said to be direct-mapped\index{direct-mapped cache}, rather than 1-way set-associative. Typical organizations are direct-mapped, 2-way, 4-way, 8-way, 16-way or 32-way set-associative.
The set of cache locations that a particular cache line can be stored at is called a cache set\index{cache set}.
For some cache covert channels, it is essential to know exactly how a memory address maps to a specific cache set.
{ width=100% #fig:cache-indexing}
Specific bits in the memory address are used for different cache indexing purposes, as illustrated in figure @fig:cache-indexing. The least-significant bits select a byte within a cache line. The next group of bits, the index bits, selects the cache set. The remaining most-significant bits form the tag, which is stored alongside the cached data so that the cache can tell which address a given line corresponds to.

For direct-mapped and fully-associative caches, the mapping of an address to cache locations works in the same way. A direct-mapped cache is simply a 1-way set-associative cache. In a fully-associative cache the number of cache sets is 1, so there are no index bits and only the tag is used to identify a line.
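As a worked example of this mapping, the following sketch computes the set index for a hypothetical 32 KiB, 8-way set-associative cache with 64-byte lines (so 64 sets): the low 6 bits of the address are the byte offset within the line, the next 6 bits are the set index, and the remaining bits form the tag.

```c
#include <stdint.h>

#define LINE_SIZE  64u                                     /* bytes per cache line */
#define NUM_WAYS   8u
#define CACHE_SIZE (32u * 1024u)
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))   /* 64 sets */

static uint32_t cache_set_index(uintptr_t addr) {
  return (uint32_t)((addr / LINE_SIZE) % NUM_SETS);   /* drop offset bits, keep index bits */
}

static uintptr_t cache_tag(uintptr_t addr) {
  return addr / (LINE_SIZE * NUM_SETS);               /* remaining high-order bits */
}
```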
::: TODO Also explain cache coherency \index{cache coherency}? [173]{.issue} :::

::: TODO Also say something about TLBs and prefetching? [174]{.issue} :::
Cache side-channels typically work by the spy determining whether a memory access was a cache hit or a cache miss. From that information, the spy may be able to deduce bits of data that only the victim should have access to.
Let's illustrate this by describing a few well-known cache side-channels:
In a so-called Flush+Reload\index{Flush+Reload} attack[@Yarom2014], the spy process shares memory with the victim process. The attack works in 3 steps:
- The Flush step: The spy flushes a specific address from the cache.
- The spy waits for some time to give the victim time to potentially access that address, resulting in bringing it back into the cache.
- The Reload step: The spy accesses the address and measures the access time. A short access time means the address is in the cache; a long access time means it's not in the cache. In other words, a short access time means that in step 2 the victim accessed the address; a long access time means it did not access the address.
::: TODO Should there be a more elaborate example with code that demonstrates in more detail how a flush+reload attack works? [175]{.issue} :::
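As a rough illustration of the Flush and Reload steps, the following x86-64 sketch uses the `clflush` and `rdtscp` intrinsics; `THRESHOLD` is a machine-specific cycle count separating cache hits from misses and has to be calibrated per system. On AArch64 the same idea would use cache maintenance instructions and a suitable timer instead.

```c
#include <stdint.h>
#include <x86intrin.h>

#define THRESHOLD 150   /* illustrative value; must be calibrated per machine */

/* Flush step: evict the shared address from the cache hierarchy. */
static void flush(const void *addr) {
  _mm_clflush(addr);
}

/* Reload step: time an access to the shared address; a fast access means the
   victim brought it back into the cache in the meantime. */
static int probe_is_cached(const void *addr) {
  unsigned int aux;
  uint64_t start = __rdtscp(&aux);
  *(volatile const char *)addr;
  uint64_t elapsed = __rdtscp(&aux) - start;
  return elapsed < THRESHOLD;
}
```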
Knowing if a victim accessed a specific address can leak sensitive information, for example when which array element gets accessed depends on whether a specific bit is set in secret data. [@Yarom2014] demonstrates that a Flush+Reload attack can be used to leak GnuPG private keys.
In a Prime+Probe attack\index{Prime+Probe}, there is no need for memory to be shared between victim and spy. The attack works in 3 steps:
- The Prime step: The spy fills one or more cache sets with its data, for example, by accessing data that maps to those cache sets.
- The spy waits for some time to let the victim potentially access data that maps to those same cache sets.
- The Probe step: The spy accesses that same data as in the prime step. Measuring the time it takes to load the data, it can derive how many cache lines the victim evicted from each cache set in step 2, and from that derive information about addresses the victim accessed.
[@Osvik2005] first documented this technique in 2005 and demonstrated extracting AES keys in just a few milliseconds.
An attentive reader may have noticed that the attacks described above follow a similar 3-step pattern. [@Weber2021] describes this general pattern and uses it to automatically discover more side-channels that follow this 3-step pattern. They describe the general pattern as being:
- An instruction sequence that resets the inner CPU state (reset sequence).\index{reset sequence}
- An instruction sequence that triggers a state change (trigger sequence).\index{trigger sequence}
- An instruction sequence that leaks the inner state (measurement sequence).\index{measurement sequence}
Other cache-based side channel attacks following this general 3-step approach include: Flush+Flush\index{Flush+Flush}[@Gruss2016a], Flush+Prefetch\index{Flush+Prefetch}[@Gruss2016], Evict+Reload\index{Evict+Reload}[@Percival2005], Evict+Time\index{Evict+Time}[@Osvik2005], Reload+Refresh\index{Reload+Refresh}[@Briongos2020], Collide+Probe\index{Collide+Probe}[@Lipp2020], etc.
As described in [@Su2021], 3 conditions need to be met for a cache-based side-channel attack to succeed:
- There is a mapping between a state change in the cache and sensitive information in the victim program.
- The spy runs on a CPU that shares a cache level with the CPU the victim runs on.
- The spy can infer a cache status change caused by the victim through its own cache status.
Mitigations against cache side-channel attacks can be categorized according to which of the 3 conditions above they aim to prevent from happening:
A typical example of when a cache state change could be correlated to sensitive information is when a program uses secret information to index into an array. An attacker could derive bits of the secret information by observing which cache line was fetched.
Especially in crypto kernels, indexing into an array using a secret value is generally avoided. An alternative mitigation is to always access all array indices, independent of the secret value, e.g. as done in commit 46fbe375 to the PuTTY project, which contains this comment:
```
 * Side-channel considerations: the exponent is secret, so
 * actually doing a single table lookup by using a chunk of
 * exponent bits as an array index would be an obvious leak of
 * secret information into the cache. So instead, in each
 * iteration, we read _all_ the table entries, and do a sequence
 * of mp_select operations to leave just the one we wanted in the
 * variable
```
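In C, the pattern described in the comment above amounts to something like the following sketch: every table entry is read on every lookup, and a branchless mask selects the wanted one, so the set of cache lines touched does not depend on the secret index.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t ct_table_lookup(const uint32_t *table, size_t n, size_t secret_index) {
  uint32_t result = 0;
  for (size_t i = 0; i < n; i++) {
    /* mask is all ones only when i == secret_index, computed without a branch */
    uint32_t mask = (uint32_t)0 - (uint32_t)(i == secret_index);
    result |= table[i] & mask;
  }
  return result;
}
```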
If the victim and the spy do not share a common channel -- in this case a cache level -- then a side channel cannot be created.
One way to achieve this is to only allow one program to run at the same time, and when a context switch does happen, to clear all cache content. Obviously, this has a huge performance impact, especially in systems with multiple cores and with large caches. Therefore, a wide variety of mitigations have been proposed that aim to make attacks somewhat harder without losing too much system efficiency. [@Mushtaq2020] and [@Su2021] summarize dozens of proposals and implementations -- too many to try to describe them all here.
One popular such mitigation is disabling CPU multithreading\index{multithreading}. For example, Azure suggests that users who run untrusted code should consider disabling CPU multithreading. The Linux kernel's core scheduling documentation also states that mutually untrusted code should not run on the same core concurrently. It implements a scheduler that takes into account which processes are mutually trusting and only allows those to run simultaneously on the same core.
One could argue that site isolation\index{site isolation} as implemented in many web browsers is a mitigation that also falls into this category. Site isolation is described in more detail in its own section.
Mitigations preventing the spy program from inferring a cache status change in the victim program through its own cache status
In some contexts, the resolution of the smallest time increment measurable by the spy program can be reduced so much that it becomes much harder to distinguish between a cache hit and a cache miss. Injecting noise and jitter into the timer also makes it harder to distinguish between a cache hit and a cache miss. This is one of the mitigations in JavaScript engines against Spectre attacks. For more information see this V8 blog post or this Firefox documentation of the `performance.now()` method\index{Spectre}.
Note that this is not a perfect mitigation: there are often surprising ways in which an attacker can get a fine-grained enough timer, or use statistical methods, to be able to detect the difference between a cache hit and a cache miss. One extreme example is the NetSpectre\index{NetSpectre} attack [@Schwarz2019], where the difference between a cache hit and a cache miss is measured over a network, by statistically analyzing delays on network packet responses. Furthermore, [@Schwarz2017] demonstrates how to construct high-resolution timers in various indirect ways in all browsers that have removed explicit fine-grained timers.
Another possibility is to clear the cache between the times when the victim runs and the spy runs. This is likely to incur quite a bit of performance overhead, and it may also not always be possible, e.g. when victim and spy are running at the same time on two CPUs sharing a cache level.
Most CPUs implement one or more instruction pipelines.\index{pipeline}\index{instruction pipeline} In an instruction pipeline, the next instruction is started before the previous instruction has finished executing. When the previous instruction is a branch instruction, the next instruction that needs to be executed is only known when that branch instruction completes. However, waiting for the branch instruction to finish before starting the next instruction leads to a big performance loss. Therefore, most CPUs predict\index{predict} which instruction needs to be executed after a branch, before the branch instruction has completed. Correctly and quickly predicting the instruction after a branch instruction is so important for performance that most CPUs have multiple branch predictors\index{branch predictor}, such as:
- A predictor of the outcome of a conditional branch\index{conditional branch direction predictor}: taken or not taken\index{taken branch}. The prediction is typically history-based\index{history-based prediction}, i.e. based on the outcome of this and other branches in the recent past.
- A predictor of the target of a taken branch\index{branch target predictor}, i.e. the address of the next instruction after a taken branch.
- A predictor that is specialized to predict the next instruction after a function return instruction.\index{return address predictor}
A number of attacks have been described over the past few years. The following sections list a few examples, categorized per branch predictor component they target.
\index{conditional branch direction predictor}
Two examples are BranchScope [@Evtyushkin2018]\index{BranchScope} and BlueThunder [@Huo2019]\index{BlueThunder}. These attacks infer whether a branch is taken or not taken in a victim process. They do so by carefully making sure that a branch in the spy process uses the same branch predictor entry as the targeted branch in the victim process. By measuring whether the branch in the spy process gets predicted correctly, one can derive whether the branch in the victim process was taken or not.
This can be thought of as somewhat akin to the Prime+Probe cache-based side channel attacks.
When the outcome of a branch depends on a bit in a secret key, this can enable an attacker to derive the value of the secret key. These papers demonstrate deriving the secret key from implementations of specific cryptographic kernels. It can also be used to break ASLR\index{ASLR}.
\index{Branch target predictor}
Two examples are SBPA [@Aciicmez2007]\index{SBPA} and BranchShadow [@Lee2017]\index{BranchShadow}. These earlier attacks are based on making a branch in the spy process alias in the Branch Target Buffer (BTB\index{BTB}) with a targeted branch in the victim process. They use methods such as timing difference, last branch records\index{last branch record}, instruction traces\index{instruction trace} or performance counters\index{performance counter} to measure whether the branch in the spy process caused a specific state change in the BTB.
\index{return address predictor}
One example is Hyper-Channel [@Bulygin2008]\index{Hyper-Channel}. In this case,
a spy process invokes
The papers referred to above contain detailed explanations of how they set up the attack. All of these attacks use a general 3-step approach, similar to cache side channels:
- An instruction sequence that resets the branch predictor state (reset sequence), run by the spy process.\index{reset sequence}
- An instruction sequence that triggers a branch predictor state change (trigger sequence), run by the victim process.\index{trigger sequence}
- An instruction sequence that leaks the branch predictor state (measurement sequence), run by the spy process\index{measurement sequence}
::: TODO Describe the mitigations proposed against these side-channel attacks. [203]{.issue} :::
::: TODO Should we also discuss more "covert" channels here such as power analysis, etc? [176]{.issue} :::
CPUs execute sequences of instructions. There are often dependencies between instructions in the sequence. That means that the outcome of one instruction influences the execution of a later instruction.
Apart from the smallest micro-controllers, all CPUs execute multiple instructions in parallel. Sometimes even multiple hundreds of them at the same time, all in various stages of execution. Instructions start executing while potentially hundreds of previous instructions haven't produced their results yet. How can a CPU achieve this when the output of a previous instruction, which might not have fully executed yet, and hence whose output may not yet be ready, may affect the execution of that later instruction? In other words, there may be a dependency between an instruction that has not finished yet and a later instruction that the CPU also already started executing.
There are various kinds of dependencies. One kind is control dependencies\index{control dependencies}, where whether the later instruction should be executed at all is dependent on the outcome of the earlier instruction. Other kinds are true data dependencies\index{true data dependency}, anti-dependencies\index{anti dependency} and output dependencies\index{output dependency}. More details about these kinds of dependencies can be found on the wikipedia page about them.
CPUs overcome parallel execution limitations imposed by dependencies by making massive numbers of predictions\index{prediction}. For example, most CPUs predict whether conditional branches are taken or not, which is making a prediction on control dependencies. Another example is a CPU making a prediction on whether a load accesses the same memory address as a preceding store. If they do not access the same memory locations, the load can run in parallel with the store, as there is no data dependency between them. If they do access overlapping memory locations, there is a dependency and the store should complete before the load can start executing.
Starting to execute later instructions before all of their dependencies have been resolved, based on the predictions, is called speculation\index{speculation}.
Let's illustrate that with an example. The following C code
```c
long abs(long a) {
  if (a >= 0)
    return a;
  else
    return -a;
}
```
can be translated to the following AArch64 assembly code:
```
      cmp  x0, #0
      b.ge Lbb2
Lbb1:
      neg  x0, x0
Lbb2:
      ret
```
The `b.ge` instruction is a conditional branch instruction. It computes whether the next instruction should be the one immediately after it, or the one pointed to by the label `Lbb2`. In case it's the instruction immediately after, the branch is said to not be taken. Instead, if it's the instruction pointed to by the label `Lbb2`, the branch is said to be taken. When the condition `.ge` (greater or equal) is true, the branch is taken. That condition is defined or set by the previous instruction, the `cmp x0, #0` instruction, which compares the value in register `x0` with 0. Therefore, there is a dependency between the `cmp` instruction and the `b.ge` instruction. To overcome this dependency, and be able to execute the `cmp`, the `b.ge` and potentially more instructions in parallel, the CPU predicts the outcome of the branch instruction. In other words, it predicts whether the branch is taken or not. Based on that prediction, the CPU will pick up either the `neg` or the `ret` instruction to start executing next. This is called speculation, as the CPU speculatively executes either the `neg` or the `ret` instruction.
::: TODO Show a second example of cpu speculation that is not based on branch prediction. [177]{.issue} :::
Of course, as with all predictions, the CPU gets the prediction wrong from time to time. In that case, all changes to the system state that affect the correct execution of the program need to be undone. In the above example, if the branch should have been taken, but the CPU predicted it to not be taken, the `neg` instruction is executed incorrectly and changes the value in register `x0`. After discovering the branch was mis-predicted, the CPU would have to restore the correct, non-negated, value in register `x0`.
Any instructions that are executed under so-called mis-speculation\index{mis-speculation} are called transient instructions\index{transient instructions}.^[Transient instructions caused by incorrect branch-direction prediction have also been called wrong-path instructions\index{wrong-path instructions} [@Mutlu2004]]
The paragraph above says that changes to system state that affect the correct execution of the program need to be undone. There is, however, a lot of system state that does not affect the correct execution of a program, and the changes made to such system state by transient instructions are often not undone.
For example, a transient load instruction can fetch a value into the cache that was not there before. By bringing that value in the cache, it could have evicted another value from the cache. Whether a value is present in the cache does not influence the correct execution of a program; it merely influences its execution speed. Therefore, the effect of transient execution on the content of the cache is typically not undone when detecting mis-speculation.
Sometimes, it is said that the architectural effects\index{architectural effects} of transient instructions need to be undone, but the micro-architectural effects\index{micro-architectural effects} do not need to be undone.
The above explanation describes architectural effects as changes in system state that need to be undone after detecting mis-speculation. In reality, most systems will implement techniques that keep all state changes in micro-architectural buffers until it is clear that all predictions made to execute that instruction were correct. At that point the micro-architectural state is committed to become architectural state. In that way, mis-predictions naturally do not affect architectural state. [Could we find a good reference that explains micro-architectural versus architectural state in more detail? Is "Computer Architecture: A Quantitative Approach" the best reference available?]{.todo}
Faulting instructions\index{faulting instructions} are instructions that generate an exception at run-time. Many instructions can generate an exception, and are hence potentially faulting. For example most load and store instructions generate an exception when the accessed address is not mapped. Since so many instructions can generate an exception, processors typically speculate that they do not generate an exception to enable more parallel execution.
When an instruction faults, the execution typically continues at another location. Any instructions later in the instruction stream which are speculatively executed before the fault is detected are also called transient instructions\index{transient instructions}.
There is a kind of control dependency between every potentially-faulting instruction and the next one, as the next instruction to be executed depends on whether the instruction generates an exception or not. We call out this dependency separately here as the transient execution attacks we'll describe next get classified based on whether they make use of transient instructions after a misprediction, or transient instructions after a faulting instruction.
Transient execution attacks\index{transient execution attacks} are a category of side-channel attacks that use the micro-architectural side-effects of transient execution as a side channel.
The publication of the Spectre\index{Spectre} [@Kocher2019] and Meltdown\index{Meltdown} [@Lipp2018] attacks in 2018 started a period in which a large number of transient attacks were discovered and published. Most of them were given specific names, such as ZombieLoad, NetSpectre, LVI, Straight-line Speculation, etc. New variants continue to be published regularly.
Covering each one of them in detail here would make the book overly lengthy, and may not necessarily help much with gaining a better insight in the common characteristics of transient attacks. Therefore, we'll try to put them into a few categories and describe the characteristics of each category.
::: TODO Decide whether it's useful to talk about alternative categorizations of transient execution attacks, and if so, do add content. Consider pointing to https://github.com/MattPD/cpplinks/blob/master/comparch.micro.channels.md :::
The categorization below is based on one proposed in [@Bulck2020]. There are alternative categorizations. [@Bulck2020] defines 4 big classes of transient side-channel attack categories, based on whether:
- The transient execution happens because of a misprediction, or a faulting instruction.
- The attacker actively steers data or control flow of the transient execution or not.
This gives the following 4 categories:
+----------------+-------------------------+------------------------+
|                | Steering of transient execution by attacker?     |
|                +-------------------------+------------------------+
|                | No (Leakage)            | Yes (Injection)        |
+================+=========================+========================+
| Misprediction  | Branch-predictor based  | Spectre-style attacks  |
|                | side-channels           |                        |
+----------------+-------------------------+------------------------+
| Faulting       | Meltdown-style attacks  | LVI-style attacks      |
+----------------+-------------------------+------------------------+
We discussed this category already in the section on Side-channels through branch predictors.
::: TODO Add a description of Spectre-style attacks such as Spectre-PHT, Spectre-BTB, Spectre-RSB, Spectre-STL, SpectreV1, SpectreV2, SpectreV3, SpectreV4, NetSpectre. [178]{.issue} :::
::: TODO Add a description of Meltdown-style attacks such as Meltdown, Foreshadow, LazyFP, Fallout, ZombieLoad, RIDL. [178]{.issue} :::
::: TODO Add a description of LVI-style attacks. [178]{.issue} :::
::: TODO Write section on site isolation as a SpectreV1 mitigation [179]{.issue} :::
A software supply chain attack occurs when an attacker interferes with the software development or distribution processes with the intention to impact users of that software.
Supply chain attacks and their possible mitigations are not specific to compilers. However, compilers are an attractive target for attack because they are widely deployed to developers, in continuous integration systems and as JITs. Also, an infected compiler has the possibility to make a much larger impact if it can silently spread the infection to other software created with or run using it.
This chapter explores the history of supply chain attacks that involve compilers and what can be done to prevent them.
As far back as 1974 Karger & Schell theorized about an attack on the Multics operating system via the PL/I compiler [@Karger1974]. In this attack, a trap door is inserted into the compiler, which then injects malicious code into generated object code. Furthermore, the trap door could be designed to reinsert itself into the compiler binary so that future compilers are silently infected without needing changes to their source code. This attack method was subsequently popularized by Ken Thompson in his 1984 ACM Turing Award acceptance speech Reflections on Trusting Trust [@Thompson1984].
If these cases seem far-fetched then consider that there have been several real examples of supply chain attacks on development tools.
Induc is a family of viruses that infects a pre-compiled library in the Delphi toolchain with malicious code [@Gostev2009]. When Delphi compiles a project the malicious library is included into the resulting executable, thus enabling the virus to spread. The virus was first detected in 2009 and was circulating undetected for at least a year beforehand. Several popular applications are known to have been infected, including a chat client and a media player. Overall, in excess of a hundred thousand infected computers were detected world-wide by anti-virus solutions.
XcodeGhost is the name given to malware first detected in 2015 that infected thousands of iOS applications [@Cox2015]. The source of the infection was tracked down to a trojanized version of Xcode tools. The malware exists in an extra object file within the Xcode tools and is silently linked into each application as it is built. File sharing sites were used to spread the trojanized Xcode tools to unwitting developers.
A trojanized linker was found to be involved in a supply chain attack discovered in 2017 named ShadowPad [@Greenberg2019]. Some instances of the attack were perpetrated using a trojanized Visual Studio linker that silently incorporates a malicious library into applications as they are built. Related attacks named CCleaner and ShadowHammer used the same approach of a trojanized linker to infect built applications. Infected applications from these attacks were distributed to millions of users world-wide.
These cases highlight that attacks on compilers, and especially linkers and libraries, are a viable route to silently infect many other applications, and there is no doubt that there will be more such attacks in the future. Let us now explore what we can do about these.
::: TODO Explain how these vulnerabilities arise and how to mitigate them. [180]{.issue} :::
Security vulnerabilities introduced by compilers have a long history. Thompson [@Thompson1984] provides one of the oldest and most popular examples in this area. In his paper, he talks about a compiler that can detect when it is compiling the login program and insert a backdoor, so that its author can use the system as any user. However, the most common cases are those where the compiler involuntarily adds security vulnerabilities to the generated binary.
::: TODO Explain how code that results in undefined behaviour can often work as the programmer expected until some optimisation is applied, and perhaps even talk a bit about why compilers rely on the absence of undefined behaviour in ways that appear aggressive in some occasions. [202]{.issue} :::
When discussing compiler introduced security vulnerabilities, undefined behavior plays a major role. Its implications were thoroughly discussed by various works such as [@wang2012undefined] [@d2015correctness] [@dusilent]. By reading the works of these authors, one can see that even projects that went through careful testing, such as Linux, FreeBSD or PostgreSQL, could not escape from this class of vulnerabilities. To better understand them, this chapter contains several examples of such vulnerabilities, their implications and how they got fixed.
The first example is a 15-year-old vulnerability that affected the random number generator (RNG) in Mac OS X [@Wang2015]. At some point in the past, this vulnerability affected all *BSD operating systems, as they have a common ancestor with Mac OS.
In the random number generator of the system, more specifically in `srandomdev(3)`, we can spot the following piece of code used in the seeding logic:
```c
struct timeval tv;
unsigned long junk;

gettimeofday(&tv, NULL);
srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);
```
To generate a seed for the RNG, the code uses the current time and an uninitialized value from the stack, i.e. `junk`. This triggers undefined behavior, as the C standard has no clear semantics for uninitialized loads. Because of that, there was a huge difference in the generated assembly code between two different Mac OS X releases.
In Mac OS X 10.6 the generated code looked like this:
```
leaq   0xe0(%rbp),%rdi
xorl   %esi,%esi
callq  0x001422ca   ; symbol stub for: _gettimeofday
callq  0x00142270   ; symbol stub for: _getpid
movq   0xe0(%rbp),%rdx
movl   0xe8(%rbp),%edi
xorl   %edx,%edi
shll   $0x10,%eax
xorl   %eax,%edi
xorl   %ebx,%edi
callq  0x00142d68   ; symbol stub for: _srandom
```
While for Mac OS X 10.7 the code looked like this:
```
leaq   0xd8(%rbp),%rdi
xorl   %esi,%esi
callq  0x000a427e   ; symbol stub for: _gettimeofday
callq  0x000a3882   ; symbol stub for: _getpid
callq  0x000a4752   ; symbol stub for: _srandom
```
In the shorter version of the generated assembly code, the compiler dropped the whole argument of `srandom` as an optimization. While the optimized code respects the standard, it leaves room for an attacker to exploit the system, because the seed of the RNG can now be predicted.
In the meantime, this problem has been resolved in FreeBSD [@FbsdJunk] and OpenBSD [@ObsdJunk].
Current solutions for detecting this class of vulnerabilities include LLVM's MemorySanitizer and Valgrind.
The next example covers another type of undefined behavior that can easily introduce security vulnerabilities: dereferencing NULL pointers. The following piece of code is taken from Linux and introduces a vulnerability by dereferencing the `tun` pointer before checking that the pointer is valid:
```c
unsigned int
tun_chr_poll(struct file *file, poll_table *wait)
{
  struct tun_file *tfile = file->private_data;
  struct tun_struct *tun = __tun_get(tfile);
  struct sock *sk = tun->sk;
  if (!tun)
    return POLLERR;
  ...
}
```
Normally, this would either cause a crash in the kernel, or, if address 0 happened to be mapped in the address space, the function would return POLLERR. However, the compiler assumes that `tun` is a valid pointer when execution reaches the `if` statement, because it saw a dereference of the pointer just before the `if` statement. The check is therefore considered redundant and deleted from the final binary. This allows an attacker to continue executing code from `tun_chr_poll` when address 0 is mapped.
To mitigate against this situation, GCC developers added a flag called `-fno-delete-null-pointer-checks`, which Linux integrated into its compiler configuration.
Linux was not the only project that suffered from this problem: Chromium [@ChromiumIssue] and Mozilla [@MozillaIssue] have had similar issues in the past.
There are also cases of security vulnerabilities that are not introduced by undefined behavior; the following piece of code, taken from the Linux kernel, is such an example. Because the compiler sees that the buffer `hash` is never used after this point, it decides to delete the `memset` operation. We call this dead store optimization (DSO)\index{dead store optimization}. This has serious security implications, because the intention of the programmer was to erase the `hash` data from memory.
```
static void extract_buf(struct entropy_store *r, __u8 *out) {
  ...
- memset(&hash, 0, sizeof(hash));
+ memzero_explicit(&hash, sizeof(hash));
}
```
The solution Linux came up with was to add a new function called `memzero_explicit`, which under the hood looks like this:
```c
void memzero_explicit(void *s, size_t count)
{
  memset(s, 0, count);
  OPTIMIZER_HIDE_VAR(s);
}
```
It still uses `memset`\index{memset} to delete the associated security-sensitive data, but it also tries to eliminate the risk of DSO by using the `OPTIMIZER_HIDE_VAR` macro. This, however, is not enough to fully eliminate dead stores [@MemZeroBarrier]: when LTO is used, the buffer `s` is still vulnerable. For this reason, Linux maintainers added a further hardening mechanism by using a compiler barrier instead:
```
void memzero_explicit(void *s, size_t count)
{
  memset(s, 0, count);
- OPTIMIZER_HIDE_VAR(s);
+ barrier();
}
```
There is still room for improvement regarding the introduced barrier [@MemZeroDataBarrier]. If the content of the buffer is held in registers, the compiler can again prove that the store is dead, and the `memset` will again be deleted. To mitigate against this, the following patch was proposed:
```
+ #define barrier_data(ptr) \
+     __asm__ __volatile__("": :"r"(ptr) :"memory")

void memzero_explicit(void *s, size_t count)
{
  memset(s, 0, count);
- barrier();
+ barrier_data(s);
}
```
In this patch, we create a new barrier that is guaranteed to put the content of the buffer in memory, so that DSO can take no further effect.
Similar efforts were conducted in other projects such as OpenSSL [@OpenSSLMemClr]. The approach OpenSSL used is rather different, but it achieves the same end goal, i.e. eliminating the effect of DSO. By making `memset_func` a volatile pointer to the actual implementation of `memset`, the compiler is forced to dereference the pointer to get to the actual `memset`, thus eliminating the risk of optimizing it out.
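A simplified sketch of that approach, modelled on OpenSSL's `OPENSSL_cleanse`, looks roughly as follows; because `memset_func` is a volatile pointer, the compiler has to load it and perform the indirect call, so the store cannot be proven dead.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memset_t)(void *, int, size_t);

/* volatile: the compiler must reload this pointer and perform the call */
static volatile memset_t memset_func = memset;

void cleanse(void *ptr, size_t len) {
  memset_func(ptr, 0, len);
}
```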
The C23 committee decided to tackle this problem from another angle, i.e. by adding a library function called `memset_explicit` [@MemSetProposal]. This function requires the compiler not to optimize the memory overwrite away. However, it is not trivial to implement such functionality, as GNU notes in [@GNUMemSet]: the information may still be present somewhere in the machine, even if it was erased from memory.
Underhanded code is code that looks like it is doing one thing but actually does something else. It usually refers to code nefariously written in this way to sneak in malicious behavior that is not obviously visible on code inspection. However, the same applies when the underhanded behavior has been added by mistake; in fact, the ideal attack would also give the attacker the possibility to claim it was an honest mistake.
[@Wheeler2020] provides a more systematic view of underhanded code than is given here. This chapter presents some examples to give an overview of the attack and how a compiler can (and can't) help, although this probably doesn't classify as "low level" software security.
Probably the most classic example of underhanded code in C-style languages is code that takes advantage of the fact that the assignment operator (`=`) and the equality operator (`==`) look very similar.
```c
void somefunction(UserPermissionLevel permission_level) {
  if (permission_level = LEVEL_ADMIN) {
    do_admin_action();
    return;
  }
  report_access_violation();
}
```
Here `permission_level` is not compared to `LEVEL_ADMIN`, but rather assigned that value. This means that if `LEVEL_ADMIN` is a non-zero value, the if condition will evaluate to true.
Both Clang and GCC detect this example with the warning option `-Wparentheses`. Adding a pair of parentheses around the assignment silences that warning. Those extra parentheses would probably draw a reviewer's attention in this specific example, but had the condition been more complex, with multiple subexpressions combined with `||` and `&&`, the parentheses would have looked normal.
This could be written off as a language design problem. Traditionally, Python has not supported using assignments as expressions, effectively sidestepping this particular problem. When assignment expressions were added to the language [@pep572] (in Python 3.8), the more visually distinct "walrus operator" `:=` was used to avoid this risk of confusion with the comparison operator. Even for an existing language, the compiler can provide options to forbid risky constructs for those who want to and can opt in to them, essentially creating a new language that is a subset of the original one.
The "goto-fail" bug, officially known as CVE-2014-1266, caused Apple devices to not correctly validate certificates in TLS connections. It effectively disables the verification the function is supposed to do, and is easy to miss in code review. This was most likely a mistake that was not added intentionally, but still has the characteristics of underhanded code.
The bug was caused by a duplicated "goto fail" line, which was indented so that it looked like it was guarded by the previous if clause, when in fact it was always executed. Since the `err` variable was set to 0 (i.e. no error) inside the if, the function would always return 0.
```c
OSStatus
SSLVerifySignedServerKeyExchange(...)
{
    OSStatus err;
    ...
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
fail:
    return err;
}
```
At the time of the bug compilers did not detect this, but nowadays both GCC and Clang have a `-Wmisleading-indentation` option that detects this kind of problem.
Automatic code formatting tools such as clang-format would also help to catch this issue, as they would fix the misleading indentation.
"Trojan Source" attacks described by [@Boucher2023] is another way underhanded code could be achieved by doing something that makes the editor (or other thing that displays code to the user) render the code in a different way than the compiler parses it.
This is done by using Unicode features. Support for bidirectional text is one such feature: special characters mark regions of text as right-to-left or left-to-right, allowing e.g. Arabic and English to be mixed in the same document. Used in cunning ways, these characters can make the text render such that a word appears to be inside a comment, while the compiler, which parses the characters in the order they appear in the file, sees it after the comment.
GCC provides `-Wbidi-chars` to detect the usage of writing direction markers. The Clang compiler does not provide such a warning, but a corresponding check exists in `clang-tidy`.
Another variant of "Trojan Source" is the usage of homoglyphs: different characters that look similar or even identical, depending on the font. As an example, Unicode contains the character "Division Slash" (U+2215), which is distinct from the ordinary ASCII slash "Solidus" (U+002F).
```c
/* Important comment */
check_permissions();
/* Another comment */
```
If the final slash on the first line is not really a slash, but another similar-looking character that the compiler does not recognize as the end of the comment, the whole snippet is commented out and the call to `check_permissions()` is never made.
::: TODO This chapter should probably be moved under section 'Physical access side-channel attacks' higher-up [181]{.issue} :::
There are many types of physical attacks. These attack methods focus on one or more physical properties of systems (e.g. CPU, GPU, crypto hardware), and can be either:

- Passive – just monitoring physical quantities (e.g. side channel information leakage), or
- Active – modification of physical quantities, for example, by
  - changing the operating conditions of the system so that the circuit operates outside its specifications (e.g. by changing the temperature, or by applying glitches to the supply voltage or clock source),
  - injecting faults into the system (e.g. altering the electrical state of the system using electromagnetic pulse injection, or a laser beam), or
  - physically modifying the system/chip.
In the rest of this section, we will focus on a subset of physical attacks:
- Side channel information leakage
- Physical attack using glitches
These two forms of attack can be carried out using low-cost hardware, and have been widely demonstrated by researchers on SoCs and microcontrollers developed for IoT (Internet-of-Things) applications.
If an attacker has physical access to a device, even without debug access, the attacker can collect side channel information about the program execution on a processor. If the processor is used to handle cryptographic operations, the side channel information can be used to deduce the crypto key(s) or the data being processed. Note that some forms of physical attack (e.g. fault injection attacks like Rowhammer and VoltJockey) do not require physical access, but those attacks are not covered in this section.
The most common physical access side channel attack method is to capture the voltage or current consumption of the device during its operation. Every time a flip-flop toggles, the switching activity results in a small current spike. Even though there is capacitance on the power distribution connections inside the chip, the toggling of registers (composed of flip-flops) still results in variations in the power supply current, which can be observed easily. Because the connections for delivering power (at the power supply, on the printed circuit board, in the chip package, as well as on the silicon die) also contain resistance, the variation of electrical current in the chip's power supply also results in variations in the power supply voltage. Again, this can be observed easily if the attacker has physical access to the device.

If an attacker has access to data acquisition equipment that can record the current/voltage/power patterns, they can record the "power signature" of different crypto operations, including the power signature for different data inputs. By applying analysis techniques like differential power analysis, the attacker can extract the information being processed.

One additional form of side-channel leakage is electromagnetic radiation. Because the processor's clock frequency is usually in the radio frequency (RF) range, the wires on the die and the tracks on the PCB become small antennas, and the ripples in the processor's voltage/current result in radio frequency signals. Although the radiated RF power can be tiny, an attacker who is close to the device and has the right equipment to amplify and record the RF power signature can still observe this side-channel leakage. However, the risk of such an attack can be reduced by lowering the radiation energy level using:
- Shielding around the device, including ground plate on the circuit board.
- Coupling capacitors on power supply tracks on the printed circuit board.
Generally, such an attack requires knowledge of radio circuit techniques, and the result can be affected by other factors. For example, in normal environments there are many other sources of RF noise that affect the accuracy of the signal measurement. In a "noisy" environment, the RF signals from various wireless communication gadgets nearby might drown out the signals from the device being monitored.
Instruction execution can result in various forms of side channel leakage:

- Cycle timing resulting from conditional branches – A code sequence containing a conditional branch can result in observable side channel leakage. For example, if the power signatures of several code segments are easily recognizable (processes A, B and C in the following diagram), it is possible to detect whether the conditional branch was taken or not.
- Cycle timing resulting from specific data values – The execution time in cycles of some instructions can depend on the values of the input data, resulting in timing side-channel leakage, e.g. for the integer divide instruction in Arm Cortex-M processors.
- Power variation due to value changes – The power spikes in the power signature often depend on a combination of how many bits are set and how many bits have toggled in the register(s) (the so-called Hamming weight and Hamming distance), so the amplitude of a spike can be used to guess the register value in that clock cycle. The power spikes can be caused by a combination of:
  - logic switching due to the operations of an instruction (e.g. the power consumed by a single-cycle multiplier can be much higher than the power used by a Boolean logic function), and
  - logic switching due to changes in data values in the register bank and data paths.
The switching activities depend on the preceding and following operations. If the power signature of the code around a specific instruction is recognizable, then the data value being processed could be guessed.
In some SoC or microcontroller implementations, the power spike effect of the operations can be much larger than the effect of data value changes in the register banks. In such cases the program execution flow can be observed and, as a result, might also indirectly leak information about the data being processed.
For normal embedded devices that don’t have physical protection features, there is a much higher chance that power/voltage/radiation side channels can result in information leakage. However, some aspects of timing signature leakage could be reduced:
- Using data processing instructions with data-independent timing for cryptographic operations. In recent Arm architectures (including Armv8-A and Armv8-M), some instructions are architecturally defined as DIT (Data Independent Timing).
- For conditional branches where the condition depends on secret data, using a table branch instead might help reduce timing-based leakage (both paths result in a branch); a branchless alternative is sketched below this list. It is not necessary to replace all conditional branches. For example, many loop counters in crypto operations are independent of the crypto key or input data values, so there is no need to change those loops.
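The following is a minimal sketch of such a branchless selection in C. The helper name `ct_select_u32` is invented for this illustration, and real cryptographic code should also inspect the generated machine code, since a compiler may transform such a pattern back into a branch.

```c
#include <stdint.h>

/* Select between a and b without branching on the secret selector:
   returns a when secret_bit is 1 and b when secret_bit is 0. */
static inline uint32_t ct_select_u32(uint32_t secret_bit, uint32_t a,
                                     uint32_t b)
{
    /* mask is all ones when secret_bit is 1, all zeros when it is 0. */
    uint32_t mask = (uint32_t)0 - (secret_bit & 1u);
    return (a & mask) | (b & ~mask);
}
```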
::: TODO There is overlap with section timing-side-channels. How to best consolidate that? [182]{.issue} :::
There are additional software techniques to mitigate power leakage. One of the most well-known techniques is masking (e.g. Boolean, multiplicative or affine masking). When applying software mitigations, software developers need to check that the optimizations carried out by (C/C++) compilers do not undermine the mitigation, as compilers can be very smart and undo the masking in order to perform faster operations (or to reduce code size).
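The following minimal sketch shows the idea behind first-order Boolean masking: the secret is split into two random-looking shares that are processed separately. The helper `get_random_u32` is a hypothetical platform-specific randomness source, and the empty GNU-style inline assembly statement is a commonly used (but not guaranteed) optimization barrier against the compiler recombining the shares.

```c
#include <stdint.h>

/* Hypothetical platform-specific random number source. */
extern uint32_t get_random_u32(void);

/* The secret is never held in the clear: it is represented by the pair
   (masked, mask) with secret == masked ^ mask. */
typedef struct {
    uint32_t masked;
    uint32_t mask;
} masked_u32;

static masked_u32 mask_value(uint32_t secret)
{
    masked_u32 m;
    m.mask = get_random_u32();
    m.masked = secret ^ m.mask;
    /* Empty asm statement as an optimization barrier: it makes it harder
       (but not impossible) for the compiler to recombine the shares. */
    __asm__ volatile("" : "+r"(m.masked), "+r"(m.mask));
    return m;
}

/* XOR with a public constant can be applied to one share only. */
static masked_u32 masked_xor_const(masked_u32 m, uint32_t c)
{
    m.masked ^= c;
    return m;
}
```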
If an attacker has physical access to a device, they can also choose to use physical attacks to modify the behavior of the software, for example, to prevent the software from setting up certain security features during the device's initialization sequence. The two most common forms of such attacks are voltage glitching and clock glitching.
- Voltage glitch attack
- Using a programmable power supply that can switch the voltage level rapidly, it is possible to reduce or increase the power supply voltage of a chip at a specific clock cycle of the software execution. In some cases, a precise voltage drop can cause a processor to "skip" an instruction; for example, a write to memory or to a hardware register might not take effect. Or, if a write has taken place, the actual value written could be changed by the voltage glitch.
- Clock glitch attack
- Using a clock switching circuit, it is possible to reduce the width of a clock pulse, or the interval between two clock pulses, so that some of the hardware registers are not updated correctly at certain clock edge(s). Similar to a voltage glitch, this can make the hardware appear to skip an instruction.
Such voltage/clock glitch attacks can affect multiple parts of the processor, but sometimes the impact does not lead to any visible error in the operation, leaving as the only effect that the processor skips a memory/register write, or writes an incorrect value. A glitch attack could potentially also result in other observable effects (e.g. a register reset or a bit toggle). The analysis of fault injection methods (and their physical effects) and of the observable effects at the program or instruction execution level is often referred to as a fault model: one can say, for example, that a specific fault injection behaves as an instruction skip. More details about the concept of fault models can be found in the paper "Fault Attacks on Secure Embedded Software: Threats, Design and Evaluation", which gives a good illustration of the concept in its figure 1.
::: TODO Make the above reference to a paper use bibtex. [159]{.issue} :::
Using glitching methods, there are several common ways of attacking a system. For example:
- Skipping an instruction during the setup sequence for security features – e.g. skipping the write to the MPU (Memory Protection Unit) or Security Attribution Unit (SAU), so that the MPU/SAU is not enabled.
- Skipping the branch to error handling code after a failed security authentication. As the branch is not taken, the code continues to operate even though the security authentication has failed.
- Causing an incorrect value to be written to memory or to a hardware register, e.g. when writing a crypto key to a crypto accelerator, forcing the key value written to be zero (caused by a low voltage on the bus hardware).
Example: Attack on TrustZone for Armv8-M
There are other forms of physical attacks, but most of them require significant effort or cost (e.g. cutting open the chip package to carry out fault injection or to read out secret data on the chip).
Ideally, system designers can use hardware (SoCs or microcontrollers) that supports protection against fault injection. For example, a hardware circuit can include redundancy logic (spatial and temporal). In addition, software developers can make such attacks harder by adding checks after write operations, as sketched below. When applying software mitigations, software developers need to check that the optimizations carried out by (C/C++) compilers do not undermine the mitigation.
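A minimal sketch of such a software check is shown below; the register address and the `fault_detected` handler are hypothetical and device-specific.

```c
#include <stdint.h>

/* Hypothetical memory-mapped address of a security-critical control
   register, e.g. an MPU enable register; the real address is
   device-specific. */
#define SEC_CTRL_REG (*(volatile uint32_t *)0x40001000u)

/* Platform-specific reaction to a detected fault, e.g. reset the device. */
extern void fault_detected(void);

void enable_security_feature(uint32_t enable_bits)
{
    SEC_CTRL_REG = enable_bits;

    /* Read back and verify: if a glitch caused the write to be skipped or
       corrupted, the mismatch is detected here. The volatile access keeps
       the compiler from optimizing the check away. */
    if (SEC_CTRL_REG != enable_bits) {
        fault_detected();
    }

    /* A second, redundant check raises the bar: the attacker would need to
       glitch more than one instruction to go unnoticed. */
    if (SEC_CTRL_REG != enable_bits) {
        fault_detected();
    }
}
```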
::: TODO Write chapter with other security topics. :::
If you'd like to start contributing to this book: please do, we're looking forward to your contributions!
The project lives on github at https://github.com/llsoftsec/llsoftsecbook. We also have a Discord server where you can have an interactive chat with us at https://discord.gg/Bm55Z9Ppgn.
We use github issues as our issue tracker and use github pull requests to accept edits, changes, additions and more.
If you'd like to contribute, but are not sure where to start, the list of open issues labeled with "good first issue" may give you inspiration for things to contribute. Please also don't be shy to reach out to us on Discord.
We follow the Contributor Covenant Code of Conduct in this project.
For more details on how to write text for the book, please read contributing.md. If after reading that, you think some specific aspects could be explained better, please do let us know by raising an issue.
\printindex
::: {#refs} :::
Footnotes

1.  This is an oversimplified example for illustrative purposes. However, as this is a wide class of vulnerabilities, many real-world examples can be found and studied.
2.  The code is generated with the `-fno-stack-protector` option, to ensure GCC's stack guard feature is disabled. We also used the `-O1` optimization level.
3.  A shellcode is a short instruction sequence that performs an action such as starting a shell on the victim machine.
4.  Note that the use of nested functions in GCC requires trampolines which reside on an executable stack. The use of nested functions, therefore, poses a security risk.
5.  The LTO and visibility flags are required by Clang's CFI.
6.  The `-ffixed-x18` flag results in treating the `x18` register as reserved, and is required by `-fsanitize=shadow-call-stack` on some platforms.
7.  With the FPAC extension, a fault is raised at incorrect authentication.
8.  If the Top-Byte-Ignore (TBI)\index{Top-Byte-Ignore (TBI)} feature is enabled, the top byte of pointers is ignored when performing memory accesses. This restricts the number of available PAC bits.
9.  This is obviously not a realistic example of how authentication should be done, but simply serves to illustrate how a buffer overflow into a non-control variable can have serious security consequences.
10. The stack frame layout may be significantly different for other architectures and compilers.
11. The authors describe how DOP gadgets can be chained to simulate a Turing machine, making DOP attacks Turing-complete (it's not possible to simulate the infinite tape of a Turing machine on any actual hardware, of course). Turing-completeness is not, however, a particularly useful measure of exploitability, as explained in [@Dullien2018]. Many applications offer their users the ability to perform arbitrary computation, for example JavaScript engines, and those capabilities can be useful to an attacker, but performing a computation without affecting normal program behavior does not constitute "exploitation".
12. Fuzzing\index{fuzzing} is a powerful testing technique that relies on automatically generating large amounts of random inputs to the program under test.
13. ASan also includes a memory leak detector\index{LeakSanitizer}.
14. These primitives are restricted since they can only access a limited number of bytes past the end of the allocation.
15. Some C++ containers have accessors that do perform bounds checking, for example `std::array::at()` and `std::vector::at()`.
16. This refers to the interface between different binary modules, typically a user program and a system library. The ABI describes low-level details of that interface, for example the assignment of arguments and return values into registers or memory. In many systems, the ABI is expected to change rarely, so programs and libraries can be updated independently and still work together. This makes ABI changes undesirable, which is why this proposal aims to minimise them.
17. Rust also provides features that provide temporal memory safety and thread safety.
18. JIT compilers compile code during execution of a program, as opposed to the more traditional compilation where code is compiled before the program is executed.
19. This naturally leads to attempts to bypass the hardening too [@Fetiveau2019].
20. Over time, new CPU designs tend to support having more instructions in flight. [@Eyerman2009, section 4.2.3] suggests that branch prediction accuracy has to grow more than linearly when the number of pipelines, or the depth of the pipeline, grows. Therefore, there is a constant push to increase the accuracy of branch predictors.