NOTE: At some point Apple, being the party poopers that they are, slapped entitlement restrictions on the thread_set_state(...) API, making it no longer usable on normal macOS machines (short of adding Apple entitlements and telling AMFI to get out of the way). This blocks my writeup's technique to create a breakpoint, but you can still get around this by using the [mach_]vm_protect() APIs to write a breakpoint into the code. This technique would require the process to be debugged, or not codesigned, or to have something along the lines of com.apple.security.cs.disable-executable-page-protection (macOS) to create a breakpoint and modify executable code (like what lldb does). Maybe I'll update this one day...
Let's play a game: A series of code snippets and how they are compiled will be presented. In each code snippet, a challenge is given to execute a certain function that should be inaccessible unless you know the password. In order to execute this privileged function, you're not allowed to alter the source code nor how it's compiled in any way. Fortunately, you can assume that you have code execution in a dynamic library running in the same address space and loaded in via the DYLD_INSERT_LIBRARIES
environment variable.
For these challenges, all executables are compiled and run on an Apple macOS Monterey operating system with hardware capable of running ARM64/ARM64e. Since Apple is transitioning away from Intel in their device lineup, only ARM64 & ARM64e will be covered. clang-1400.0.29.102 is used for all examples and was tested on a macOS 12.6 M1 machine.
This writeup assumes you have an understanding of the C language as well as several Apple concepts. If you're unfamiliar with Mach-O load commands and the symbol table, you're encouraged to read about those first by googling or looking up the <mach-o/loader.h> & <mach-o/nlist.h> headers.
Note: This writeup consciously skips any Objective-C or Swift swizzling, as that concept has been thoroughly documented across the internet. However, some of the tricks discussed (e.g. __builtin_return_address) could be useful for your swizzling endeavors.
Sounds good? Game on?
Given the following C snippet which produces the ex1
executable, execute the do_the_thing()
function before the program completes.
// ex1.c
// xcrun -sdk macosx clang -arch arm64 ex1.c -o /tmp/ex1 -O0 -mmacosx-version-min=12.6
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
#include <stdio.h>
#include <CommonCrypto/CommonCrypto.h>
char g_secret[CC_MD5_DIGEST_LENGTH] = "\x5f\x4d\xcc\x3b\x5a\xa7\x65\xd6"
"\x1d\x83\x27\xde\xb8\x82\xcf\x99"; // "password"
int security_check(const char *password) {
char result[CC_MD5_DIGEST_LENGTH];
if (!password) {
return 0;
}
CC_MD5(password, strlen(password), (unsigned char*)result);
return memcmp(result, g_secret, CC_MD5_DIGEST_LENGTH) == 0;
}
void do_the_thing(void) {
printf("🌈success!🌈\n");
}
int main(int argc, const char* argv[]) {
if (security_check(argc > 1 ? argv[1] : NULL)) {
do_the_thing();
}
return 0;
}
This snippet of code checks for a passphrase that's passed in as an argument over the command line. If there is an argument, the passphrase is passed into security_check(const char*)
. If the MD5 hash matches a hardcoded hash (which is derived from the phrase "password"), then the do_the_thing()
function is invoked. Per the "rules" of the challenge, the compilation flags found at the beginning of the source code must be used to create the ex1
executable.
~ cp ex1.c /tmp/
~ cd /tmp/
~ xcrun -sdk macosx clang -arch arm64 ex1.c -o /tmp/ex1 -O0 -mmacosx-version-min=12.6
~ ./ex1 test
~ ./ex1 password
🌈success!🌈
Assessment
There are a number of hooking points that could be used to coerce this code into calling do_the_thing(). Fortunately, it is straightforward to call the do_the_thing() symbol directly: there's no symbol stripping, and do_the_thing() is exported as a global symbol. This means other loaded executables, frameworks, or dylibs (also known as images) can reference it by name, using either an extern declaration or the dlopen/dlsym combo.
Here's a solution which calls the do_the_thing()
function directly before the program completes:
// solution1.c
// xcrun -sdk macosx clang -arch arm64 solution1.c -O0 -shared -o /tmp/solution1.dylib -Wl,-U,_do_the_thing -mmacosx-version-min=12.6 # 1
__attribute__((destructor)) void deinit(void) { // 2
extern void do_the_thing(void);
do_the_thing();
}
Comments are added to the code to highlight points of interest.
- When compiling, the -Wl,-U,_do_the_thing flag instructs the linker (via the -Wl, prefix) to ignore any undefined references to the do_the_thing() symbol. With Apple, any C function will have an underscore prepended to the symbol name. Typically, an executable calls into a dynamic library and links to it. In this case, the dynamic library is calling into the executable without linking to it.
- The function is marked up with __attribute__((destructor)), which tells dyld to call this block of code before the image is unloaded. This means that the code will execute after the main function completes.
Once compiled, you can see the solution in action via the DYLD_INSERT_LIBRARIES
environment variable applied to ex1
.
~ DYLD_INSERT_LIBRARIES=/tmp/solution1.dylib /tmp/ex1
🌈success!🌈
For those unfamiliar with DYLD_INSERT_LIBRARIES, it loads code into a process before anything else gets loaded. This is contingent on process permissions and security settings (e.g. -Wl,-add_empty_section,__RESTRICT,__restrict). One would need to disable Apple's System Integrity Protection to be able to use DYLD_INSERT_LIBRARIES on an Apple executable.
With the warmup completed and the rules established, let's move on to something a bit more exciting.
Given the following snippet which produces ex2
, execute the do_the_thing()
function before the program completes via any means necessary.
// ex2.c
// xcrun -sdk macosx clang -arch arm64 ex2.c -o /tmp/ex2 -O0 -mmacosx-version-min=12.6
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
#include <stdio.h>
#include <CommonCrypto/CommonCrypto.h>
char g_secret[CC_MD5_DIGEST_LENGTH] = "\x5f\x4d\xcc\x3b\x5a\xa7\x65\xd6"
"\x1d\x83\x27\xde\xb8\x82\xcf\x99";
int security_check(const char *password) {
char result[CC_MD5_DIGEST_LENGTH];
if (!password) {
return 0;
}
CC_MD5(password, strlen(password), (unsigned char*)result);
return memcmp(result, g_secret, CC_MD5_DIGEST_LENGTH) == 0;
}
static void do_the_thing(void) { // 1
printf("🌈success!🌈\n");
}
int main(int argc, const char* argv[]) {
if (security_check(argc > 1 ? argv[1] : NULL)) {
do_the_thing();
}
return 0;
}
Assessment
It looks like the authors caught on to how trivial it is to directly call the do_the_thing() function and have added a static declaration to do_the_thing(). This is the only change between ex1.c and ex2.c. It means do_the_thing() is no longer a globally exported symbol and cannot be directly referenced by other images. If this code were to run with the previous solution, the following crash would occur:
~ xcrun -sdk macosx clang -arch arm64 ex2.c -o /tmp/ex2 -O0 -mmacosx-version-min=12.6
~ DYLD_INSERT_LIBRARIES=/tmp/solution1.dylib /tmp/ex2
dyld[14227]: symbol not found in flat namespace (_do_the_thing)
[1] 14227 abort DYLD_INSERT_LIBRARIES=/tmp/solution.dylib /tmp/ex2
NOTE: Even with the static declaration, it's still possible to directly call the do_the_thing() function. The function is still referenced by name in the symbol table and can be accessed through other means. However, in order to showcase different techniques, assume that there's no easy way to directly execute do_the_thing() and alternative methods must be explored.
One such method is symbol interposing, which allows replacing a reference to a symbol with another. Symbol interposing can be used to alter parameters, return values, or even completely replace the symbol. This is typically done through undefined external references to symbols, which are implemented in images other than the one referencing them. Examining an image's undefined external symbols can provide insight into potential avenues for interposing.
For example, nm can list the undefined external symbols that ex2 references:
~ nm -mu /tmp/ex2
(undefined) external _CC_MD5 (from libSystem)
(undefined) external ___stack_chk_fail (from libSystem)
(undefined) external ___stack_chk_guard (from libSystem)
(undefined) external _memcmp (from libSystem)
(undefined) external _printf (from libSystem)
(undefined) external _strlen (from libSystem)
Upon looking at the external symbols compiled into ex2
, there are several potential interposing solutions which could augment execution control allowing the do_the_thing()
function to execute. Here's an idea for each relevant symbol if it were to be replaced:
- Replacing CC_MD5 so that it writes the "password" hash into the result buffer, making the memcmp check succeed.
- Interposing memcmp to return 0, so the 2 values are believed to be equal, resulting in the conditional check succeeding and executing do_the_thing().
- Interposing strlen to directly call do_the_thing() at its runtime address, found by walking the symbol table and determining the load address. This technique can actually be applied to any of the above symbols with some caveats that are discussed below.
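The strlen idea depends on turning the unslid address that nm or the symbol table reports for a function into its runtime address. The sketch below shows only that arithmetic. It makes two assumptions. First, the main executable's __TEXT segment is linked at 0x100000000, which holds for the ex binaries here (otool -l shows the real vmaddr). Second, the runtime header address has already been obtained, e.g. through _mh_execute_header. The helper names aslr_slide and slid_address are illustrative and are not part of any Apple API.

```c
#include <stdint.h>

// Link-time vmaddr of __TEXT for these arm64 executables.
// Assumption: verify against `otool -l` for your own binary.
#define TEXT_VMADDR 0x100000000ULL

// ASLR slide: where the Mach-O header actually landed minus where it was linked.
uint64_t aslr_slide(uint64_t runtime_header) {
    return runtime_header - TEXT_VMADDR;
}

// Runtime address of a symbol, given its unslid n_value from the symbol table.
uint64_t slid_address(uint64_t runtime_header, uint64_t n_value) {
    return n_value + aslr_slide(runtime_header);
}
```

Take the ex3 run later in this writeup, where __mh_execute_header landed at 0x10064c000. There, slid_address(0x10064c000, 0x100003e20) gives 0x10064fe20, which is exactly where dyld bound _security_check. The same arithmetic yields a callable pointer for do_the_thing() on plain arm64. On arm64e, the pointer would additionally need to be signed.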
Given these options, the memcmp path is preferred for its simplicity. Here's a solution that interposes all external references to memcmp, making every comparison appear equal.
// solution2.c
// xcrun -sdk macosx clang -arch arm64 solution2.c -O0 -shared -o /tmp/solution2.dylib -mmacosx-version-min=12.6
#include <stdio.h>
#include <string.h> // memcmp
int my_memcmp(const void *s1, const void *s2, size_t n) { // 1
printf("interposed! returning match\n");
return 0;
}
__attribute__((used, section("__DATA,__interpose"))) // 2
static void* interpose[] = {my_memcmp, memcmp};
- my_memcmp is declared, which will stand in for the real memcmp and always return a matching comparison.
- The interpose array contains references to my_memcmp and memcmp and has 2 compiler attributes. The first one, used, tells the compiler not to discard this declaration even though nothing references it. This is needed even at -O0, since clang does not emit unreferenced static variables regardless of optimization level. The second attribute, section("__DATA,__interpose"), places the contents of the interpose array into the specified Mach-O section. Mach-O load commands are outside the scope of this tutorial, but you can find many tutorials around the internet. Upon loading an image, dyld will inspect its Mach-O load commands. If dyld sees a Mach-O section called __interpose in the __DATA segment, it will attempt to interpose each declared pair of symbols (replacement first, original second) in the images loaded into the process. More than one pair of symbols can be provided. Be aware that dyld consults AMFI flags, which can prevent interposing on certain processes.
Although not included in the above code, dyld's open-source mach-o/dyld-interposing.h header offers a convenient DYLD_INTERPOSE macro that does the same thing as the attributes declared above, behind a friendlier API.
With the solution2.dylib
compiled, you can see memcmp
being interposed in ex2
provided ex2
gets an argument.
~ xcrun -sdk macosx clang -arch arm64 solution2.c -O0 -shared -o /tmp/solution2.dylib -mmacosx-version-min=12.6
~ DYLD_INSERT_LIBRARIES=/tmp/solution2.dylib /tmp/ex2 muwahahahaa
interposed! returning match
🌈success!🌈
You can see exactly what's happening during symbol interposing by adding the undocumented DYLD_PRINT_INTERPOSING
environment variable. Adding DYLD_PRINT_INTERPOSING
to the previous command produces the following output on this machine:
DYLD_PRINT_INTERPOSING=1 DYLD_INSERT_LIBRARIES=/tmp/solution2.dylib /tmp/ex2 muwahahahaa
dyld[26035]: solution2.dylib has interposed '_memcmp' to replacing binds to 0x182F8CCB0 with 0x1002A3F58
dyld[26035]: interpose replaced 0x182F8CCB0 with 0x182F8CCB0 in /private/tmp/solution2.dylib
dyld[26035]: interpose replaced 0x182F8CCB0 with 0x1002A3F58 in /private/tmp/ex2
dyld[26035]: interpose: *0x1dd078880 = 0x1002a3f58 (JOP: diversity 0x0000, addr-div=1, key=IA)
dyld[26035]: interpose: *0x1dd07e818 = 0x1002a3f58 (JOP: diversity 0x0000, addr-div=1, key=IA)
dyld[26035]: interpose: *0x1dd07fe90 = 0x1002a3f58 (JOP: diversity 0x0000, addr-div=1, key=IA)
dyld[26035]: interpose: *0x1dd0951c0 = 0x1002a3f58 (JOP: diversity 0x0000, addr-div=1, key=IA)
...
In the above output, the original memcmp's address is 0x182F8CCB0 and the new my_memcmp's address is 0x1002A3F58.
With ~540 lines omitted from the output above, it's easy to see that the memcmp function is heavily referenced across all the loaded images in the ex2 process. This brings up an interesting aspect of interposing: some interposing solutions work across every single image that's loaded into a process, while others only work on a per-image basis. The __DATA,__interpose trick works across the images loaded into the process, not just the main executable.
NOTE: One must be careful when replacing an undefined symbol across all images, because critical logic could be altered elsewhere. For that reason, my_memcmp should either check the caller's address via __builtin_return_address(0) to see whether the call is coming from ex2, or only alter control flow depending upon the parameters. Another idea is to only interpose on a per-image basis.
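Here is a minimal sketch of the caller check, written as a variant of solution2.c's my_memcmp. It relies on how ex2's stub works: the stub reaches memcmp with br rather than bl. The return address therefore points back into ex2's security_check, and comparing it against the main executable's __TEXT range tells the callers apart. Several pieces are hypothetical. The globals g_main_text_start/g_main_text_end and the helper caller_is_main_executable are placeholders. A real dylib would fill in the range from a constructor, for example via _mh_execute_header and getsegmentdata(), and would still need solution2.c's __interpose tuple to install the function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical bounds of the main executable's __TEXT segment. A real dylib
// would fill these in from a constructor; here they are plain globals.
uintptr_t g_main_text_start = 0;
uintptr_t g_main_text_end = 0;

int caller_is_main_executable(uintptr_t ret) {
    return ret >= g_main_text_start && ret < g_main_text_end;
}

// Stand-in for memcmp that only lies to callers inside the main executable.
// noinline keeps a real call frame so __builtin_return_address(0) refers to
// the caller. Assumes plain arm64: on arm64e the value may carry a PAC
// signature that has to be stripped before comparing.
__attribute__((noinline))
int my_memcmp(const void *s1, const void *s2, size_t n) {
    uintptr_t ret = (uintptr_t)__builtin_return_address(0);
    if (caller_is_main_executable(ret)) {
        return 0;  // pretend the buffers match
    }
    // Everyone else gets the real answer. dyld leaves the interposing image's
    // own memcmp reference alone, so this reaches the original implementation.
    return memcmp(s1, s2, n);
}
```

Code in libsystem and other frameworks keeps getting honest comparisons, so only the check in ex2 is affected.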
Using the same code snippet from the previous example, execute the do_the_thing() function through whatever means. This time, you're only allowed to interpose symbols defined in ex3.c.
// ex3.c
// xcrun -sdk macosx clang -arch arm64 ex3.c -o /tmp/ex3 -O0 -mmacosx-version-min=12.6 -Wl,-interposable
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
#include <stdio.h>
#include <CommonCrypto/CommonCrypto.h>
char g_secret[CC_MD5_DIGEST_LENGTH] = "\x5f\x4d\xcc\x3b\x5a\xa7\x65\xd6"
"\x1d\x83\x27\xde\xb8\x82\xcf\x99";
int security_check(const char *password) {
char result[CC_MD5_DIGEST_LENGTH];
if (!password) {
return 0;
}
CC_MD5(password, strlen(password), (unsigned char*)result);
return memcmp(result, g_secret, CC_MD5_DIGEST_LENGTH) == 0;
}
static void do_the_thing(void) { // 1
printf("🌈success!🌈\n");
}
int main(int argc, const char* argv[]) {
if (security_check(argc > 1 ? argv[1] : NULL)) {
do_the_thing();
}
return 0;
}
Assessment
The code is unchanged between ex2.c and ex3.c. The main difference is the newly added -interposable linker flag and the challenge restricting you to interposing only symbols defined in ex3.
Compile and dump the potential options for interposing symbols on ex3
:
~ xcrun -sdk macosx clang -arch arm64 ex3.c -o /tmp/ex3 -O0 -mmacosx-version-min=12.6 -Wl,-interposable
~ nm /tmp/ex3 -Ug
0000000100000000 T __mh_execute_header
0000000100008000 D _g_secret
0000000100003edc T _main
0000000100003e20 T _security_check
Looks like g_secret, main, and security_check are potential avenues for interposing. The challenge's compile command includes a very interesting linker flag called -interposable.
Undefined symbols are bound either lazily, upon first use, or when the image is loaded. Typically, symbols defined in the same image do not need to be bound at all, because the linker can resolve references to them via relative offsets. However, it's possible to override this behavior with the -interposable flag.
Public solutions exist to interpose symbols on a per-image basis. One of the more popular repos is Facebook's fishhook, which targets lazy symbol binding. Lazy symbol binding is the process in which an external symbol is bound the first time it is referenced in an image, instead of when the image gets loaded. Although fishhook's "How it works" section provides an excellent overview, a lower-level dive might be insightful. A detour will be taken to showcase how printf gets bound into ex3.
Compile ex3.c
's source with ld's -interposable
option.
~ xcrun -sdk macosx clang -arch arm64 ex3.c -o /tmp/ex3 -O0 -mmacosx-version-min=12.6 -Wl,-interposable
Then run Apple's preferred debugger, lldb
, on the ex3
executable:
~ lldb /tmp/ex3
(lldb) target create "ex3"
Current executable set to '/tmp/ex3' (arm64).
At this point, lldb
has not launched ex3
, so the process layout still matches the ex3
file layout on disk. No binding operations have happened at this point.
Disassemble do_the_thing(), which calls printf:
(lldb) disassemble -n do_the_thing
ex3`do_the_thing:
ex3[0x100003f40] <+0>: stp x29, x30, [sp, #-0x10]!
ex3[0x100003f44] <+4>: mov x29, sp
ex3[0x100003f48] <+8>: adrp x0, 0
ex3[0x100003f4c] <+12>: add x0, x0, #0xfa4 ; "\xf0\x9f\x8c\x88success!\xf0\x9f\x8c\x88\n"
ex3[0x100003f50] <+16>: bl 0x100003f80 ; symbol stub for: printf
ex3[0x100003f54] <+20>: ldp x29, x30, [sp], #0x10
ex3[0x100003f58] <+24>: ret
Looking at the disassembly comments, there's a bl (branch with link) to address 0x100003f80 for printf. More information about the 0x100003f80 address can be queried via lldb's image lookup command:
(lldb) image lookup -a 0x100003f80
Address: ex3[0x0000000100003f80] (ex3.__TEXT.__stubs + 36)
Summary: ex3`symbol stub for: printf
This jumps to an internal Mach-O section in ex3
called __stubs
found in the __TEXT
segment. Disassembling this address produces the following relevant info:
(lldb) x/3i 0x100003f80
0x100003f80: 0xb0000010 adrp x16, 1
0x100003f84: 0xf9401210 ldr x16, [x16, #0x20]
0x100003f88: 0xd61f0200 br x16
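Breaking these instructions down:
- adrp x16, 1: Compute the address of the 4KB page one page past the 4KB floor of the program counter and store it in x16. For this unslid address, the floor of the program counter is 0x100003000, so x16 = 0x100004000.
- ldr x16, [x16, #0x20]: Add 0x20 to x16, dereference the result, and store the loaded value back into x16: x16 = *(x16 + 0x20), i.e. the pointer stored at 0x100004020.
- br x16: Branch to the address now held in x16. Unlike bl, br does not set the link register, so the called function returns directly to do_the_thing().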
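To double-check the arithmetic, the three stub instruction words from the lldb dump can be decoded by hand. The sketch below handles only the two encodings seen here: ADRP, and the 64-bit LDR with an unsigned immediate offset. It is not a general disassembler, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

// ADRP: bit31 = 1 and bits[28:24] = 0b10000.
bool is_adrp(uint32_t insn) {
    return (insn & 0x9F000000u) == 0x90000000u;
}

// Page computed by ADRP at pc: the 4KB floor of pc plus a signed 21-bit page
// count built from immhi (bits[23:5]) and immlo (bits[30:29]).
uint64_t adrp_target(uint64_t pc, uint32_t insn) {
    int64_t immlo = (insn >> 29) & 0x3;
    int64_t immhi = (insn >> 5) & 0x7FFFF;
    int64_t pages = (immhi << 2) | immlo;
    if (pages & (1LL << 20)) {
        pages -= (1LL << 21);  // sign-extend the 21-bit immediate
    }
    return (pc & ~0xFFFULL) + (uint64_t)(pages * 4096);
}

// LDR Xt, [Xn, #imm] (64-bit, unsigned offset): bits[31:22] = 0b1111100101.
bool is_ldr_x_uimm(uint32_t insn) {
    return (insn & 0xFFC00000u) == 0xF9400000u;
}

// Byte offset of that LDR: imm12 (bits[21:10]) scaled by the 8-byte access size.
uint64_t ldr_x_uimm_offset(uint32_t insn) {
    return (uint64_t)((insn >> 10) & 0xFFF) * 8;
}

// GOT slot read by an adrp/ldr/br stub that starts at stub_pc.
uint64_t stub_got_slot(uint64_t stub_pc, uint32_t adrp, uint32_t ldr) {
    return adrp_target(stub_pc, adrp) + ldr_x_uimm_offset(ldr);
}
```

Feeding in the stub at 0x100003f80 with the words 0xb0000010 and 0xf9401210 gives the page 0x100004000, an offset of 0x20, and a GOT slot of 0x100004020.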
So what is at 0x100004020
?
(lldb) image lookup -a 0x100004020
Address: ex3[0x0000000100004020] (ex3.__DATA_CONST.__got + 32)
Summary: (void *)0x8010000000000004
(lldb) x/i 0x100004020
0x100004020: 0x00000004 udf #0x4
__DATA_CONST.__got is the containing Mach-O section for address 0x100004020. On disk, it holds 0x8010000000000004 rather than a real pointer. lldb happens to disassemble its low 4 bytes as a udf instruction, but the real problem is that br x16 would send execution to a garbage address and crash. This means something gets modified between the point when ex3 is on disk and the point where ex3 is running. Fortunately, dyld has another environment variable to see what's happening during binding: DYLD_PRINT_BINDINGS.
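The on-disk value 0x8010000000000004 looks like a chained-fixup bind entry, which dyld rewrites at launch. This sketch assumes the 64-bit chained pointer formats used by plain arm64 binaries targeting macOS 12 (DYLD_CHAINED_PTR_64 / DYLD_CHAINED_PTR_64_OFFSET, whose bind layout is the dyld_chained_ptr_64_bind struct in <mach-o/fixup-chains.h>). Under that assumption, the value can be picked apart with a few shifts. The decoder is illustrative, not dyld's code.

```c
#include <stdint.h>

// Bind entry layout (assumed DYLD_CHAINED_PTR_64 family):
//   bits  0..23 ordinal  (index into the image's imports)
//   bits 24..31 addend
//   bits 32..50 reserved
//   bits 51..62 next     (distance to the next fixup, in 4-byte strides)
//   bit  63     bind     (1 = bind, 0 = rebase)
typedef struct {
    int bind;
    uint32_t ordinal;
    uint32_t addend;
    uint32_t next;
} chained_bind64;

chained_bind64 decode_chained_ptr64(uint64_t raw) {
    chained_bind64 d;
    d.bind = (int)(raw >> 63);
    d.ordinal = (uint32_t)(raw & 0xFFFFFF);
    d.addend = (uint32_t)((raw >> 24) & 0xFF);
    d.next = (uint32_t)((raw >> 51) & 0xFFF);
    return d;
}
```

Decoding 0x8010000000000004 gives bind = 1, ordinal = 4, and next = 2. Ordinal 4 lines up with <ex3/bind#4> -> printf in the DYLD_PRINT_BINDINGS output below. next = 2 (in 4-byte strides) points 8 bytes ahead, to the following slot at 0x100004028.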
The debug session below demonstrates how to use lldb to set a breakpoint on main and examine the relevant binding contents, while launching with the DYLD_PRINT_BINDINGS environment variable set.
(lldb) b main
Breakpoint 1: where = ex3`main, address = 0x0000000100003ed4
(lldb) process launch -E DYLD_PRINT_BINDINGS=1 -- password
Process 11633 launched: '/tmp/ex3' (arm64)
dyld[11633]: <ex3/bind#0> -> 0x18da961c4 (libcommonCrypto.dylib/_CC_MD5)
dyld[11633]: <ex3/bind#1> -> 0x182ea3ce8 (libsystem_c.dylib/___stack_chk_fail)
dyld[11633]: <ex3/bind#2> -> 0x1dbed5798 (libsystem_c.dylib/___stack_chk_guard)
dyld[11633]: <ex3/bind#3> -> 0x182f8ccb0 (libsystem_platform.dylib/__platform_memcmp)
dyld[11633]: <ex3/bind#4> -> 0x182e68ee8 (libsystem_c.dylib/_printf)
dyld[11633]: <ex3/bind#5> -> 0x182f8c860 (libsystem_platform.dylib/__platform_strlen)
dyld[11633]: fixup: *0x000100004000 = 0x00018DA961C4
dyld[11633]: fixup: *0x000100004008 = 0x000182EA3CE8
dyld[11633]: fixup: *0x000100004010 = 0x0001DBED5798
dyld[11633]: fixup: *0x000100004018 = 0x000182F8CCB0
dyld[11633]: fixup: *0x000100004020 = 0x000182E68EE8
dyld[11633]: fixup: *0x000100004028 = 0x000100003E18
dyld[11633]: fixup: *0x000100004030 = 0x000182F8C860
Process 11633 stopped
...
(lldb) x/gx 0x000100004020 # Inspect post-bound contents at 0x000100004020
0x100004020: 0x0000000182e68ee8
(lldb) memory region 0x000100004020 # Is this memory region writable?
[0x0000000100004000-0x0000000100008000) r-- __DATA_CONST
Modified memory (dirty) page list provided, 1 entries.
Dirty pages: 0x100004000.
(lldb) image lookup -a 0x000182E68EE8 # Query contents that were dereferenced at 0x000100004020
Address: libsystem_c.dylib[0x000000018021cee8] (libsystem_c.dylib.__TEXT.__text + 194108)
Summary: libsystem_c.dylib`printf
From the output, one can observe that the 0x000100004020 address was bound to 0x182e68ee8, the runtime address of printf (0x18021cee8 is printf's unslid address within libsystem_c.dylib). After binding has completed, the __DATA_CONST segment has its write access removed so no one can muck around with interposing... but it's still possible to change memory protections using the relevant APIs.
As you can see, undefined symbols that are bound at load time are resolved and stored into __DATA_CONST.__got on ARM64 executables. Using lldb, the starting address and size of the __DATA_CONST.__got section can be displayed:
(lldb) image dump section ex3
...
0x00000005 data-ptrs [0x0000000100004000-0x0000000100004038) rw- 0x00004000 0x00000038 0x00000006 ex3.__DATA_CONST.__got
...
For ex3, the __got section has a size of 0x38 bytes, and each entry holds an 8-byte pointer. This means there are 0x38 / 8 = 7 pointers in the __got section:
(lldb) x/7gx 0x0000000100004000
0x100004000: 0x000000018da961c4 0x0000000182ea3ce8
0x100004010: 0x00000001dbed5798 0x0000000182f8ccb0
0x100004020: 0x0000000182e68ee8 0x0000000100003e20
0x100004030: 0x0000000182f8c860
# Examining the first address of the 7 pointers...
(lldb) image lookup -a 0x000000018da961c4
Address: libcommonCrypto.dylib[0x000000018ae4a1c4] (libcommonCrypto.dylib.__TEXT.__text + 1880)
Summary: libcommonCrypto.dylib`CC_MD5
# These are bound pointers to undefined symbols
The most interesting aspect of this detour is that the size and ordering of the pointers found in __DATA_CONST.__got match up with the ordering of the indirect symbol table. The indirect symbol table is an array of uint32_t values, each of which is an index into the actual symbol table. The actual symbol table is an array of struct nlist[_64], which is described in <mach-o/nlist.h>.
Use otool -l
to dump the Mach-O commands and search for the relevant __got/__stubs
content using grep
.
~ otool -l /tmp/ex3 | grep -E "(__got|_stubs)" -A10
sectname __stubs
segname __TEXT
addr 0x0000000100003f5c
size 0x0000000000000048
offset 16220
align 2^2 (4)
reloff 0
nreloc 0
flags 0x80000408
reserved1 0 (index into indirect symbol table) # <---
reserved2 12 (size of stubs)
--
sectname __got
segname __DATA_CONST
addr 0x0000000100004000
size 0x0000000000000038
offset 16384
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000006
reserved1 6 (index into indirect symbol table) # <---
reserved2 0
From the above otool output, ex3's __got section starts at index 6 of the indirect symbol table (its reserved1 field), while __stubs starts at index 0.
You have the start index; now you need to find the file offset of this uint32_t array. This is given by the indirectsymoff member of the struct dysymtab_command from the LC_DYSYMTAB load command. Using otool again and grep'ing for indirectsym will give the relevant information.
~ otool -l /tmp/ex3 | grep indirectsym
indirectsymoff 49592
nindirectsyms 13
The uint32_t indirect symbol array starts at file offset 49592 in ex3 (for this compiled version of ex3) and contains 13 uint32_t entries. Dumping the raw bytes gives the following:
~ xxd -g 4 -e -s 49592 -l $((13 * 4)) /tmp/ex3
0000c1b8: 00000005 00000006 00000008 00000009 ................
0000c1c8: 00000004 0000000a 00000005 00000006 ................
0000c1d8: 00000007 00000008 00000009 00000004 ................
0000c1e8: 0000000a ....
Breaking down the xxd options:
- -g 4: Group the bytes into chunks of 4, which is the size of uint32_t.
- -e: Dump the bytes in little-endian byte order.
- -s 49592: Start at offset 49592 from the beginning of the ex3 file.
- -l $((13 * 4)): Dump 13 uint32_ts worth of bytes.
The output is the full indirect symbol table array. Cross-referencing this data against otool's built-in indirect symbol table option, -I, shows that they match.
~ otool -I /tmp/ex3
/tmp/ex3:
Indirect symbols for (__TEXT,__stubs) 6 entries
address index
0x0000000100003f64 5
0x0000000100003f70 6
0x0000000100003f7c 8
0x0000000100003f88 9
0x0000000100003f94 4
0x0000000100003fa0 10
Indirect symbols for (__DATA_CONST,__got) 7 entries
address index
0x0000000100004000 5
0x0000000100004008 6
0x0000000100004010 7
0x0000000100004018 8
0x0000000100004020 9
0x0000000100004028 4
0x0000000100004030 10
Remember how index 6 was the indirect symbol table start for the __got section? Starting at index 6 of the xxd output, the values 5, 6, 7, 8, 9, 4, 10 match the index column that otool -I prints for (__DATA_CONST,__got). With this information, you can finally dump the symbols to match the indexes!
Using the -p
option for nm
, the symbol table can be displayed in sequential order (instead of alphabetical order):
~ nm -p /tmp/ex3 | nl -v0
0 0000000100003f40 t _do_the_thing
1 0000000100000000 T __mh_execute_header
2 0000000100008000 D _g_secret
3 0000000100003ed4 T _main
4 0000000100003e18 T _security_check
5 U _CC_MD5
6 U ___stack_chk_fail
7 U ___stack_chk_guard
8 U _memcmp
9 U _printf
10 U _strlen
Recall how 0x0000000100004020
matched to printf
when exploring ex3
's __DATA_CONST.__got
in lldb
. You can see that printf
is at index 9 (starting from 0) in the nm
ordered output of the symbol table, which matches otool -I ex3
's index 9 for address 0x0000000100004020
.
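Putting the pieces together, resolving a __got or __stubs entry to a symbol name takes two array lookups. For illustration, the sketch below hardcodes the bytes and symbol order dumped above for this particular build of ex3. A real tool would read them through the LC_SYMTAB and LC_DYSYMTAB load commands instead. symbol_for_entry is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

// Bytes at LC_DYSYMTAB.indirectsymoff for this build of ex3 (from the xxd
// dump): 13 little-endian uint32_t symbol-table indices.
const uint8_t ex3_indirect_raw[13 * 4] = {
    5,0,0,0, 6,0,0,0, 8,0,0,0, 9,0,0,0, 4,0,0,0, 10,0,0,0,           // __stubs
    5,0,0,0, 6,0,0,0, 7,0,0,0, 8,0,0,0, 9,0,0,0, 4,0,0,0, 10,0,0,0,  // __got
};

// Symbol table order as printed by `nm -p /tmp/ex3`.
const char *ex3_symbols[] = {
    "_do_the_thing", "__mh_execute_header", "_g_secret", "_main",
    "_security_check", "_CC_MD5", "___stack_chk_fail", "___stack_chk_guard",
    "_memcmp", "_printf", "_strlen",
};

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Map an entry of a pointer/stub section to its symbol name. sect_addr and
// reserved1 come from the section header; entry_size is 8 for __got and
// reserved2 (12 here) for __stubs. Real code must also handle entries flagged
// INDIRECT_SYMBOL_LOCAL / INDIRECT_SYMBOL_ABS, which this sketch ignores.
const char *symbol_for_entry(uint64_t addr, uint64_t sect_addr,
                             uint32_t reserved1, uint32_t entry_size) {
    size_t n_indirect = sizeof(ex3_indirect_raw) / 4;
    size_t n_syms = sizeof(ex3_symbols) / sizeof(ex3_symbols[0]);
    uint64_t slot = reserved1 + (addr - sect_addr) / entry_size;
    if (addr < sect_addr || slot >= n_indirect) return NULL;
    uint32_t sym_index = read_le32(ex3_indirect_raw + slot * 4);
    if (sym_index >= n_syms) return NULL;
    return ex3_symbols[sym_index];
}
```

Here are the results for each lookup:
- The GOT slot 0x100004020 (section 0x100004000, reserved1 6) resolves to _printf.
- Its neighbor 0x100004028 resolves to _security_check.
- The stub at 0x100003f80 (section 0x100003f5c, reserved1 0, 12-byte stubs) also lands on _printf.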
Coming back to the problem at hand for ex3 and binding, the -interposable option was applied to ex3, resulting in security_check being bound through the __got slot at address 0x0000000100004028 (or the equivalent on your build of ex3).
The solution needs to do the following:
- Find the beginning address of the main executable, which contains the Mach-O load commands.
- Find the offset to __DATA_CONST.
- Make that region of memory writable.
- Patch the correct security_check bind address to point to a controlled function.
- (Optionally) make the __DATA_CONST segment read-only again.

Instead of writing lengthy logic to parse Mach-O load commands, the solution below simply looks in the __got section for any pointer to security_check and then patches it to point to a controlled function.
// solution3.c
// xcrun -sdk macosx clang -arch arm64 solution3.c -O0 -shared -o /tmp/solution3.dylib -mmacosx-version-min=12.6 -Wl,-U,__mh_execute_header,-U,_security_check
#include <stdio.h>
#include <sys/mman.h>
#include <sys/errno.h>
#include <mach/mach.h>
#include <mach-o/getsect.h>
extern void* _mh_execute_header;
extern int security_check();
#define HANDLE_ERR(E) {\
if ((E)) printf("Error: %d, %s @ %s:%d\n", (E), mach_error_string((E)), __FUNCTION__, __LINE__);}
int my_security_check(void) {
printf("interposed security_check! returning match\n");
return 1;
}
__attribute__((constructor)) static void oninit() { // 1
uintptr_t start = (uintptr_t)&_mh_execute_header; // 2
uintptr_t resolved = 0;
size_t sz = 0;
// 3
uintptr_t* got = (void*)getsectiondata((void*)&_mh_execute_header, "__DATA_CONST", "__got", &sz);
for (int i = 0; i < sz / 8; i++) {
if (got[i] == (uintptr_t)security_check) {
resolved = (uintptr_t)&got[i];
break;
}
}
if (!resolved) {
printf("Couldn't find security_check, bailing\n");
return;
}
printf("start is 0x%012lx, patching offset 0x%012lx\n", start, resolved);
// 4
task_t task = mach_task_self();
HANDLE_ERR(vm_protect(task, resolved, 8, FALSE, VM_PROT_READ|VM_PROT_WRITE));
// 5
uintptr_t my_security_check_ptr = (uintptr_t)my_security_check;
HANDLE_ERR(vm_write(task, resolved, (vm_offset_t)&my_security_check_ptr, 8));
// 6
HANDLE_ERR(vm_protect(task, resolved, 8, FALSE, VM_PROT_READ));
}
Breaking down important points:
- A constructor attribute is used this time, so the oninit function is called on image load. This occurs before the main function in ex3 executes, but after the symbol binding occurs on ex3.
- You might have seen references to the _mh_execute_header symbol earlier when dumping the symbol table for an executable. This symbol is defined by the linker for executables (and not for dylibs). It can be used to get the start address of the main executable at runtime. That is helpful because the memory layout is slid around every time the executable is launched. This is known as ASLR and is outside the scope of this tutorial, but interested readers can google more info.
- Once the header of the main executable is resolved, the getsectiondata API is used to find the address of __DATA_CONST.__got. Since these addresses are already bound at this point, one can simply walk the section searching for references to security_check. A lengthier but more elegant solution would be to use the knowledge from above to consult the indirect and direct symbol tables and find the exact address that needs to be patched.
- __DATA_CONST is read-only by the time this code has access to it, so the memory protection needs to be modified. Apple has a powerful set of Mach vm_* APIs that can work across processes with what is known as a task. To get a handle for the task belonging to the current process, you can use the mach_task_self() API. Mach is a detailed and complex topic which is also outside the scope of this writeup.
- The pointer to my_security_check is written over the GOT slot with vm_write().
- After the function pointer is patched, __DATA_CONST is made read-only again.
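The vm_protect() calls in solution3.c ask for just 8 bytes, but protections are tracked per page. The kernel rounds the request out to whole pages, so the entire page holding the GOT slot temporarily becomes writable. This is also why lldb reported all of [0x100004000-0x100008000) as a single r-- region earlier. The hypothetical helper page_span below only illustrates that rounding. The 16KB page size is an assumption matching Apple silicon; at runtime, vm_page_size reports the real value.

```c
#include <stddef.h>
#include <stdint.h>

// Page-aligned range covering [addr, addr + len).
// page_size must be a power of two (16KB assumed for Apple silicon).
void page_span(uint64_t addr, size_t len, uint64_t page_size,
               uint64_t *start, uint64_t *size) {
    uint64_t mask = page_size - 1;
    uint64_t first = addr & ~mask;                // round the start down
    uint64_t last = (addr + len + mask) & ~mask;  // round the end up
    *start = first;
    *size = last - first;
}
```

For the patch address printed in the run below, 0x100650028, the affected range is the single 16KB page starting at 0x100650000. An 8-byte write straddling a page boundary would touch two pages.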
Compiling solution3.c and then running it with the DYLD_PRINT_BINDINGS environment variable produces the following:
~ xcrun -sdk macosx clang -arch arm64 solution3.c -O0 -shared -o /tmp/solution3.dylib -mmacosx-version-min=12.6 -Wl,-U,__mh_execute_header,-U,_security_check
~ DYLD_INSERT_LIBRARIES=/tmp/solution3.dylib DYLD_PRINT_BINDINGS=1 /tmp/ex3
dyld[46523]: <solution3.dylib/bind#0> -> 0x10064c000 (ex3/__mh_execute_header)
dyld[46523]: <solution3.dylib/bind#1> -> 0x18da74690 (libmacho.dylib/_getsectiondata)
dyld[46523]: <solution3.dylib/bind#2> -> 0x182f41208 (libsystem_kernel.dylib/_mach_error_string)
dyld[46523]: <solution3.dylib/bind#3> -> 0x1dbed5aec (libsystem_kernel.dylib/_mach_task_self_)
dyld[46523]: <solution3.dylib/bind#4> -> 0x182e68ee8 (libsystem_c.dylib/_printf)
dyld[46523]: <solution3.dylib/bind#5> -> 0x10064fe20 (ex3/_security_check)
dyld[46523]: <solution3.dylib/bind#6> -> 0x182f3f4f8 (libsystem_kernel.dylib/_vm_protect)
dyld[46523]: <solution3.dylib/bind#7> -> 0x182f65934 (libsystem_kernel.dylib/_vm_write)
dyld[46523]: fixup: *0x00010076C000 = 0x00010064C000
dyld[46523]: fixup: *0x00010076C008 = 0x00018DA74690
dyld[46523]: fixup: *0x00010076C010 = 0x000182F41208
dyld[46523]: fixup: *0x00010076C018 = 0x0001DBED5AEC
dyld[46523]: fixup: *0x00010076C020 = 0x000182E68EE8
dyld[46523]: fixup: *0x00010076C028 = 0x00010064FE20
dyld[46523]: fixup: *0x00010076C030 = 0x000182F3F4F8
dyld[46523]: fixup: *0x00010076C038 = 0x000182F65934
dyld[46523]: <ex3/bind#0> -> 0x18da961c4 (libcommonCrypto.dylib/_CC_MD5)
dyld[46523]: <ex3/bind#1> -> 0x182ea3ce8 (libsystem_c.dylib/___stack_chk_fail)
dyld[46523]: <ex3/bind#2> -> 0x1dbed5798 (libsystem_c.dylib/___stack_chk_guard)
dyld[46523]: <ex3/bind#3> -> 0x182f8ccb0 (libsystem_platform.dylib/__platform_memcmp)
dyld[46523]: <ex3/bind#4> -> 0x182e68ee8 (libsystem_c.dylib/_printf)
dyld[46523]: <ex3/bind#5> -> 0x182f8c860 (libsystem_platform.dylib/__platform_strlen)
dyld[46523]: fixup: *0x000100650000 = 0x00018DA961C4
dyld[46523]: fixup: *0x000100650008 = 0x000182EA3CE8
dyld[46523]: fixup: *0x000100650010 = 0x0001DBED5798
dyld[46523]: fixup: *0x000100650018 = 0x000182F8CCB0
dyld[46523]: fixup: *0x000100650020 = 0x000182E68EE8
dyld[46523]: fixup: *0x000100650028 = 0x00010064FE20
dyld[46523]: fixup: *0x000100650030 = 0x000182F8C860
start is 0x00010064c000, patching offset 0x000100650028
interposed security_check! returning match
🌈success!🌈
Excellent : ]
Execute the do_the_thing()
function through whatever means, but you're only allowed to modify executable memory in ex4
.
// ex4.c
// xcrun -sdk macosx clang -arch arm64 ex4.c -o /tmp/ex4 -O0 -mmacosx-version-min=12.6
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
#include <stdio.h>
#include <CommonCrypto/CommonCrypto.h>
int security_check(const char *password) {
char result[CC_MD5_DIGEST_LENGTH];
if (!password) {
return 0;
}
CC_MD5(password, strlen(password), (unsigned char *)result);
char secret[CC_MD5_DIGEST_LENGTH] = "\x5f\x4d\xcc\x3b\x5a\xa7\x65\xd6"
"\x1d\x83\x27\xde\xb8\x82\xcf\x99";
return memcmp(result, secret, CC_MD5_DIGEST_LENGTH) == 0;
}
__attribute__((always_inline)) // 1
static void do_the_thing(void) {
printf("🌈success!🌈\n");
}
int main(int argc, const char* argv[]) {
if (security_check(argc > 1 ? argv[1] : NULL)) { // 4
do_the_thing();
}
return 0;
}
Assessment
Things are getting more interesting. The do_the_thing() function is no longer a standalone function, thanks to __attribute__((always_inline)). If you were to compile this and run nm on it, you would not see a symbol for this function anymore. The contents of the do_the_thing() function get compiled directly into the main function.
For this challenge, a new restriction is added: only executable memory can be modified. Fortunately, it is still possible to hijack control by hooking a function directly, i.e. by overwriting its first instructions at runtime so that they jump to a different location. Since security_check is a public symbol and gates control to the newly inlined do_the_thing() function, patching security_check looks to be the ideal target.
When patching executable memory, there are several trade-offs to consider. Jumping to an address that is farther away requires more instructions: ARM64 instructions are a fixed 4 bytes in size, so an 8-byte pointer can't fit into a single instruction. As a result, branches usually encode an offset relative to the program counter, which means one must be conscious of the distance between the address being patched and the address to jump to. In addition, the patch must fit inside the function being patched so as to not overwrite the contents of a neighboring function.
NOTE Patching executable memory gets even more complicated if one wants to call the original function from inside the replacement function. Diverting control from the original function means the instructions overwritten by the patch are gone. To work around this, one needs to either temporarily restore the original instructions, patch the call sites of the original function instead, or copy the overwritten instructions elsewhere, execute them there and then jump to the offset immediately after the injected shellcode. Fortunately, this writeup steers clear of all those ideas, preferring to replace the original function entirely.
For this solution, three ARM64 instructions will be inserted at the beginning of security_check to divert control to a new function that always returns success. This overwrites the first 12 bytes of security_check. The instructions are ADRP, ADD, and BR.
Breaking down the pseudocode for each of these instructions and what will happen:
adrp x8, (M(D) - M(S)) / 4096
- Where S is the start address (the address of security_check), D is the destination address (the address of the soon to be created replacement function called my_security_check), and M() rounds an address down to its 4KB page. The difference between the two pages is divided by 4KB to get a page offset from the current program counter (pc); adrp adds that offset to the pc's page and stores the result, M(D), into the x8 register. This gives a reach of +-4GB from the program counter. An assumption is made that the solution's executable memory will be within this range. If this does not hold true, more instructions would be needed to load an absolute address by OR'ing in different 16-bit parts of the address (movz/movk) into the same register.
add x8, x8, (D & 0xFFF)
- After adrp, x8 holds the start of the 4KB page containing the destination. The add instruction supplies the low 12 bits of D to form the exact address.
br x8
- Once the x8 register contains the appropriate address, branch to it.
Here's the source code to generate the shellcode and patch the security_check function before it executes:
// solution4.c
// xcrun -sdk macosx clang -arch arm64 solution4.c -O0 -shared -o /tmp/solution4.dylib -mmacosx-version-min=12.6 -Wl,-U,_security_check
#include <stdio.h>
#include <string.h>
#include <mach/mach.h>
#include <stdbool.h>
#include <assert.h>
extern int security_check(void);
#define HANDLE_ERR(E) {\
if ((E)) printf("Error: %d, %s @ %s:%d\n", (E), mach_error_string((E)), __FUNCTION__, __LINE__);}
int my_security_check(void) {
printf("interposed security_check! returning success\n");
return 1;
}
uint32_t CREATE_ADRP_OP(uint8_t reg, uintptr_t start_addr, uintptr_t dest_addr) { // 1
typedef struct {
uint32_t reg : 5; // destination register (Rd)
uint32_t val : 18; // bits 2-19 of the page offset (low 18 bits of immhi)
uint32_t negative : 1; // sign bit of the 21-bit page offset; if set, everything is 2's complement including val2bits
uint32_t op2 : 5; // must be 0b10000
uint32_t val2bits : 2; // The lower 2 bits of the page offset (immlo) are stored here
uint32_t op : 1; // must be 1
} adrp_op;
uint32_t op = 0;
assert(sizeof(adrp_op) == sizeof(uint32_t));
adrp_op *a = (void*)&op;
a->op = 1;
a->op2 = 0b10000;
uintptr_t mask = ~((uintptr_t)0xfff);
// Compute in 64 bits: a page difference of up to +-4GB does not fit in an int32_t
int64_t offset = ((int64_t)(dest_addr & mask) - (int64_t)(start_addr & mask)) / 4096;
a->negative = offset < 0 ? 1 : 0;
a->reg = reg;
a->val2bits = (offset & 3);
// The remaining bits (bit 2 and up) go into val; throw away the lower 2 bits
a->val = (offset >> 2);
return op;
}
uint32_t CREATE_BR_OP(uint8_t dreg) {
typedef struct {
uint32_t unused : 5; // 0
uint32_t dreg : 5; // Which register to branch to
uint32_t op : 22; // Should be 0b1101011000011111000000
} brreg_op;
uint32_t op = 0;
assert(sizeof(brreg_op) == sizeof(uint32_t));
brreg_op *a = (void*)&op;
a->op = 0b1101011000011111000000;
a->dreg = dreg;
return op;
}
uint32_t CREATE_ADD_OP(uint8_t dreg, uint8_t sreg, int16_t val, bool lslshift) { // 2
typedef struct {
uint32_t dreg : 5; // destination register
uint32_t sreg : 5; // source register
uint32_t val : 12; // val to be added, i.e. x4 = x6 + 0x123
uint32_t lsl : 1; // #lsl #12 to val
uint32_t op2 : 7; // Should be 0b0100010
uint32_t negative : 1; // 1 turns the ADD into a SUB (used for negative values)
uint32_t op : 1; // Should be 0b1
} add_op;
uint32_t op = 0;
assert(sizeof(add_op) == sizeof(uint32_t));
add_op *a = (void*)&op;
a->op = 1;
a->lsl = lslshift;
a->op2 = 0b0100010;
a->dreg = dreg;
a->sreg = sreg;
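// Bit 30 selects SUB instead of ADD, so a negative value is encoded as "sub" with its magnitude
a->val = val < 0 ? -val : val;
a->negative = val < 0 ? 1 : 0;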
return op;
}
__attribute__((constructor)) static void oninit() {
vm_address_t security_check_ptr = (vm_address_t)security_check;
vm_address_t my_security_check_ptr = (vm_address_t)my_security_check;
printf("attempting to patch security_check @ 0x%012lx with my_security_check 0x%012lx\n", security_check_ptr, my_security_check_ptr);
// 3
char shellcode[12] = {};
uint8_t reg8 = 8;
uint32_t adrpop = CREATE_ADRP_OP(reg8, security_check_ptr, my_security_check_ptr);
uint32_t addop = CREATE_ADD_OP(reg8, reg8, (uintptr_t)my_security_check_ptr & 0xfff, 0);
uint32_t brop = CREATE_BR_OP(reg8);
memcpy((void*)&shellcode[0], &adrpop, 4);
memcpy((void*)&shellcode[4], &addop, 4);
memcpy((void*)&shellcode[8], &brop, 4);
// 4
task_t task = mach_task_self();
kern_return_t kr = vm_protect(task, security_check_ptr, 12, FALSE, VM_PROT_READ|VM_PROT_WRITE);
if (kr) { // If this fails, the mapping's max protection likely excludes write, so request a private copy-on-write copy instead
kr = vm_protect(task, security_check_ptr, 12, FALSE, VM_PROT_READ|VM_PROT_WRITE|VM_PROT_COPY);
}
HANDLE_ERR(kr);
// 5
HANDLE_ERR(vm_write(task, security_check_ptr, (vm_offset_t)&shellcode, 12));
kr = vm_protect(task, security_check_ptr, 12, FALSE, VM_PROT_READ | VM_PROT_EXECUTE);
if (kr) {
kr = vm_protect(task, security_check_ptr, 12, FALSE, VM_PROT_READ | VM_PROT_EXECUTE|VM_PROT_COPY);
}
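HANDLE_ERR(kr);
// Make sure the CPU doesn't keep executing stale copies of the overwritten instructions
__builtin___clear_cache((char*)security_check_ptr, (char*)security_check_ptr + 12);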
}
Ouch. That's a lot of code, but the majority of it generates the assembly instructions used to patch security_check, which will not be discussed in depth. Breaking down the interesting points:
- The CREATE_ADRP_OP function creates an ARM64 instruction that computes the 4KB-aligned (floor) page address of the jump destination relative to the address being patched.
- The CREATE_ADD_OP function adds the destination's offset within that 4KB page to the register. In this case, it sets register x8 to the address of my_security_check.
- The adrp + add + br instructions are assembled together into shellcode. The br x8 instruction branches to that location without setting the link register, so my_security_check returns straight to security_check's caller, giving the illusion that the hooking function was called directly.
- This attempts to temporarily make the executable memory writable. This code differs from the previous solution in that if adding write permission fails, it retries with VM_PROT_COPY to create a writable copy of the memory.
- The shellcode is written to the beginning of the security_check function and the memory is made read/execute again. It is assumed that the security_check function is longer than 3 instructions (12 bytes). Ideally, there should be code to check the size of this function, which can be determined through the LC_FUNCTION_STARTS load command.
With everything compiled and giving it a run:
~ xcrun -sdk macosx clang -arch arm64 solution4.c -O0 -shared -o /tmp/solution4.dylib -mmacosx-version-min=12.6 -Wl,-U,_security_check
~ DYLD_INSERT_LIBRARIES=/tmp/solution4.dylib /tmp/ex4
attempting to patch security_check @ 0x000100e6be18 with my_security_check 0x00010103f9e0
interposed security_check! returning success
🌈success!🌈
Execute the do_the_thing() logic through whatever means, but you may not modify executable memory nor may you interpose undefined symbols. In addition, ex5 must also exit successfully.
// ex5.c
// xcrun -sdk macosx clang -arch arm64 ex5.c -o /tmp/ex5 -O3 -mmacosx-version-min=12.6 -Wl,-no_function_starts -fstack-protector-all && strip /tmp/ex5 #2
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
#include <stdio.h>
#include <string.h>
#include <CommonCrypto/CommonCrypto.h>
__attribute__((always_inline)) // 1
int security_check(const char *password) {
char result[CC_MD5_DIGEST_LENGTH];
if (!password) {
return 0;
}
CC_MD5(password, strlen(password), (unsigned char *)result);
char secret[CC_MD5_DIGEST_LENGTH] = "\x5f\x4d\xcc\x3b\x5a\xa7\x65\xd6"
"\x1d\x83\x27\xde\xb8\x82\xcf\x99";
return memcmp(result, secret, CC_MD5_DIGEST_LENGTH) == 0;
}
__attribute__((always_inline)) // 1
static void do_the_thing(void) {
printf("🌈success!🌈\n");
}
int main(int argc, const char* argv[]) {
if (security_check(argc > 1 ? argv[1] : NULL)) { // 4
do_the_thing();
}
return 0;
}
Assessment
It's getting trickier. security_check is no longer a standalone function thanks to the always_inline attribute (#1). In addition, new compilation flags have been added (#2). They include:
- All function names are removed thanks to the strip command.
- -Wl,-no_function_starts instructs the linker to not embed the LC_FUNCTION_STARTS Mach-O load command in the linked binary. This load command lists the start (and thereby the size) of every compiled function found in ex5. Without this load command, it becomes significantly trickier to determine where functions begin and end.
- -fstack-protector-all inserts stack protection logic at the beginning and end of every function within ex5. This makes jumping to offsets within functions trickier.
It's best to see what is under the hood. Compile and dump the assembly for ex5:
~ xcrun -sdk macosx clang -arch arm64 ex5.c -o /tmp/ex5 -O3 -mmacosx-version-min=12.6 -Wl,-no_function_starts -fstack-protector-all && strip /tmp/ex5
~ otool -tV /tmp/ex5
ex5:
(__TEXT,__text) section
3 -> 0000000100003ea8 sub sp, sp, #0x40
0000000100003eac stp x20, x19, [sp, #0x20]
0000000100003eb0 stp x29, x30, [sp, #0x30]
0000000100003eb4 add x29, sp, #0x30
0000000100003eb8 nop
0000000100003ebc ldr x8, #0x154 ; literal pool symbol address: ___stack_chk_guard
0000000100003ec0 ldr x8, [x8]
0000000100003ec4 str x8, [sp, #0x18]
0000000100003ec8 cmp w0, #0x1
0000000100003ecc b.le 0x100003f30
0000000100003ed0 ldr x19, [x1, #0x8]
0000000100003ed4 cbz x19, 0x100003f30
0000000100003ed8 mov x0, x19
1 -> 0000000100003edc bl 0x100003f84 ; symbol stub for: _strlen
0000000100003ee0 mov x1, x0
0000000100003ee4 add x2, sp, #0x8
0000000100003ee8 mov x0, x19
1 -> 0000000100003eec bl 0x100003f60 ; symbol stub for: _CC_MD5
0000000100003ef0 mov x8, #0x4d5f
0000000100003ef4 movk x8, #0x3bcc, lsl #16
0000000100003ef8 movk x8, #0xa75a, lsl #32
0000000100003efc movk x8, #0xd665, lsl #48
0000000100003f00 ldp x9, x10, [sp, #0x8]
0000000100003f04 eor x8, x9, x8
0000000100003f08 mov x9, #0x831d
0000000100003f0c movk x9, #0xde27, lsl #16
0000000100003f10 movk x9, #0x82b8, lsl #32
0000000100003f14 movk x9, #0x99cf, lsl #48
0000000100003f18 eor x9, x10, x9
0000000100003f1c orr x8, x8, x9
2 -> 0000000100003f20 cbnz x8, 0x100003f30
0000000100003f24 adr x0, #0x7c ; literal pool for: "\360\237\214\210success!\360\237\214\210"
0000000100003f28 nop
0000000100003f2c bl 0x100003f78 ; symbol stub for: _puts
3 -> 0000000100003f30 ldr x8, [sp, #0x18]
0000000100003f34 nop
0000000100003f38 ldr x9, #0xd8 ; literal pool symbol address: ___stack_chk_guard
0000000100003f3c ldr x9, [x9]
0000000100003f40 cmp x9, x8
0000000100003f44 b.ne 0x100003f5c
0000000100003f48 mov w0, #0x0
0000000100003f4c ldp x29, x30, [sp, #0x30]
0000000100003f50 ldp x20, x19, [sp, #0x20]
0000000100003f54 add sp, sp, #0x40
0000000100003f58 ret
0000000100003f5c bl 0x100003f6c ; symbol stub for: ___stack_chk_fail
Given the above assembly and the challenge's constraints (exit without error, no interposing symbols, no modifying executable memory), three ideas stand out. The instructions relevant to each idea are marked with numbered arrows.
- Put a hardware breakpoint (which doesn't modify executable memory) on strlen or CC_MD5 and have a debugger catch either function. Once caught, modify the return address by setting the lr register to get past the checks. For this case, address 0x0000000100003f24 looks like a good candidate to return to as it sidesteps all of the conditional checks. One must ensure that the call is coming from ex5, which can be done by checking that lr (what __builtin_return_address(0) would return inside the callee) points into ex5.
- Put a hardware breakpoint at the start of main and have a debugger single step through each assembly instruction, catching and processing each opcode as it executes. At address 0x0000000100003f20, there's the cbnz x8, 0x100003f30 instruction, which branches past do_the_thing() when the register is non-zero. When the debugger reaches the cbnz instruction, have it zero the register before the instruction executes, so the branch falls through into do_the_thing().
- Put a hardware breakpoint at the start of the main function and have a debugger directly set execution control to 0x0000000100003f24. Put another hardware breakpoint on address 0x0000000100003f30 and then set the program counter to 0x0000000100003f48, which contains the logic to return gracefully from main. This essentially lets the program jump to the relevant code while sidestepping the security checks.
Although it's far from the most efficient solution in terms of code length, idea 2 seems the most interesting to implement. The following code is broken into 2 snippets given the length of code needed to create an in-process debugger that steps through and processes assembly instructions.
First, the dylib is set up as a debugger in order to catch breakpoints:
// solution5.c
// xcrun -sdk macosx clang -arch arm64 solution5.c -O0 -shared -o /tmp/solution5.dylib -mmacosx-version-min=12.6 -Wl,-U,__mh_execute_header
#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <mach/mach.h>
#include <pthread.h>
#include <mach-o/ldsyms.h>
#include <mach-o/getsect.h>
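#include <assert.h>
#if __has_feature(ptrauth_calls)
#include <ptrauth.h> // ptrauth_strip() is used by server_thread on ARM64e
#endif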
#define HANDLE_ERR(E) {\
if ((E)) printf("Error: %d, %s @ %s:%d\n", (E), mach_error_string((E)), __FUNCTION__, __LINE__);}
// 1
#pragma pack(push, 4)
typedef struct {
mach_msg_header_t Head;
/* start of the kernel processed data */
mach_msg_body_t msgh_body;
mach_msg_port_descriptor_t thread;
mach_msg_port_descriptor_t task;
/* end of the kernel processed data */
NDR_record_t NDR;
exception_type_t exception;
mach_msg_type_number_t codeCnt;
int64_t code[2];
} exc_req;
typedef struct {
mach_msg_header_t Head;
NDR_record_t NDR;
kern_return_t RetCode;
} exc_resp;
#pragma pack(pop)
// 2
typedef struct {
uint32_t reg : 5;
uint32_t val : 19;
uint32_t isnz : 1; // 1 for cbnz 0 for cbz
uint32_t op : 6; // must be 0b011010
uint32_t is64bit : 1; // 1 if x[VAL], 0 if w[VAL] for reg
} cbz_op;
#define IS_COMPARE_OP(X) ((X).op == 0b011010 && (X).isnz == 1 )
#define S_USER ((uint32_t)(2u << 1))
#define BCR_ENABLE ((uint32_t)(1u))
#define SS_ENABLE ((uint32_t)(1u))
#define BCR_BAS ((uint32_t)(15u << 5))
static mach_port_t exc_port = MACH_PORT_NULL;
static uintptr_t main_addr = 0;
void* server_thread(void *arg);
__attribute__((constructor)) static void oninit() {
// 3
const struct mach_header_64 *header = &_mh_execute_header;
char *cur = (char*)header + sizeof(struct mach_header_64);
for (uint32_t i = 0; i < header->ncmds; i++) {
struct load_command *cmd = (void*)cur;
if (cmd->cmd == LC_MAIN) {
struct entry_point_command *entry = (void*)cmd;
main_addr = (uintptr_t)&_mh_execute_header + entry->entryoff;
break;
}
cur += cmd->cmdsize;
}
if (!main_addr) {
printf("couldn't find entrypoint\n");
return;
}
// 4
mach_port_options_t options = {.flags = MPO_INSERT_SEND_RIGHT};
HANDLE_ERR(mach_port_construct(mach_task_self(), &options, 0, &exc_port));
HANDLE_ERR(task_set_exception_ports(mach_task_self(), EXC_MASK_BREAKPOINT, exc_port, EXCEPTION_DEFAULT|MACH_EXCEPTION_CODES, THREAD_STATE_NONE));
printf("Exception port setup with port %d\n", exc_port);
// 5
arm_debug_state64_t dbg = {};
mach_msg_type_number_t cnt = ARM_DEBUG_STATE64_COUNT;
HANDLE_ERR(thread_get_state(mach_thread_self(), ARM_DEBUG_STATE64, (thread_state_t)&dbg, &cnt));
dbg.__bvr[0] = (__int64_t)main_addr;
dbg.__bcr[0] = S_USER|BCR_ENABLE|BCR_BAS;
HANDLE_ERR(thread_set_state(mach_thread_self(), ARM_DEBUG_STATE64, (thread_state_t)&dbg, cnt));
printf("Breakpoint set on main 0x%012lx (offset 0x%06lx)\n", main_addr, main_addr - (uintptr_t)&_mh_execute_header);
// 6
static pthread_t exception_thread;
if (pthread_create(&exception_thread, NULL, server_thread, &exc_port)) {
return;
}
pthread_detach(exception_thread);
usleep(500);
}
// continued...
Breaking down the interesting points:
- When creating a "debugger" on Apple OSes, Mach messages are used: the kernel sends the debugger a message describing the exception, and the debugger replies with how it wants the exception handled. The debugger can be any process and can even live in the same process, which is the case here. This communication mechanism is typically generated by the Mach Interface Generator, or simply MIG, which generates the structs used to send and receive information. The file responsible for this is <mach/mach_exc.defs>. Typically a developer will use the mig tool to generate the interface and include the generated files. In order to keep the code as small as possible, the relevant structs are extracted and compiled directly into the solution. For interested readers, see cd /tmp && cp $(xcrun --show-sdk-path)/usr/include/mach/mach_exc.defs . && mig mach_exc.defs && cat mach_exc.h
- A struct that decodes an ARM64 cbz/cbnz opcode is declared. It will be used to inspect the instruction at the program counter as the debugger single steps through instructions.
- Since the symbol names were stripped out, there needs to be logic to find the address of main. This is accomplished with the LC_MAIN load command, which is found in executables (and not dylibs). This address will be used to set the hardware breakpoint.
- A Mach port is created and set to be the receiver for breakpoint exceptions. This is the basis for handling exception logic and is a complex topic outside the scope of this writeup. Interested consumers can be notified of a variety of exceptions, but only EXC_MASK_BREAKPOINT is used here. Check out <mach/exception_types.h> for more options that can be caught.
- The hardware breakpoint is set on the start of main via the thread_set_state API, using the debug registers of the current (main) thread.
- After everything is set up, a new thread is spun up to handle all debugging communication with the kernel. This new thread will call into void* server_thread(void *arg), which will be shown in the next code snippet.
With the debugger set up, server_thread facilitates how the debugger interacts with the kernel and ex5:
// continued...
void* server_thread(void *arg) {
// 1
const struct section_64 *section = getsectbynamefromheader_64(&_mh_execute_header, "__TEXT", "__text");
uint64_t text_start = (uintptr_t)&_mh_execute_header + section->offset;
uint64_t text_sz = section->size;
bool success = false;
kern_return_t kr;
char buffer[0x400] = {};
pthread_setname_np("Exception Handler");
printf("exception server starting\n");
while(1) {
// 2
mach_msg_header_t *msg = (void*)buffer;
msg->msgh_remote_port = MACH_PORT_NULL;
msg->msgh_id = 2405;
msg->msgh_local_port = exc_port;
msg->msgh_size = 0x400;
if ((kr = mach_msg_receive(msg))) {
HANDLE_ERR(kr);
break;
}
exc_req *req = (void*)buffer;
thread_t thread = req->thread.name;
// 3
arm_thread_state64_t state = {};
mach_msg_type_number_t count = ARM_THREAD_STATE64_COUNT;
HANDLE_ERR(thread_get_state(thread, ARM_THREAD_STATE64, (thread_state_t)&state, &count));
// 4
#if __has_feature(ptrauth_calls)
uintptr_t pc = (uintptr_t)ptrauth_strip(state.__opaque_pc, ptrauth_key_function_pointer);
#else
uintptr_t pc = state.__pc;
#endif
// 5
arm_debug_state64_t dbg = {};
mach_msg_type_number_t dbg_cnt = ARM_DEBUG_STATE64_COUNT;
HANDLE_ERR(thread_get_state(thread, ARM_DEBUG_STATE64, (thread_state_t)&dbg, &dbg_cnt));
if (!success) {
dbg.__mdscr_el1 |= SS_ENABLE; // enables instruction single step
dbg.__bcr[0] &= ~(BCR_ENABLE); // disable the hardware breakpoint on main
dbg.__bvr[0] = 0; // superfluous but shows register isn't used
HANDLE_ERR(thread_set_state(thread, ARM_DEBUG_STATE64, (thread_state_t)&dbg, ARM_DEBUG_STATE64_COUNT));
}
if (text_start <= pc && pc < text_start + text_sz) {
// 6
cbz_op opcode = {};
assert(sizeof(opcode) == sizeof(uint32_t));
vm_size_t cnt = 4;
HANDLE_ERR(vm_read_overwrite(req->task.name, pc, cnt, (vm_address_t)&opcode, &cnt));
if (IS_COMPARE_OP(opcode)) {
printf("Patching register x%d at address 0x%012lx (0x%06lx)\n", opcode.reg, pc, pc - (uintptr_t)&_mh_execute_header);
state.__x[opcode.reg] = 0;
HANDLE_ERR(thread_set_state(thread, ARM_THREAD_STATE64, (thread_state_t)&state, ARM_THREAD_STATE64_COUNT));
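success = true;
// The branch is neutralized, so stop single stepping through the rest of the program
dbg.__mdscr_el1 &= ~((__uint64_t)SS_ENABLE);
HANDLE_ERR(thread_set_state(thread, ARM_DEBUG_STATE64, (thread_state_t)&dbg, ARM_DEBUG_STATE64_COUNT));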
}
}
// 7
msg->msgh_local_port = MACH_PORT_NULL;
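// Reply through the send-once right the kernel supplied (the same bits MIG-generated servers use)
msg->msgh_bits = MACH_MSGH_BITS(MACH_MSGH_BITS_REMOTE(msg->msgh_bits), 0);
msg->msgh_id = 2505; // MIG reply id: request id (2405) + 100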
msg->msgh_size = sizeof(exc_resp);
exc_resp *resp = (exc_resp*)msg;
resp->NDR = NDR_record;
resp->RetCode = KERN_SUCCESS;
if ((kr = mach_msg_send(msg))) {
HANDLE_ERR(kr);
break;
}
}
return NULL;
}
Here are the interesting points in server_thread:
- For processing assembly opcodes, only the opcodes found in ex5 are of interest. This logic finds the lower and upper bounds of ex5's __text section. Later on, the program counter will be extracted and compared against these bounds.
- This is part of the boilerplate logic found in the file that mig generates from mach_exc.defs. Once the message buffer is set up, mach_msg_receive blocks waiting for an "event", the first of which will be the breakpoint on main. When the breakpoint trips, the kernel sends a message to the debugger and execution resumes past mach_msg_receive, while the kernel keeps the faulting thread frozen until it receives a response on how to handle it.
- The arm_thread_state64_t struct contains the values of the general-purpose registers, including the link register and the program counter, which can be modified using the thread_set_state API.
- An interesting aspect of working with ARM64e is that the program counter in the thread state may be signed with pointer authentication. The signature needs to be stripped so that a bogus program counter is not interpreted.
- The arm_debug_state64_t is another awesome struct for the thread_(get|set)_state API. As you saw earlier, it can set hardware breakpoints, watchpoints or even do instruction stepping. The line of code that enables instruction stepping is dbg.__mdscr_el1 |= SS_ENABLE. With it set, every instruction the thread executes raises a breakpoint exception that calls back into the debugger.
- The opcode is read from the current program counter. If the opcode is a cbnz, the register it tests is set to zero so the branch is not taken, and the modified thread state is saved.
- This logic replies to the kernel, saying that the exception has been handled (thanks to resp->RetCode = KERN_SUCCESS) and that it is OK for the program to resume execution.
Putting the code together and running:
~ xcrun -sdk macosx clang -arch arm64 solution5.c -O0 -shared -o /tmp/solution5.dylib -mmacosx-version-min=12.6 -Wl,-U,__mh_execute_header
~ DYLD_INSERT_LIBRARIES=/tmp/solution5.dylib /tmp/ex5 boom
Exception port setup with port 2563
Breakpoint set on main 0x000100e7fea8 (offset 0x003ea8)
exception server starting
Patching register x8 at address 0x000100e7ff20 (offset 0x003f20)
🌈success!🌈
As you have seen, there are many ways to go about altering execution of code. Hopefully this was insightful and you have a better understanding of the different strategies that are available.
Have fun jumping around 🍻