Binary exploitation can be used for a lot of different things. Mostly it is about finding and abusing vulnerabilities in compiled programs.
The hacker knows that it is not the code written by the programmer that gets executed by the computer. Since a program is compiled into machine code, it is actually the machine code that gets executed. And the human-readable representation of machine code is assembly. So yeah, let's learn some assembly.
A lot in this chapter is just my notes from reading Hacking - The Art of Exploitation. So if you really want to learn binary exploitation you should probably stop reading here and just pick up that book instead, it is a lot better.
Hexadecimal is a base 16 counting system: 0-9 plus a-f. So the digits are 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. That gives 16 different values per digit, so we can count from 0 to 15 without using two digits: f == 15, and 16 is written 10. This is pretty convenient because one byte is made up of 8 bits, and with 8 bits (each 0 or 1) you can form 256 (2^8) different values, 0 through 255. Two hexadecimal digits cover exactly that range, from 00 to ff (16 × 16 = 256), just like two decimal digits can represent values up to 99.
So two hexadecimal digits can represent any byte value. So one byte can be translated into a two digit hexadecimal value.
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 0a, 0b, 0c, 0d, 0e, 0f, 10, 11, 12, 13, 14....
a0,a1, a2, a3, a4...
f0, f1, f2, f3...
And so on.
So in order to analyze assembly we are going to write a short program in C.
#include <stdio.h>

int main(){
    for (size_t i = 0; i < 10; i++) {
        puts("Hello world");
    }
    return 0;
}
So we have written a program in C and then compiled it. Now we want to look at the assembly code to see what code is actually going to be run by the machine.
objdump -D programName
This will give us some crazy output like this
00000000004004e6 <main>:
4004e6: 55 push %rbp
4004e7: 48 89 e5 mov %rsp,%rbp
4004ea: 48 83 ec 10 sub $0x10,%rsp
4004ee: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
4004f5: 00
4004f6: eb 0f jmp 400507 <main+0x21>
4004f8: bf a4 05 40 00 mov $0x4005a4,%edi
4004fd: e8 be fe ff ff callq 4003c0 <puts@plt>
400502: 48 83 45 f8 01 addq $0x1,-0x8(%rbp)
400507: 48 83 7d f8 09 cmpq $0x9,-0x8(%rbp)
40050c: 76 ea jbe 4004f8 <main+0x12>
40050e: b8 00 00 00 00 mov $0x0,%eax
400513: c9 leaveq
400514: c3 retq
400515: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40051c: 00 00 00
40051f: 90 nop
This is just the part for the main function of the program. There is a lot more output, but that's not interesting to us right now.
00000000004004e6
This number represents a place in memory. It is like an address. It could be written in base 10 if we wanted to, and it would still be the correct address. But for the convenience described above, the address is written in hexadecimal. The address is 16 hexadecimal digits long because this is a 64-bit binary: 16 digits × 4 bits per digit = 64 bits. So a 64-bit process can, in theory, have 2^64 (about 1.84467441 × 10^19) memory addresses.
So on the first line after the main-line we see the number 55. All these numbers are actually machine code, but instead of writing it in binary (55 would be 01010101) it is written in hexadecimal. The mnemonics to the right of those numbers are the instructions written in assembly. They are written so that we humans can understand it a bit more easily. Instead of having to remember that 90 means nop, we just have to remember nop. So that's great. It makes machine code a lot easier to understand.
So instead of having to memorize 10010000 it is represented as 90 in hexadecimal. And instead of having to remember 90 in hexadecimal we just have to remember nop. Pretty great. But in the end they mean the same thing, they are just represented in three different ways.
There are basically two syntaxes for x86 assembly: the AT&T syntax and the Intel syntax. The AT&T syntax is the default for the GNU tools found on Linux distributions. So when we run objdump, like in the example above, the output is in AT&T syntax. We can tell it is AT&T because it has all those $ and % signs. If you add -M intel to your objdump command you will see the output in Intel syntax. In the end the instructions are the same either way, but note that the operand order is reversed: AT&T writes source,destination while Intel writes destination,source.
We can set the syntax in gdb with the following command:
set dis intel
#or
set disassembly-flavor intel
Okay, so the processor in your computer has something called registers. Registers are like internal variables for your processor. They are predefined, in the sense that you can't create registers; they are already there. You can think of them as micro-memories, or just variables. The processor uses them to make things faster: instead of having to look up a specific place in memory, it has its own micro-memory. There are only 16 general-purpose registers on a 64-bit x86 processor (and 8 in 32-bit mode), so it is not that much to remember. The names of the registers are a bit different between 64-bit and 32-bit processors. A 64-bit processor can run 32-bit binaries, but a 32-bit processor can't run 64-bit binaries. If you want to know what type a binary is you just type
file binaryName
These are the names of the 32-bit registers. They are divided into sub-groups.
General registers
These registers are mainly used as temporary memory for the processor.
EAX - Accumulator
EBX - Base
ECX - Counter
EDX - Data
Index and pointers
ESI - Source index
EDI - Destination index
EBP - Base pointer - This one stores an address in its little micro-memory.
EIP - Instruction pointer. Like a child moving a finger along each word while reading a book, the instruction pointer is that finger. It keeps track of where the processor is in the program (strictly speaking, it holds the address of the next instruction to execute). This is an important pointer.
ESP - Stack pointer - This one also stores an address.
Segment registers
CS
DS
ES
FS
GS
SS
Indicator
EFLAGS
So let's take a look at them in a real program. Let's run the program above, but this time with a debugger: the GNU Debugger (gdb).
gdb -q ./myprogram
First we set a breakpoint with the command break main to stop the program at the start of the main function (just after its setup instructions), and then start it with run. When the breakpoint is hit we type info registers to see what we have in our registers.
Breakpoint 1, 0x00000000004004ea in main ()
(gdb) info registers
#### General registers
rax 0x4004e6 4195558
rbx 0x0 0
rcx 0x0 0
rdx 0x7fffffffe798 140737488349080
#### Index and pointers
rsi 0x7fffffffe788 140737488349064
rdi 0x1 1
rbp 0x7fffffffe6a0 0x7fffffffe6a0
rsp 0x7fffffffe6a0 0x7fffffffe6a0
r8 0x400590 4195728
r9 0x7ffff7dea6d0 140737351952080
r10 0x83e 2110
r11 0x7ffff7a57520 140737348203808
r12 0x4003f0 4195312
r13 0x7fffffffe780 140737488349056
r14 0x0 0
r15 0x0 0
rip 0x4004ea 0x4004ea <main+4>
#### Indicator
eflags 0x246 [ PF ZF IF ]
#### Segment registers
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
So hexadecimal is used as a way to represent binary out of convenience.
So in Intel syntax, which the examples below use, assembly is written in the following form:
mnemonic destination,source
The mnemonics are instructions like mov, push and sub. The destination and source are registers, memory addresses, or values.
mov rbp,rsp
So here we copy the current value of rsp (stack pointer) into rbp (base pointer). This is pretty standard at the beginning of a function: we take the stack pointer and make the base pointer equal to it for now.
sub rsp,0x10
Here we read: subtract 0x10 from rsp. So the stack pointer register is now equal to what it was before minus 0x10 (16 in decimal).
add/inc ;add or increment
cmp ; is used to compare values.
jmp ; jump to a different part of the program.
For example
cmp QWORD PTR [rbp-0x8],0x9
jbe 4004f8 <main+0x12>
Here we are making a comparison: cmp compares the 8-byte value stored at the address rbp-0x8 (our loop counter i) with 0x9. jbe stands for jump if below or equal, so the jump is taken while i <= 9. The address 4004f8 is the point in the program where the body of the loop starts (the call to puts). So the processor makes the comparison, and if it is true it jumps back to the beginning of the loop body; if it is false, execution falls through to the code after the loop.
Notes
Warning: Using access() to check if a user is authorized to, for example, open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. For this reason, the use of this system call should be avoided. (In the example just described, a safer alternative would be to temporarily switch the process's effective user ID to the real ID and then call open(2).)