John Källén edited this page Apr 19, 2017 · 4 revisions

Internally, Reko uses its own Register Transfer Language (RTL) in all the analyses it performs. These analyses have no knowledge of processor-specific machine instructions. It is the task of each processor architecture implementation to provide a suitable Rewriter that translates machine instructions to RTL.

RTL consists of distinct instructions and expressions.

RTL Instructions

The instructions are the following:

RtlAssignment: models an assignment. E.g. `eax = eax + 1` or `Mem0[r1:byte] = 0x20`

RtlBranch: models a conditional branch to an address. E.g. `branch Test(NZ, eax) 00402344`

RtlCall: models either a direct subroutine call (to a constant address) or an indirect call (to a computed expression). E.g. `call 00401580` or `call Mem0[eax + 00000018:word32]`

RtlGoto: models an unconditional branch, either direct or indirect (the indirect kind is what switch statements typically produce). E.g. `goto 00401890` or `goto Mem0[r1 + r2 * 4]`

RtlIf: models a conditionally executed statement, present in some architectures. E.g. `if (r1 > 0) r1 = 0`

RtlReturn: models a return to the caller, including how many bytes are removed from the return stack (if applicable). E.g. `return (4)`

RtlSideEffect: models an instruction whose effect is not observable in registers, e.g. the out instruction of the x86 architecture: `__outb(edx,al)`
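
To make the role of a Rewriter more concrete, here is a minimal Python sketch of the idea. It is not Reko's API: Reko's own classes model expressions as typed trees rather than strings. The class names mirror a few of the instruction kinds listed above, while the `rewrite` function and the toy mnemonics it accepts are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical, simplified stand-ins for some of the RTL instruction kinds.
# Expressions are kept as plain strings to keep the sketch short.
@dataclass(frozen=True)
class RtlAssignment:
    dst: str
    src: str

    def __str__(self) -> str:
        return f"{self.dst} = {self.src}"


@dataclass(frozen=True)
class RtlCall:
    target: str

    def __str__(self) -> str:
        return f"call {self.target}"


@dataclass(frozen=True)
class RtlGoto:
    target: str

    def __str__(self) -> str:
        return f"goto {self.target}"


@dataclass(frozen=True)
class RtlReturn:
    stack_bytes: int

    def __str__(self) -> str:
        return f"return ({self.stack_bytes})"


RtlInstruction = Union[RtlAssignment, RtlCall, RtlGoto, RtlReturn]


def rewrite(mnemonic: str, operand: str) -> List[RtlInstruction]:
    """Translate one toy machine instruction into a list of RTL instructions."""
    if mnemonic == "inc":
        return [RtlAssignment(operand, f"{operand} + 1")]
    if mnemonic == "call":
        return [RtlCall(operand)]
    if mnemonic == "jmp":
        return [RtlGoto(operand)]
    if mnemonic == "ret":
        return [RtlReturn(int(operand))]
    raise ValueError(f"unsupported mnemonic: {mnemonic}")


# Print the RTL for a short, made-up instruction stream.
for mnemonic, operand in [("inc", "eax"), ("call", "00401580"), ("ret", "4")]:
    for instr in rewrite(mnemonic, operand):
        print(instr)
```

The point of the sketch is the separation of concerns: everything processor-specific lives in `rewrite`, and later stages only ever see the small, fixed set of RTL instruction kinds.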

RTL Expressions

All expressions modeled by the Decompiler have a [data type](data type). At the very least, the data type will be one of the neutral byte or word<XX> types, whose only attribute is their size in bits.

Base expressions constitute the leaf nodes of expression trees. There are three kinds of base expressions:

Constants: these model constant values, such as booleans, integers, characters or real numbers. Integer constants may be signed or unsigned. E.g. `false`, `-1234`, `3e-3`, `'c'`. Later stages of the decompilation process may produce string constants, which are also modeled by constants.

Addresses are special constants that are known to be pointers to locations. Addresses are especially useful to Reko because they allow it to determine the locations referred to by the program. Addresses must model byzantine addressing schemes such as the infamous x86 segmented addresses, which consist of a segment selector and an offset. E.g. `004079A0` or `0C00:1253`

Identifiers model locations accessed by the program. The names of identifiers are derived from register names, or synthesized from other values such as stack offsets. E.g. `r1`, `dwLoc04`, `global_00403120`, `fn04001670`

More complex expressions are composed from base expressions and the following:

Unary operators model negation, bit-wise complement, and other single-operand expressions. E.g. `!cx` or `&dwLoc04`

Binary operators model arithmetic operations, logical operations, shift operations, and comparisons. E.g. `dwLoc04 + 0x0004`, `r1 << 0x02`, `al >= '0'`

Memory accesses model loads from and stores to memory. A special variant of the memory access models Intel x86 segmented memory accesses. E.g. `Mem0[fp - 0x12] = r10` or `ax = Mem1[es:bx + 0x04:word16]`

Sequences model expressions that occupy consecutive, ordered sequences of registers. They are commonly used when register pairs represent values that are too wide to fit in one register. E.g. `dx:ax`, `es:bx`, `hi:lo`. The sequence construction operator SEQ is used to build sequences of things other than registers. For instance, the expression `SEQ(Mem0[ds:bx + 0x0004:word16],Mem0[ds:bx + 0x0002:word16])` models a 32-bit sequence constructed by fetching two 16-bit words from memory in little-endian order.

Casts are used to coerce the data type of an expression to another. This construct is used to model type conversion, sign extension, and truncation. E.g. `(word16) eax` or `(int32) 'a'`

Applications model calls to functions. E.g. `fn0124_0123(ecx)`
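
The bit-level meaning of sequences and casts can be sketched in a few lines of Python. This is only an illustration of the arithmetic involved, not Reko code: the helper names `seq`, `truncate` and `sign_extend` are made up here, and bit widths are passed explicitly because plain Python integers carry no data type.

```python
def seq(hi: int, lo: int, lo_bits: int) -> int:
    """Concatenate two values, with `hi` occupying the more significant bits."""
    return (hi << lo_bits) | (lo & ((1 << lo_bits) - 1))


def truncate(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a narrowing cast would."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of `value` as a signed integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


# Four bytes as they would sit in memory at bx+2 .. bx+5 on a little-endian machine.
memory = bytes.fromhex("78563412")
lo_word = int.from_bytes(memory[0:2], "little")  # word at bx + 2
hi_word = int.from_bytes(memory[2:4], "little")  # word at bx + 4

combined = seq(hi_word, lo_word, 16)
print(hex(combined))                 # same value as reading all four bytes at once
print(hex(truncate(combined, 16)))   # narrowing cast to 16 bits
print(sign_extend(0xFF, 8))          # the byte 0xFF read as a signed value
```

The word at the higher address supplies the high half of the result, which is why the `SEQ` example in the text lists the `bx + 0x0004` access first.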

The DPB function is derived from a function in Common Lisp that takes its name from a PDP-10 instruction, also called DPB (http://pdp10.nocrew.org/docs/instruction-set/Byte.html), which deposits a byte inside a larger word. Reko uses DPB to model how, on some architectures, a byte load is stored into an architectural register that is wider than a byte without modifying the remainder of the register. For instance, the m68k instruction:

move.b (a5),d3

will be "lifted" to the RTL:

tmp = Mem0[a5:byte]     // load a byte from memory
d3 = DPB(d3, tmp, 0)    // deposit the byte at offset 0 of d3.
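
The effect of DPB on the register value can be sketched in Python as follows. This is an illustration of the bit manipulation only, under stated assumptions: in the RTL above the width of the deposited value comes from the data type of `tmp`, whereas the hypothetical `dpb` helper below takes it as the explicit parameter `src_bits`, and it assumes a 32-bit destination register by default.

```python
def dpb(dst: int, src: int, offset: int, src_bits: int, dst_bits: int = 32) -> int:
    """Deposit the low `src_bits` bits of `src` into `dst` at bit `offset`.

    All other bits of `dst` are left unchanged.
    """
    field_mask = ((1 << src_bits) - 1) << offset
    deposited = (dst & ~field_mask) | ((src << offset) & field_mask)
    return deposited & ((1 << dst_bits) - 1)


d3 = 0x11223344
tmp = 0xAB                       # the byte loaded from Mem0[a5:byte]
d3 = dpb(d3, tmp, 0, src_bits=8)
print(hex(d3))                   # upper 24 bits of d3 are preserved
```

Modeling the load this way keeps the dependency on the old value of `d3` explicit, so later analyses can see that the upper bits of the register survive the byte move.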