New analysis module for find_float_consts
#78
Merged
Commits (5):
- bce7cf5 Floating point analysis module (disinvite)
- 2847c91 Remove find_float_consts from PE class (disinvite)
- 195bfc8 Use collections.abc.Buffer, make pe.relocations public (disinvite)
- e6786e9 Remove collections.abc.Buffer for pre 3.12 (disinvite)
- 7e8a6a9 Should ignore float variable (disinvite)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
from .float_const import find_float_consts |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
"""Analysis related to x86 floating point instructions. | ||
x87 floating point instructions use an escape byte in the range D8 to DF as the first opcode byte. | ||
The second byte (ModRM) selects the operation and the memory operand or registers used. | ||
|
||
We are interested in floating point constants, so we want to exclude instructions that: | ||
- access the status register or environment (FLDCW, FLDENV) | ||
- store a value (FST, FSTP) | ||
- refer to integers (FI*) | ||
|
||
Then filter on pointers into read-only sections. | ||
""" | ||
import re | ||
import struct | ||
from typing import Iterator, NamedTuple | ||
from reccmp.isledecomp.formats import PEImage | ||
|
||
SINGLE_PRECISION_OPCODES = frozenset( | ||
[ | ||
(0xD8, 0x05), # fadd | ||
(0xD8, 0x0D), # fmul | ||
(0xD8, 0x15), # fcom | ||
(0xD8, 0x1D), # fcomp | ||
(0xD8, 0x25), # fsub | ||
(0xD8, 0x2D), # fsubr | ||
(0xD8, 0x35), # fdiv | ||
(0xD8, 0x3D), # fdivr | ||
(0xD9, 0x05), # fld | ||
] | ||
) | ||
|
||
DOUBLE_PRECISION_OPCODES = frozenset( | ||
[ | ||
(0xDC, 0x05), # fadd | ||
(0xDC, 0x0D), # fmul | ||
(0xDC, 0x15), # fcom | ||
(0xDC, 0x1D), # fcomp | ||
(0xDC, 0x25), # fsub | ||
(0xDC, 0x2D), # fsubr | ||
(0xDC, 0x35), # fdiv | ||
(0xDC, 0x3D), # fdivr | ||
(0xDD, 0x05), # fld | ||
] | ||
) | ||
|
||
FLOAT_OPCODES = frozenset([*SINGLE_PRECISION_OPCODES, *DOUBLE_PRECISION_OPCODES]) | ||
|
||
|
||
# Match a superset of the floating point instructions above. | ||
# Uses positive lookahead to support overlapping matches. | ||
FLOAT_INSTRUCTION_RE = re.compile( | ||
rb"(?=([\xd8\xd9\xdc\xdd][\x05\x0d\x15\x1d\x25\x2d\x35\x3d].{4}))", flags=re.S | ||
) | ||
|
||
|
||
class FloatInstruction(NamedTuple): | ||
# The address (or offset) of the instruction | ||
address: int | ||
# Two byte opcode of the instruction | ||
opcode: tuple[int, int] | ||
# The address used in the operand | ||
pointer: int | ||
|
||
|
||
def find_float_instructions_in_buffer( | ||
buf: bytes, base_addr: int = 0 | ||
) -> Iterator[FloatInstruction]: | ||
"""Search the given binary blob for floating-point instructions that reference a pointer. | ||
If the base addr is given, add it to the offset of the instruction to get an absolute address. | ||
TODO: Uses `bytes` as the generic type for the Buffer protocol. See PEP 688 added in Python 3.12. | ||
""" | ||
for match in FLOAT_INSTRUCTION_RE.finditer(buf): | ||
inst = match.group(1) | ||
opcode = (inst[0], inst[1]) | ||
|
||
if opcode in FLOAT_OPCODES: | ||
(pointer,) = struct.unpack("<I", inst[2:6]) | ||
yield FloatInstruction(base_addr + match.start(), opcode, pointer) | ||
|
||
|
||
class FloatConstant(NamedTuple): | ||
address: int | ||
size: int | ||
value: float | ||
|
||
|
||
def find_float_consts(image: PEImage) -> Iterator[FloatConstant]: | ||
"""Floating point instructions that refer to a memory address can | ||
point to constant values. Search the code sections to find FP | ||
instructions and check whether the pointer address refers to | ||
read-only data.""" | ||
|
||
# Multiple instructions can refer to the same float. | ||
# Return each float only once from this function. | ||
seen = set() | ||
|
||
# TODO: Should check all code and const data sections. | ||
code_sections = (image.get_section_by_name(".text"),) | ||
const_sections = (image.get_section_by_name(".rdata"),) | ||
|
||
for sect in code_sections: | ||
for inst in find_float_instructions_in_buffer(sect.view, sect.virtual_address): | ||
if inst.pointer in seen: | ||
continue | ||
|
||
seen.add(inst.pointer) | ||
|
||
# Make sure that the operand (two bytes into the instruction) has a relocation entry. | ||
if inst.address + 2 not in image.relocations: | ||
continue | ||
|
||
# Only accept pointers into read-only sections; this skips float variables in writable data. | ||
if any( | ||
const_sect.contains_vaddr(inst.pointer) for const_sect in const_sections | ||
): | ||
if inst.opcode in SINGLE_PRECISION_OPCODES: | ||
# dword ptr -- single precision | ||
(float_value,) = struct.unpack("<f", image.read(inst.pointer, 4)) | ||
yield FloatConstant(inst.pointer, 4, float_value) | ||
|
||
elif inst.opcode in DOUBLE_PRECISION_OPCODES: | ||
# qword ptr -- double precision | ||
(float_value,) = struct.unpack("<d", image.read(inst.pointer, 8)) | ||
yield FloatConstant(inst.pointer, 8, float_value) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
"""Test find_float_consts for PE images""" | ||
|
||
from reccmp.isledecomp.formats import PEImage | ||
from reccmp.isledecomp.analysis.float_const import ( | ||
find_float_instructions_in_buffer, | ||
find_float_consts, | ||
) | ||
|
||
|
||
def test_float_detect_overlap(): | ||
"""Must be able to match potential instructions that overlap. | ||
Because we are not disassembling, we don't know whether a given | ||
byte is the start of an instruction.""" | ||
code = b"\xd8\x05\xd8\x05\x00\x10\x00\x10" | ||
floats = list(find_float_instructions_in_buffer(code)) | ||
assert len(floats) == 2 | ||
|
||
|
||
def test_basic_float_detection(binfile: PEImage): | ||
"""Make sure we detect some known floats in our sample PE image""" | ||
floats = list(find_float_consts(binfile)) | ||
|
||
# Single and double precision, same value | ||
assert (0x100DBD38, 4, 0.5) in floats | ||
assert (0x100D8BC0, 8, 0.5) in floats | ||
|
||
# Integer value stored as a float | ||
assert (0x100D6F88, 4, 1024.0) in floats | ||
|
||
# Both pi, both doubles, but different levels of precision | ||
assert (0x100DB8F0, 8, 3.141592653589793) in floats | ||
assert (0x100DBD50, 8, 3.14159265359) in floats | ||
|
||
# Ignore float variable from .data | ||
assert (0x100F7500, 4, 0.1) not in floats | ||
|
||
|
||
def test_floats_appear_once(binfile: PEImage): | ||
"""Multiple instructions may point at the same constant. | ||
Our list should only return each constant once.""" | ||
floats = list(find_float_consts(binfile)) | ||
|
||
assert len(floats) == len(set(floats)) |
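
To see why `test_float_detect_overlap` expects two results, the sketch below runs the same byte pattern with and without the positive lookahead on the test's input. `PLAIN_RE` and `LOOKAHEAD_RE` are illustrative names, not identifiers from the PR; the pattern itself is copied from `FLOAT_INSTRUCTION_RE`.

```python
import re
import struct

PATTERN = rb"[\xd8\xd9\xdc\xdd][\x05\x0d\x15\x1d\x25\x2d\x35\x3d].{4}"

# Without lookahead, each match consumes its six bytes.
PLAIN_RE = re.compile(PATTERN, flags=re.S)
# With lookahead, the match itself is empty, so the scan advances one byte at a time.
LOOKAHEAD_RE = re.compile(rb"(?=(" + PATTERN + rb"))", flags=re.S)

code = b"\xd8\x05\xd8\x05\x00\x10\x00\x10"

# Collect (offset, pointer) pairs for each candidate instruction.
plain = [
    (m.start(), struct.unpack("<I", m.group(0)[2:6])[0])
    for m in PLAIN_RE.finditer(code)
]
overlapping = [
    (m.start(), struct.unpack("<I", m.group(1)[2:6])[0])
    for m in LOOKAHEAD_RE.finditer(code)
]

print([(off, hex(ptr)) for off, ptr in plain])
print([(off, hex(ptr)) for off, ptr in overlapping])
```

The plain pattern finds only the candidate at offset 0, because the bytes of the second candidate are consumed as the first one's operand. The lookahead version also reports the candidate at offset 2.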
Does this check also test writable data sections?
e.g. code that does:

It would not identify `9.8f` if the value is in a writable section. If it were never modified (and in `.rdata`) then we would return it, but the correct behavior is to add the variable annotations first and then not replace `g_Gravity` with `EntityType.FLOAT`.
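
The section filter described in the reply can be sketched as follows. Everything here is hypothetical and for illustration only: the `Section` class stands in for the real section objects (only the `contains_vaddr` name is taken from the diff), and the section ranges are invented rather than read from the sample binary. The two addresses are the ones used in `test_basic_float_detection`.

```python
from typing import NamedTuple


class Section(NamedTuple):
    """Hypothetical stand-in for a PE section; not the reccmp class."""

    name: str
    virtual_address: int
    size: int

    def contains_vaddr(self, vaddr: int) -> bool:
        return self.virtual_address <= vaddr < self.virtual_address + self.size


# Invented layout: the real section boundaries are not given in the PR.
rdata = Section(".rdata", 0x100D6000, 0x6000)
data = Section(".data", 0x100F0000, 0x10000)

# Only read-only sections are searched for constants.
const_sections = (rdata,)


def points_to_constant(pointer: int) -> bool:
    """True if the operand address falls inside a read-only section."""
    return any(sect.contains_vaddr(pointer) for sect in const_sections)


print(points_to_constant(0x100DBD38))  # address of the 0.5 constant in the test
print(points_to_constant(0x100F7500))  # address of the float variable in the test
```

Under these assumed ranges the first address is accepted and the second is rejected, which matches the behaviour described above: a float stored in a writable section is treated as a variable, not a constant.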