Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New analysis module for find_float_consts #78

Merged
merged 5 commits into from
Jan 25, 2025

Conversation

disinvite
Copy link
Collaborator

Moving the find_float_consts search into its own module. Same performance improvements from #70. (I'm working on the thunk piece separately.)

Instead of having all analysis on the binary jammed into PEImage, put the relevant stuff in its own module that calls functions on the image.

I think all the review comments are addressed from that previous PR, and we now have some rudimentary tests targeted to just this analysis function. We still have the TODO to use more sections than just .text and .rdata but we are better suited to add the new function when it is ready. (#75)

The module is just under analysis for now. Thoughts on x86-specific or PE-specific sub-modules?

As for the floating point instructions: this is hopefully now better explained in the code and comments. You can run this excerpt to see the candidate instructions:

from capstone import Cs, CS_ARCH_X86, CS_MODE_32
disassembler = Cs(CS_ARCH_X86, CS_MODE_32)

deadbeef = b"\xef\xbe\xad\xde"
for opcode in range(0xd8, 0xe0):
    for ext in range(256):
        code = bytearray([opcode, ext, *deadbeef])
        inst = next(disassembler.disasm(code, 0), None)
        if inst and "deadbeef" in inst.op_str:
            print(f"{opcode:02x} {ext:02x}   {inst.mnemonic:10} {inst.op_str}")
    print()

That's all of them that operate on a single address. We can ignore anything that stores a value since those pointers are not constants. We also don't care about things that set or load status registers on the FPU. I ignored integer instructions like fiadd for now but we may want them later. We actually wound up targeting fewer instructions than before.

"""Search the given binary blob for floating-point instructions that reference a pointer.
If the base addr is given, add it to the offset of the instruction to get an absolute address.
"""
for match in FLOAT_INSTRUCTION_RE.finditer(raw):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what python version is required, but I think you can run regular expressions on memoryviews.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Per PEP 688 the common type for bytes and memoryview is collections.abc.Buffer, so I used that here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. 3.12 only.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: it works with both bytes and memoryview, we just can't use the preferred type hint until 3.12. For now it's just bytes which mypy accepts.

@disinvite disinvite requested review from madebr and jonschz January 23, 2025 21:26

# TODO: Should check all code and const data sections.
code_sections = (image.get_section_by_name(".text"),)
const_sections = (image.get_section_by_name(".rdata"),)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this check also test write-able data sections?
e.g. code that does;

float g_Gravity = 9.8f;
void set_gravity(float g) { g_Gravity = g; }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would not identify 9.8f if the value is in a writable section. If it were never modified (and in .rdata) then we would return it, but the correct behavior is to add the variable annotations first and then not replace g_Gravity with EntityType.FLOAT.

Copy link
Collaborator

@jonschz jonschz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readability has improved greatly, thanks! I took a brief look and couldn't find any regressions.

@disinvite disinvite merged commit 985b988 into isledecomp:master Jan 25, 2025
11 checks passed
@disinvite disinvite deleted the find-floats branch January 25, 2025 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants