Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Global Variables BNinja 4.0 #401

Merged
merged 20 commits into from
Mar 28, 2024
Merged

Fix Global Variables BNinja 4.0 #401

merged 20 commits into from
Mar 28, 2024

Conversation

NeoQuix
Copy link
Collaborator

@NeoQuix NeoQuix commented Mar 19, 2024

Restored Branch.

@NeoQuix
Copy link
Collaborator Author

NeoQuix commented Mar 19, 2024

Instructions for reviewer:

  1. Don't panic. Most line changes/files are just tests being refactored/deleted.
  2. The primary changes are in the frontend (constants/globals.py) and backend (cexprgen/vardecl).

I would suggest by starting in pseudo/ and checking if the new types/fields make sense.
Sadly I can't (and will not) change Constant, therefore the ConstantComposition is a bit rough (violates .value type of Constant).
This class will later serve as a base for Array and Struct when we add struct support.

Afterwards check the frontend and the new globals.py.
It mostly follows the same logic as the old one, but is a bit cleaned up. (And again more hacks for BNinjas API).
At the end check the backend.

All tests utilizing Globals needed an updated, because I changed the order of GlobalVariable.__ini__() and the initial_value type. Almost all changes are just a wrapper with Constant as the value and/or parameter reordering.

The global regex tests are partially removed as discussed.

@NeoQuix
Copy link
Collaborator Author

NeoQuix commented Mar 19, 2024

Still to do/discuss:

  • Backend printing
  • Duplicate global variables: See tests/samples/other/app1.so test_case => Lifter lifts all variables correctly, but as the backend is deploying a set and our hash method is poorly chosen, it does not collect all variables and therefore prints only the .got entries and not the .data entries
  • '&' still needs to be handles (and tested) for constant.py (We lift a BNinja void reference as a void*, therefore we need to remove the '&'.)
  • add new tests for GlobalVariable

@NeoQuix NeoQuix requested a review from blattm March 19, 2024 09:14
@NeoQuix
Copy link
Collaborator Author

NeoQuix commented Mar 19, 2024

Found a few new things:

  • local variables with array type
  • C syntax sugar is annoying in the backend

@NeoQuix NeoQuix force-pushed the refactor_global_variables_2 branch from 5fc348b to b2e1c5b Compare March 19, 2024 21:03
decompiler/frontend/binaryninja/handlers/types.py Outdated Show resolved Hide resolved
decompiler/structures/pseudo/expressions.py Outdated Show resolved Hide resolved
decompiler/frontend/binaryninja/handlers/globals.py Outdated Show resolved Hide resolved
decompiler/frontend/binaryninja/handlers/globals.py Outdated Show resolved Hide resolved
decompiler/backend/variabledeclarations.py Outdated Show resolved Hide resolved
decompiler/structures/pseudo/expressions.py Outdated Show resolved Hide resolved
decompiler/frontend/binaryninja/handlers/globals.py Outdated Show resolved Hide resolved
}
self._lifted_globals: dict[int, GlobalVariable] = {}
self._view: Optional[BinaryView] = None
self._callers: list[int] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the callers list, do we still need the cache?
I think they could interfere in a sense:
What if we have a chain A -> B -> C -> A. When we lift A, each Global will have cache entries with a chain up to Symbol(A). When we then later want to generate code, the output will depend on the lifting order, which is weird.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_lifted_globals will have entries when the chain of global variables is already lifted (e.g. it goes from the most recent recursive call to the oldest one). => Can't be used to detect circles. Only needed when the same circle will be lifted twice (or more).

_callers will have entries when we lift the chain. E.g. When we lift A, we see that the adress of A is already part of the circle and stop there. => Needed to detect the circle in the first place

And yes when we lift A a second time, it will have the same chain as before.
Why would that be a problem for generating the code at the backend?
The backend will collect all variables but strip the ssa labels of them, therefore duplicates will not occur.
Or am I missing your point?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know that the cache is just for performance reasons. This should be explicitly noted in a comment.

My Point was, that when we lift A, it's lifted as A -> B -> C ->Symbol(A). When we next lift C, we get C -> Symbol(A) and not C -> A -> B -> Symbol(C). Therefore the length of these chains depends on the lifting order. But I'm not sure if this will negatively impact the output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically correct and the down side from _lifted_globals.
If we wanted to remove the problem completely we would need to step away from these chains of GlobalVariables in the first place.

E.g. only lift each initial_value as a reference and store all referenced GlobalVariables inside a datastructure.
Don't really know if it is worth it for the struggle, because literally no stage looks at the initial_value.
Maybe it's a good discussion point before we implement structs?

decompiler/frontend/binaryninja/handlers/globals.py Outdated Show resolved Hide resolved
@NeoQuix
Copy link
Collaborator Author

NeoQuix commented Mar 27, 2024

I moved a few methods between private and public (or not even part of the object) because other lifter handlers may need them (section functions).

@blattm blattm merged commit 43c19d9 into main Mar 28, 2024
1 check passed
@blattm blattm deleted the refactor_global_variables_2 branch March 28, 2024 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants