Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce reccmp project file concept #1

Merged
merged 11 commits into from
Oct 24, 2024

Conversation

madebr
Copy link
Collaborator

@madebr madebr commented Oct 19, 2024

There are 3 kind of project files:

  • reccmp-project.yml: should be part of the git repo
    e.g.
    targets:
      CONFIG:
        filename: CONFIG.EXE
        source-root: CONFIG
        hash:
          sha256: 864766d024d78330fed5e1f6efb2faf815f1b1c3405713a9718059dc9a54e52c
      ISLE:
        filename: ISLE.EXE
        source-root: ISLE
        hash:
          sha256: 5cf57c284973fce9d14f5677a2e4435fd989c5e938970764d00c8932ed5128ca
      LEGO1:
        filename: LEGO1.DLL
        source-root: LEGO1
        hash:
          sha256: 14645225bbe81212e9bc1919cd8a692b81b8622abb6561280d99b0fc4151ce17
    It contains general information, is part of the repo and can never contain absolute paths.
  • reccmp-user.yml: should not be committed to the repo
    e.g.
    targets:
      ISLE: # abc
        path: /home/maarten/games/isle/ISLE.EXE
      LEGO1:
        path: /home/maarten/games/isle/LEGO1.DLL
      CONFIG:
        path: /home/maarten/games/isle/CONFIG.EXE
    It contains all information that is user specific, and should remain unchanged for the duration of a decompilation project (unless you move binaries around)
  • reccmp-build.yml: should not be committed to the repo
    e.g.
    project: /home/maarten/projects/isle  # directory of reccmp-project.yml
    targets:
      ISLE: # abc
        path: /home/maarten/projects/isle/build/ISLE.EXE
        pdb: /home/maarten/projects/isle/build/ISLE.pdb
      LEGO1:
        path: /home/maarten/games/isle/LEGO1.DLL
        pdb: /home/maarten/games/isle/LEGO1.pdb
      CONFIG:
        path: /home/maarten/games/isle/CONFIG.EXE
        pdb: /home/maarten/games/isle/CONFIG.pdb
    This file contains paths that can change from build to build.

reccmp-project and reccmp-user.yml are expected to be in the same directory. reccmp-build.yml can be in the build directory, and contains a reference to the project root.

The reccmp-project script can generate a new reccmp project with the command reccmp-project create ...
The reccmp-project script can create project-user.yml and project-build.yml with the command reccmp-project detect ...

I modified the various reccmp-* utilities such that it will look for reccmp-build.yml in the working directory.
It uses a similar search procedure as git uses for finding a .git folder: walk the tree until it find either a reccmp-build.yml file, or a reccmp-project.yml file, depending on whether it needs recompiled binaries.

There is also cmake support to automatically generate a reccmp-build.yml file. The reccmp.cmake file should be committed to foreign recompilation projects.

I modified the isle project to make use of the reccmp scripts in this branch on my fork.
I made it a git submodule in order to be able to run the unittests, that depend on isle executables.
Ideally, the isle project should just need to install git+https://github.com/isledecomp/reccmp.git.

Copy link
Member

@foxtacles foxtacles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just some minor issue in the CI

@disinvite
Copy link
Collaborator

Just starting to play with this now. I like the YAML config setup because it nicely solves the problem of "where do you put the files?" that (I assume) we've all had to figure out in our own way.

I'm less sure about the sample code files and cmake stuff. A new recomp project might not use C++ or might be a DOS program that would require a specific compiler under emulation. Any way to disable that from the reccmp-project options or have it as its own script?

@jonschz
Copy link
Collaborator

jonschz commented Oct 19, 2024

I can't promise a detailed review before tomorrow, but the approach looks good so far! I'll try and see how well the Ghidra import works.

@madebr madebr force-pushed the project-structure branch 2 times, most recently from 124fdb9 to 1fbc4a3 Compare October 19, 2024 10:50
@madebr madebr force-pushed the project-structure branch from 1fbc4a3 to 63695fa Compare October 19, 2024 11:25
@madebr
Copy link
Collaborator Author

madebr commented Oct 19, 2024

I'm less sure about the sample code files and cmake stuff. A new recomp project might not use C++ or might be a DOS program that would require a specific compiler under emulation. Any way to disable that from the reccmp-project options or have it as its own script?

I added a --cmake-project option to the reccmp-project create command.

Speaking about DOS. I added a targets.<target-id>.pdb key to reccmp-build.yml. Perhaps this is a bad name, and should be something else? pdb is MSVC-specific. Watcom provides .sym files.
Perhaps rename it to debug-path? This would future proof us for elf support where dwarf information can be either part of the original executable, or split to an external file.

Rename to targets.<target>.debug-path? or targets.<target>.symbols-path?

reccmp/tools/project.py Outdated Show resolved Hide resolved
reccmp/tools/project.py Outdated Show resolved Hide resolved
@MrSapps
Copy link

MrSapps commented Oct 19, 2024

Sorry for the spam - didn't want to open a pointless issue but would this PR make this tool usable for other projects. I'd like to integrate it into https://github.com/CriminalRETeam/gta2_re/

reccmp/tools/project.py Outdated Show resolved Hide resolved
reccmp/tools/project.py Outdated Show resolved Hide resolved
reccmp/tools/project.py Outdated Show resolved Hide resolved
reccmp/tools/project.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jonschz jonschz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall! Here are some remarks on what I found:

  • README:

    • It'd be great if you could add the descriptions you made in this PR to this repo's README, (either in the main README.md or in another referenced file). I'm also happy with doing that in a separate PR if you feel like the interface is still too unstable.
    • Make sure no README gets lost (for example: I can't find the info on recompiling BETA10 here or on your branch). I think we'll double check that anyway once we migrate isle, just wanted to raise the awareness.
  • I encountered some issues when I tried to set up BETA10 (see e.g. here):

    • I added the following to reccmp-project.yml:
        BETA10:
          filename: BETA10.DLL
          source-root: LEGO1
    • Neither auto-generated reccmp-build.yml (in build or build_debug) contains an entry for BETA10. The manual fix is to add
        BETA10:
          path: '[...]/build_debug/LEGO1.DLL'
          pdb: '[...]/build_debug/LEGO1.DLL'
      to build/reccmp-build.yml. build_debug/reccmp-build.yml is obsolete and can be removed. I am not sure if this workflow can be improved significantly, partly because the build artifact for BETA10 is also called LEGO1.DLL.
    • This is an actual bug: With BETA10 set up, I get hundreds of false positive duplicate_offset warnings in isledecomp, likely because of the shared codebase between BETA10 and LEGO1.
  • Others:

    • I assume it's by design that you can only run scripts like reccmp.tools.reccmp from the build folder since it doesn't find the reccmp-build.yml otherwise? I don't see an easy way around this right now, just wanted to make sure that it's intentional.
    • The Ghidra import does not work correctly. I'll open a ticket here to find a solution once this PR is merged. IMHO this should be fixed before we migrate isle -> Fix the Ghidra import #2
    • General question: What is your personal intended workflow if you want to work on the LEGO1 decompile and reccmp at the same time? Install them next to each other and do a pip install -e?

reccmp/tools/datacmp.py Show resolved Hide resolved
reccmp/tools/roadmap.py Outdated Show resolved Hide resolved
reccmp/tools/stackcmp.py Show resolved Hide resolved
reccmp/tools/project.py Show resolved Hide resolved
reccmp/project/detect.py Outdated Show resolved Hide resolved
@jonschz jonschz mentioned this pull request Oct 20, 2024
3 tasks
@disinvite
Copy link
Collaborator

disinvite commented Oct 20, 2024

  • This is an actual bug: With BETA10 set up, I get hundreds of false positive duplicate_offset warnings in isledecomp, likely because of the shared codebase between BETA10 and LEGO1.

On this point: should decomplint read (or require) the reccmp project target from the command line? We currently use the --module argument to decide which annotations to focus on, but we could drop that and use the target name instead.

Another option is to use the base filename of the original binary. This is what the main reccmp script does.

@jonschz
Copy link
Collaborator

jonschz commented Oct 20, 2024

On this point: should decomplint read (or require) the reccmp project target from the command line?

Given the extremely low runtime of decomplint compared to all other scripts, I think it's fine for it to run on all targets. I'd try to minimise the effort at the moment since there is a lot to be done. If we ever want to change the behaviour of decomplint, I see some merit in having a uniform interface for all our scripts.

@disinvite
Copy link
Collaborator

On this point: should decomplint read (or require) the reccmp project target from the command line?

Given the extremely low runtime of decomplint compared to all other scripts, I think it's fine for it to run on all targets. I'd try to minimise the effort at the moment since there is a lot to be done. If we ever want to change the behaviour of decomplint, I see some merit in having a uniform interface for all our scripts.

I agree, but we should solve the duplicate offset errors. The cause is here:

args.paths = list(target.source_root for target in project.targets.values())

If no paths are specified at the command line, we combine all the source roots from the reccmp-project file. If your yaml looks like this then the files in the LEGO1 directory will appear twice:

targets:
  beta10:
    filename: BETA10.DLL
    source-root: LEGO1
  lego1:
    filename: LEGO1.DLL
    source-root: LEGO1

If we isolate based on the target name (to match the other scripts) then this won't happen.

@madebr
Copy link
Collaborator Author

madebr commented Oct 24, 2024

If no paths are specified at the command line, we combine all the source roots from the reccmp-project file. If your yaml looks like this then the files in the LEGO1 directory will appear twice:

I fixed this by going through a python set:

list(set(target.source_root for target in project.targets))

@jonschz
Copy link
Collaborator

jonschz commented Oct 24, 2024

I fixed this by going through a python set:

I can confirm that the issue is gone, thank you!

@jonschz jonschz mentioned this pull request Oct 24, 2024
2 tasks
@jonschz jonschz merged commit 6256cb0 into isledecomp:master Oct 24, 2024
4 checks passed
@madebr madebr deleted the project-structure branch October 24, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants