Skip to content

Commit

Permalink
Update contributing guide
Browse files Browse the repository at this point in the history
  • Loading branch information
larziwau authored Dec 25, 2024
1 parent 7f4c106 commit d69a685
Showing 1 changed file with 44 additions and 29 deletions.
73 changes: 44 additions & 29 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,62 +2,77 @@

## Important Note

While we're thrilled that there is so much interest in reverse engineering LEGO Island and are happy to accept contributions from anyone who would like to help progress us further to our goal of a complete codebase, proposed changes to this repository must adhere to a certain degree of engineering quality. While the established contributors here are more than happy to provide code reviews and constructive criticism, it is not their job to teach potential contributors C++ or decompilation fundamentals. As a project that is largely an artifact of the free time of its contributors, the more of that (often scarce) resource that can be dedicated to efficient work, the faster the decompilation will progress. Unfortunately, this results in well-intentioned but poorly constructed contributions actually hurting progress in the long-term. While we are greatly appreciative of the sentiment, if you aren't very confident in your decompilation abilities, it is generally in the project's best interest that you return when you have a better grasp over the process.
We are thrilled about the interest in reverse engineering LEGO Island and welcome contributions from anyone eager to advance the project. However, proposed changes must meet a certain standard of engineering quality. While our established contributors are happy to provide code reviews and constructive feedback, they cannot teach contributors C++ or decompilation fundamentals.

Generally, decompilation is a fairly advanced skill. Depending on your current proficiency with C/C++ and x86 assembly, it could take you months or even years to learn the skills necessary to do it adequately. If you're still interested in learning, [part 1 of the decompilation vlog](https://www.youtube.com/watch?v=MToTEqoVv3I) covers the overall process and should give you a starting point that you can dive in from. Once again, please make yourself familiar with this process before attempting to contribute code to this project.
This project depends on contributors' free time, a resource often in short supply. Poorly constructed contributions, though well-intentioned, can hinder progress in the long term. If you're not confident in your decompilation abilities, we recommend returning once you have a stronger grasp of the process.

## Ghidra Server
Decompilation is an advanced skill. Depending on your proficiency in C/C++ and x86 assembly, it might take months or even years to develop the necessary skills. For beginners, [Part 1 of the decompilation vlog](https://youtu.be/MToTEqoVv3I) provides an excellent introduction. Please familiarize yourself with this process before contributing code.

For documenting the original binaries and generating pseudocode that we decompile with, we primarily use [Ghidra](https://ghidra-sre.org/) (it's free and open source). To help with collaboration, we have a shared Ghidra repository with all of our current work. You are free to check it out and mess around with it locally, however to prevent sabotage, you will need to request permission before you can push your changes back to the server (ask in the Matrix room).
## Ghidra Server

To access the Ghidra repository, use the following details:
We use [Ghidra](https://ghidra-sre.org/) (free and open source) to document the original binaries and generate pseudocode for decompilation. Collaboration is facilitated through a shared Ghidra repository. You are welcome to explore it locally, but push access requires permission to prevent sabotage. Contact us in the Matrix room to request access.

- Address: `server.mattkc.com`
- Port: `13100`
Repository details:
- **Address:** `server.mattkc.com`
- **Port:** `13100`

**Please note that at the time of writing, much of the information found on the Ghidra server is severely outdated**. Generally, the source code found in this repository represents the latest "source of truth" and should be referenced whenever possible.
> **Note:** Much of the information on the Ghidra server is outdated. The source code in this repository is the most accurate "source of truth" and should be referenced whenever possible.
## General Guidelines

If you feel fit to contribute, feel free to create a pull request! Someone will review and merge it (or provide feedback) as soon as possible.
If you're ready to contribute, feel free to create a pull request (PR). We will review and merge itor provide feedbackas soon as possible.

Please keep your pull requests small and understandable; you may be able to shoot ahead and make a lot of progress in a short amount of time, but this is a collaborative project, so you must allow others to catch up and follow along. Large pull requests become significantly more unwieldy to review, and as such make it exponentially more likely for a mistake or error to go undetected. They also make it harder to merge other pull requests because the more files you modify, the more likely it is for a merge conflict to occur. A general guideline is to keep submissions limited to one class at a time. Sometimes two or more classes may be too interlinked for this to be feasible, so this is not a hard rule, however if your PR is starting to modify more than 10 or so files, it's probably getting too big.
### Pull Request Tips:
- Keep PRs small and understandable to facilitate collaboration and reduce errors.
- Large PRs (modifying more than ~10 files) increase the likelihood of merge conflicts and errors.
- Aim to focus on one class per PR. Interlinked classes may require exceptions but proceed cautiously.

This repository currently has only one goal: accuracy to the original executables. We are byte/instruction matching as much as possible, which means the priority is making the original compiler (MSVC 4.20) produce code that matches the original game. As such, modernizations and bug fixes will probably be rejected for the time being.
### Project Goals:
This repository's sole objective is **accuracy to the original executables**.
- We prioritize byte/instruction matching the original compiler (MSVC 4.20).
- Modernizations or bug fixes will likely be rejected for now.

## Overview

* [`3rdparty`](/3rdparty): Contains code obtained from third parties, not including Mindscape. Generally, these are libraries that have been placed in the public domain or are freely available on the web. As these are unaltered files, our style guide (see below) does not apply.
* [`CONFIG`](/CONFIG): Decompilation of `CONFIG.EXE`. It depends on some code in `LEGO1`.
* [`ISLE`](/ISLE): Decompilation of `ISLE.EXE`. It depends on some code in `LEGO1`.
* [`LEGO1`](/LEGO1): Decompilation of `LEGO1.DLL`. This folder contains code from Mindscape's custom in-house engine called **Omni** (file pattern: `mx*`), the LEGO Island-specific extensions for Omni and the game's code (file pattern: `lego*`) as well as several utility libraries developed by Mindscape.
* [`tools`](/tools): A set of tools aiding in the decompilation effort.
* [`util`](/util): Utility headers aiding in the decompilation effort.
- **[`3rdparty`](/3rdparty):** Libraries from third parties (excluding Mindscape). These are public domain or freely available files. Our style guide does not apply here.
- **[`CONFIG`](/CONFIG):** Decompilation of `CONFIG.EXE`, dependent on `LEGO1` code.
- **[`ISLE`](/ISLE):** Decompilation of `ISLE.EXE`, dependent on `LEGO1` code.
- **[`LEGO1`](/LEGO1):** Decompilation of `LEGO1.DLL`, containing:
- **Omni:** Mindscape's custom in-house engine (`mx*` files).
- **LEGO-specific code:** Extensions and game-specific libraries (`lego*` files).
- **Utility libraries** developed by Mindscape.
- **[`tools`](/tools):** Aiding tools for decompilation.
- **[`util`](/util):** Utility headers supporting the effort.

## Tooling

Please make yourself familiar with the [available tooling and annotations](/tools/README.md). These are generally required to contribute to the project.
Refer to the [tooling and annotations guide](/tools/README.md). Familiarity with these tools is essential for contributing.

## Notes on MSVC 4.20

As outlined in the [`README`](/README.md), Microsoft Visual C++ 4.20 is the compiler we use to build the game.
As outlined in the [`README`](/README.md), we use Microsoft Visual C++ 4.20 to compile the game.

One important aspect to know about this compiler in the context of the decompilation project is that the assembly code generation is somewhat erratic. We call this peculiarity "compiler randomness" or entropy. In essence, what it comes down to is that changes to the code base, for instance in a header, can pseudo-randomly affect the code generation of functions in compilation units that include this header, even if the changes are completely unrelated to those functions. For example, by adding an extra (unused) inline function or an enum declaration in a header, the code in some functions may unexpectedly wind up looking different and our main tool, [`reccmp`](/tools/README.md), will report either a (significantly) reduced or increased accuracy for those functions. This issue roughly affects around ~5% of all decompiled functions.
### Compiler Randomness
- The compiler's code generation can behave erratically, introducing "compiler randomness" or entropy.
- Changes in headers, even unrelated ones (e.g., adding an unused inline function), may alter unrelated function outputs.
- This issue affects ~5% of decompiled functions, complicating efforts to achieve 100% matching binaries.

We are currently unaware of the exact nature of this phenomenon. Unfortunately it represents a significant obstacle in our effort to achieve 100% matching binaries. If you or anyone you know has knowledge about the compiler internals that lead to the described observations, please contact us.
If you have insights into this phenomenon, please contact us.

## Code Style

In general, we're not exhaustively strict about coding style, but there are some preferable guidelines to follow that have been adopted from what we know about the original codebase:

### Formatting
We use:
- [clang-format](https://clang.llvm.org/docs/ClangFormat.html) and [clang-tidy](https://clang.llvm.org/extra/clang-tidy/) with configuration files replicating the original formatting.
- Required `clang` version: `18.x`.

We are currently using [clang-format](https://clang.llvm.org/docs/ClangFormat.html) and [clang-tidy](https://clang.llvm.org/extra/clang-tidy/) with configuration files that aim to replicate the code formatting employed by the original developers. There are [integrations](https://clang.llvm.org/docs/ClangFormat.html#vim-integration) available for most editors and IDEs. The required `clang` toolchain version is `18.x`.

### Naming conventions

We are currently using a customized version of [ncc](https://github.com/nithinn/ncc) with a configuration file that aims to replicate the naming conventions employed by the original developers. `ncc` requires Clang `16.x`; please refer to the [tool](/tools/ncc) and the [GitHub action](/.github/workflows/naming.yml) for guidance.
### Naming Conventions
We use a customized version of [ncc](https://github.com/nithinn/ncc) to replicate original naming conventions.
- Required `clang` version: `16.x`.
- Refer to the [ncc tool guide](/tools/ncc) and [GitHub action](/.github/workflows/naming.yml) for details.

## Questions?

For any further questions, feel free to ask in either the [Matrix chatroom](https://matrix.to/#/#isledecomp:matrix.org) or on the [forum](https://forum.mattkc.com/viewforum.php?f=1).
For further questions, reach out via:
- [Matrix chatroom](https://matrix.to/#/#isledecomp:matrix.org)
- [Forum](https://forum.mattkc.com/viewforum.php?f=1)

0 comments on commit d69a685

Please sign in to comment.