
Commit

Merge branch 'main' into feat/hu_ged_eval
mahaloz authored Nov 14, 2024
2 parents 1e7e4f3 + ae5069a commit 56eb298
Showing 170 changed files with 249 additions and 1,071,336 deletions.
10 changes: 0 additions & 10 deletions Dockerfile
@@ -47,16 +47,6 @@ RUN git clone https://github.com/angr/angr-dev.git && ( \
cd ./angr-dev && \
printf "I know this is a bad idea.\n" | ./setup.sh -i)


# XXX: checkout angr to version at the submission of SAILR
# if you want the latest version of angr-decompiler, with future fixes, remove this block
RUN pip3 install sh && (cd ./angr-dev && \
./admin/checkout_at.py a8bab649cfc18912d5bb3ce70ef57a4ae4039f53 && \
cd ./angr && \
git remote add mahaloz https://github.com/mahaloz/angr-sailr.git && \
git fetch mahaloz && \
git checkout mahaloz/sailr)

# ===========================================================
# Ghidra 10.1
# ===========================================================
151 changes: 25 additions & 126 deletions README.md
@@ -1,10 +1,16 @@
# SAILR Evaluation Pipeline

<p align="center">
<img src="https://i.imgur.com/VUnGRHU.png" style="width: 30%;" alt="angr-sailr Logo"/>
</p>

The SAILR evaluation pipeline, `sailreval`, is a tool for measuring various aspects of decompilation quality.
This evaluation pipeline was originally developed for the USENIX 2024 paper ["Ahoy SAILR! There is No Need to DREAM of C:
A Compiler-Aware Structuring Algorithm for Binary Decompilation"](https://www.zionbasque.com/files/publications/sailr_usenix24.pdf). It supports 26 different C packages from Debian
for compiling, decompiling, and measuring. Currently, angr, Hex-Rays (IDA Pro), and Ghidra are supported as decompilers.

If you are only looking to use the SAILR version of angr, then jump to the [using SAILR on angr](#using-sailr-on-angr-decompiler) section.
If you are only looking to use the SAILR version of angr, simply use angr! The latest version of angr now uses SAILR!
If you are looking to reproduce the exact results of the SAILR paper, then jump to [this README](./misc/reproducing_sailr_paper/README.md) for a submission version.
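
For reference, decompiling a function with stock angr looks roughly like this (a minimal sketch against the current angr API, reusing this repo's example binary; not part of the evaluation pipeline itself):
```python
import angr

# Decompile one function with the current angr release, which now uses
# SAILR structuring by default.
proj = angr.Project("./tests/binaries/motivating_example", auto_load_libs=False)
cfg = proj.analyses.CFGFast(normalize=True)
func = proj.kb.functions["schedule_job"]
dec = proj.analyses.Decompiler(func, cfg=cfg.model)
print(dec.codegen.text)
```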

## Table of Contents
- [Overview](#overview)
@@ -14,12 +20,10 @@ If you are only looking to use the SAILR version of angr, then jump to the [usin
- [Decompilation](#decompiling)
- [Measuring](#measuring)
- [Aggregation](#aggregating)
- [SAILR Paper Artifacts](#sailr-paper-artifacts)
- [Using SAILR on angr decompiler](#using-sailr-on-angr-decompiler)
- [SAILR evaluation results files](#sailr-evaluation-results-files)
- [Reproducing SAILR paper results](#reproducing-sailr-paper-results)
- [Example Run](#example-run)
- [Miscellaneous](#miscellaneous)
- [Compiling Windows Targets](#compiling-windows-targets)
- [Citation](#citation)


## Overview:
@@ -57,7 +61,7 @@ our [CI runner](./.github/workflows/python-app.yml).
pip3 install -e .
```

Note: you will need to install the system dependencies for the Python project yourself, listed [here]([CI runner](./.github/workflows/python-app.yml).
Note: you will need to install the system dependencies for the Python project yourself, listed [here](./.github/workflows/python-app.yml).
The package is also available on PyPI, so remote installation works as well.
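
For a remote install, something like the following should work (assuming the PyPI package name matches `sailreval`):
```bash
pip3 install sailreval
```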

### Install Verification
@@ -231,132 +235,16 @@ You can normalize across both projects for binaries and functions that only exis
./eval.py --merge-results ./results/O2/coreutils*/sailr_measured --use-dec source angr_sailr --use-metric gotos cfged
```

## SAILR Paper Artifacts
The [SAILR paper](https://www.zionbasque.com/files/publications/sailr_usenix24.pdf) introduced four artifacts:
1. The [angr decompiler](https://github.com/angr/angr/tree/master/angr/analyses/decompiler), found in the [angr](https://github.com/angr/angr) repo.
2. The [SAILR](https://github.com/mahaloz/angr-sailr/tree/sailr/angr/analyses/decompiler/optimization_passes) algorithm,
built on the angr decompiler as optimization passes.
3. The SAILR evaluation pipeline, found in the `sailreval` Python package
4. The results of `sailreval` for the paper (tomls and decompilation outputs)

Below you will find instructions for using each of these artifacts.

### Using SAILR on angr decompiler
Currently, SAILR is being slowly [integrated into the angr master branch](https://github.com/angr/angr/issues/4229).
Until then, you can use the [angr-sailr](https://github.com/mahaloz/angr-sailr/tree/be3855762a84983137696aa14efe2431a86a7e97)
fork of angr inside our provided stripped-down Dockerfile found in [misc/angr_sailr_dec/Dockerfile](./misc/angr_sailr_dec/Dockerfile).
You can also use the pre-built docker image found on [Dockerhub](https://hub.docker.com/r/mahaloz/angr-sailr-dec) (~3.5 GB).
Note that this fork will not receive updates and is the exact version used in the paper.
The commit is [be3855762a84983137696aa14efe2431a86a7e97](https://github.com/mahaloz/angr-sailr/tree/be3855762a84983137696aa14efe2431a86a7e97).
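
If you would rather not build the image yourself, pulling the pre-built image from Dockerhub should work:
```bash
docker pull mahaloz/angr-sailr-dec
```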

To build the decompiler docker image yourself, run the following from the root of the repo:
```bash
docker build -t angr-sailr-dec -f misc/angr_sailr_dec/Dockerfile .
```

You could run the image directly with `docker run --rm -it angr-sailr-dec`, but we recommend using the wrapper script, which runs the image for you:
```bash
./scripts/docker-angr-sailr-dec.sh --help
```

**NOTE**: this mounts the current directory into the container so the decompiler can access the binary.


Verify your version works by running it on the `motivating_example` binary in the root of the repo:
```bash
./scripts/docker-angr-sailr-dec.sh ./tests/binaries/motivating_example schedule_job --structuring-algo sailr
```

If working correctly, you should see the following output, which matches the paper's example:
```c
long long schedule_job(unsigned long a0, unsigned long long a1, unsigned long a2) {
if (a0 && a1) {
complete_job();
if (EARLY_EXIT == a2)
goto LABEL_40126b;
next_job();
}
refresh_jobs();
if (a1 || a1)
fast_unlock();
LABEL_40126b:
complete_job();
log_workers();
return job_status(a1);
}
```
### SAILR evaluation results files
In the SAILR paper, we ran an evaluation on all 26 packages in the [targets](./targets) directory.
We generated data for optimization levels `O0`, `O1`, `O2`, and `O3`.
We also recorded the decompilation on all these targets with the following decompilers:
`SAILR`, `IDA Pro 8.0`, `Ghidra 10.1`, `Phoenix`, `DREAM`, and `rev.ng`. In each `sailr_decompiled` folder,
you will find files named like `<decompiler>_<binary_name>.c`. For example, `sailr_decompiled/ida_mv.c` is the decompilation
of `mv` from coreutils with IDA Pro 8.0. angr-based decompilation starts with `angr_` followed by the structuring algorithm.
All the files, which total about 11 GB, can be downloaded from [this Dropbox link](https://www.dropbox.com/scl/fi/ez5ra4yzxrynio7opxquo/results.tar.gz?rlkey=vi5ntdw48a9ohfnd0x8p32ael&dl=0).
After downloading, you can extract the files with:
```bash
tar xf results.tar.gz --use-compress-program=pigz
```
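
If `pigz` is not installed, plain gzip decompression should also work (pigz output is gzip-compatible), just more slowly:
```bash
tar xzf results.tar.gz
```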

The extracted tree will look like the following (but with more files):
```tree
results/
├── O0
├── O1
├── O2
│ └── coreutils
│ ├── sailr_compiled
│ ├── sailr_decompiled
│ └── sailr_measured
└── O3
```

To further understand what is contained in the `sailr_*` folders, see the [usage](#usage) section above.
You can now use the `sailreval` package to aggregate the results and obtain the numbers from the paper:
```bash
./eval.py --summarize-targets bash libselinux shadow libedit base-passwd openssh-portable \
dpkg dash grep diffutils findutils gnutls iproute2 gzip sysvinit bzip2 libacl libexpat libbsd tar rsyslog \
cronie zlib e2fsprogs coreutils \
--use-dec source ida ghidra angr_sailr angr_phoenix angr_dream angr_comb \
--use-metric gotos cfged bools func_calls \
--opt-levels O0 O1 O2 O3 \
--show-stats
```

### Reproducing SAILR paper results
We ran the entire pipeline of SAILR on an Ubuntu 22.04 machine that had 40 logical cores and 64 GB of RAM.
With these specs, it took about 8 hours to run the entire pipeline for all 26 packages on the O2 optimization level.
If you intend to reproduce the results as they were in the paper, check out this repo at commit [8442959e99c9d386c2cdfaf11346bf0f56e959eb](https://github.com/mahaloz/sailr-eval/commit/8442959e99c9d386c2cdfaf11346bf0f56e959eb),
which was the last version with minor fixes to the pipeline, but no edits to CFGED.
If you plan to evaluate against the current version, use the latest commit, since it includes stability, speed, and other fixes to components of SAILR.

Due to the slowness of processing source with Joern, we recommend running the Joern stage **LOCALLY** and not in the
container. Here is an example run of only coreutils:
## Example Run
Here is an example run of the pipeline:
```sh
./docker-eval.sh --compile coreutils --cores 20 && \
./eval.py --decompile coreutils --use-dec source --cores 20 && \
./docker-eval.sh --decompile coreutils --use-dec ghidra angr_sailr angr_phoenix angr_dream angr_comb --cores 20 && \
./eval.py --measure coreutils --use-metric gotos cfged bools func_calls --use-dec source ghidra angr_sailr angr_phoenix angr_dream angr_comb --cores 20 && \
./eval.py --summarize-targets coreutils --use-dec source ghidra angr_sailr angr_phoenix angr_dream angr_comb --use-metric gotos cfged bools func_calls --show-stats
```
Take note of which stages run `eval.py` directly instead of the Docker wrapper.

To reproduce the results from the paper, run the following evaluation scripts, which execute the entire pipeline for you:
```bash
./scripts/evaluation/all_packages_o2_table3.sh
./scripts/evaluation/coreutils_o2_table4.sh
```

Run them one at a time to observe their output.

Note, you will likely **not** get the exact numbers shown in the paper, but the final conclusions drawn from the numbers (i.e., the relative distance between scores) should be the same.
This is due to a fundamental limitation in CFGED, which relies on GED to compute the edit distance between two CFGs.
Since we can never know whether a GED computation will terminate in reasonable time, we must use a timeout, which can be affected by the machine you run on.
However, in most cases the timeout should not be triggered.

## Miscellaneous
### Compiling Windows Targets
@@ -370,5 +258,16 @@ Follow these steps to compile a Windows target:
6. Rename the `*.obj` to `*.o`
7. If step `5` failed, then just remove the preprocessor option after running once

To run the full pipeline for Widnows targets, you must have [llvm-pdbutil](https://github.com/shaharv/llvm-pdbutil-builds)
To run the full pipeline for Windows targets, you must have [llvm-pdbutil](https://github.com/shaharv/llvm-pdbutil-builds)
installed on the system.
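
Once installed, inspecting a PDB looks something like the following (a sketch; `target.pdb` is a placeholder, and flags may vary by llvm-pdbutil version):
```bash
llvm-pdbutil dump --symbols target.pdb
```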

## Citation
If you use this tool in your research, please cite our paper:
```bib
@inproceedings{basque2024ahoy,
title={Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation},
author={Basque, Zion Leonahenahe and Bajaj, Ati Priya and Gibbs, Wil and O’Kain, Jude and Miao, Derron and Bao, Tiffany and Doup{\'e}, Adam and Shoshitaishvili, Yan and Wang, Ruoyu},
booktitle={Proceedings of the USENIX Security Symposium},
year={2024}
}
```
Binary file added misc/angr_sailr.png
133 changes: 133 additions & 0 deletions misc/reproducing_sailr_paper/README.md
@@ -0,0 +1,133 @@
# SAILR Paper Artifacts
## Background
This README provides instructions for using and reproducing the results of the SAILR paper, including how to use the old **submission version of angr with SAILR**.
However, if you are looking to evaluate against SAILR, we highly recommend using the latest version of angr.

The [SAILR paper](https://www.zionbasque.com/files/publications/sailr_usenix24.pdf) introduced four artifacts:
1. The [angr decompiler](https://github.com/angr/angr/tree/master/angr/analyses/decompiler), found in the [angr](https://github.com/angr/angr) repo.
2. The [SAILR](https://github.com/angr/angr/tree/master/angr/analyses/decompiler/optimization_passes) algorithm,
built on the angr decompiler as optimization passes (described more in [#14](https://github.com/mahaloz/sailr-eval/issues/14)).
3. The SAILR evaluation pipeline, found in the `sailreval` Python package
4. The results of `sailreval` for the paper (tomls and decompilation outputs)

Below you will find instructions for using each of these artifacts.

## Using SAILR on angr decompiler
SAILR is currently [in the latest version of angr](https://github.com/angr/angr/issues/4229), which includes bug fixes
and improvements over the version used in the paper.
However, if you would like the submission version, you can use the [angr-sailr](https://github.com/mahaloz/angr-sailr/tree/be3855762a84983137696aa14efe2431a86a7e97)
fork of angr inside our provided stripped-down Dockerfile found in [angr_sailr_dec/Dockerfile](./angr_sailr_dec/Dockerfile).
You can also use the pre-built docker image found on [Dockerhub](https://hub.docker.com/r/mahaloz/angr-sailr-dec) (~3.5 GB).
Note that this fork will not receive updates and is the exact version used in the paper.
The commit is [be3855762a84983137696aa14efe2431a86a7e97](https://github.com/mahaloz/angr-sailr/tree/be3855762a84983137696aa14efe2431a86a7e97).

To build the decompiler docker image, run the following from the root of the repo:
```bash
docker build -t angr-sailr-dec -f ./angr_sailr_dec/Dockerfile .
```

You could run the image directly with `docker run --rm -it angr-sailr-dec`, but we recommend using the wrapper script, which runs the image for you:
```bash
./docker-angr-sailr-dec.sh --help
```

**NOTE**: this mounts the current directory into the container so the decompiler can access the binary.


Verify your version works by running it on the `motivating_example` binary in the root of the repo:
```bash
./docker-angr-sailr-dec.sh ./tests/binaries/motivating_example schedule_job --structuring-algo sailr
```

If working correctly, you should see the following output, which matches the paper's example:
```c
long long schedule_job(unsigned long a0, unsigned long long a1, unsigned long a2) {
if (a0 && a1) {
complete_job();
if (EARLY_EXIT == a2)
goto LABEL_40126b;
next_job();
}
refresh_jobs();
if (a1 || a1)
fast_unlock();
LABEL_40126b:
complete_job();
log_workers();
return job_status(a1);
}
```
## SAILR evaluation results files
In the SAILR paper, we ran an evaluation on all 26 packages in the [targets](../../targets) directory.
We generated data for optimization levels `O0`, `O1`, `O2`, and `O3`.
We also recorded the decompilation on all these targets with the following decompilers:
`SAILR`, `IDA Pro 8.0`, `Ghidra 10.1`, `Phoenix`, `DREAM`, and `rev.ng`. In each `sailr_decompiled` folder,
you will find files named like `<decompiler>_<binary_name>.c`. For example, `sailr_decompiled/ida_mv.c` is the decompilation
of `mv` from coreutils with IDA Pro 8.0. angr-based decompilation starts with `angr_` followed by the structuring algorithm.
All the files, which total about 11 GB, can be downloaded from [this Dropbox link](https://www.dropbox.com/scl/fi/ez5ra4yzxrynio7opxquo/results.tar.gz?rlkey=vi5ntdw48a9ohfnd0x8p32ael&dl=0).
After downloading, you can extract the files with:
```bash
tar xf results.tar.gz --use-compress-program=pigz
```

The extracted tree will look like the following (but with more files):
```tree
results/
├── O0
├── O1
├── O2
│ └── coreutils
│ ├── sailr_compiled
│ ├── sailr_decompiled
│ └── sailr_measured
└── O3
```

To further understand what is contained in the `sailr_*` folders, see the [usage](../../README.md#usage) section of the main README.
These commands assume you are in the root of the repo.
You can now use the `sailreval` package to aggregate the results and obtain the numbers from the paper:
```bash
./eval.py --summarize-targets bash libselinux shadow libedit base-passwd openssh-portable \
dpkg dash grep diffutils findutils gnutls iproute2 gzip sysvinit bzip2 libacl libexpat libbsd tar rsyslog \
cronie zlib e2fsprogs coreutils \
--use-dec source ida ghidra angr_sailr angr_phoenix angr_dream angr_comb \
--use-metric gotos cfged bools func_calls \
--opt-levels O0 O1 O2 O3 \
--show-stats
```

## Reproducing SAILR paper results
We ran the entire pipeline of SAILR on an Ubuntu 22.04 machine that had 40 logical cores and 64 GB of RAM.
With these specs, it took about 8 hours to run the entire pipeline for all 26 packages on the O2 optimization level.
If you intend to reproduce the results as they were in the paper, check out this repo at commit [8442959e99c9d386c2cdfaf11346bf0f56e959eb](https://github.com/mahaloz/sailr-eval/commit/8442959e99c9d386c2cdfaf11346bf0f56e959eb),
which was the last version with minor fixes to the pipeline, but no edits to CFGED.
If you plan to evaluate against the current version, use the latest commit, since it includes stability, speed, and other fixes to components of SAILR.

Due to the slowness of processing source with Joern, we recommend running the Joern stage **LOCALLY** and not in the
container. These commands assume you are in the root of the repo. Here is an example run of only coreutils:
```bash
./docker-eval.sh --compile coreutils --cores 20 && \
./eval.py --decompile coreutils --use-dec source --cores 20 && \
./docker-eval.sh --decompile coreutils --use-dec ghidra angr_sailr angr_phoenix angr_dream angr_comb --cores 20 && \
./eval.py --measure coreutils --use-metric gotos cfged bools func_calls --use-dec source ghidra angr_sailr angr_phoenix angr_dream angr_comb --cores 20 && \
./eval.py --summarize-targets coreutils --use-dec source ghidra angr_sailr angr_phoenix angr_dream angr_comb --use-metric gotos cfged bools func_calls --show-stats
```
Take note of which stages run `eval.py` directly instead of the Docker wrapper.

To reproduce the results from the paper, run the following evaluation scripts, which execute the entire pipeline for you:
```bash
./paper_evaluations/all_packages_o2_table3.sh
./paper_evaluations/coreutils_o2_table4.sh
```

Run them one at a time to observe their output.

Note, you will likely **not** get the exact numbers shown in the paper, but the final conclusions drawn from the numbers (i.e., the relative distance between scores) should be the same.
This is due to a fundamental limitation in CFGED, which relies on GED to compute the edit distance between two CFGs.
Since we can never know whether a GED computation will terminate in reasonable time, we must use a timeout, which can be affected by the machine you run on.
However, in most cases the timeout should not be triggered.
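
To see why the timeout matters, here is a minimal illustration of bounded GED with `networkx` (not the actual CFGED implementation; the graphs are hypothetical):
```python
import networkx as nx

# Two small CFG-shaped digraphs (hypothetical examples).
g1 = nx.DiGraph([(0, 1), (1, 2), (1, 3), (2, 3)])
g2 = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 1)])

# Exact GED search is exponential, so it is bounded by a wall-clock timeout;
# when the timeout fires, networkx returns the best distance found so far,
# which is why scores can vary slightly across machines.
print(nx.graph_edit_distance(g1, g2, timeout=5))
```
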
File renamed without changes.
@@ -14,6 +14,6 @@ docker run \
-it \
--rm \
-v $PWD:/host \
angr-sailr-dec \
mahaloz/angr-sailr-dec \
$BINARY_PATH \
"${@:2}"
2 changes: 1 addition & 1 deletion sailreval/__init__.py
@@ -1,4 +1,4 @@
__version__ = "1.4.2"
__version__ = "1.6.0"

# create loggers
import logging
2 changes: 1 addition & 1 deletion sailreval/analysis/counting.py
@@ -13,7 +13,6 @@
from sailreval.utils import load_tomls_by_bin_name, bcolors
from sailreval.utils.sailr_target import SAILRTarget
from sailreval.utils.compile import DEFAULT_OPTIMIZATION_LEVELS, OPTIMIZATION_LEVELS
from pyjoern import JoernServer, JoernClient

from tqdm import tqdm
import toml
@@ -793,6 +792,7 @@ def _format_number(num):
def _find_functions_with_switches(source_path: Path, port):
source_name = source_path.name.split(".c")[0]

from pyjoern import JoernServer, JoernClient
with JoernServer(port=port):
client = JoernClient(source_path, port=port)
functions = client.functions_with_switches()
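
The hunk above appears to move the `pyjoern` import from module scope to call time; a generic sketch of that lazy-import pattern (names mirror the diff, but this is illustrative, not the exact repo code):
```python
def _find_functions_with_switches(source_path, port):
    # Import the heavy, optional dependency only when this code path runs,
    # so `import sailreval` still works on systems without Joern installed.
    from pyjoern import JoernServer, JoernClient

    with JoernServer(port=port):
        client = JoernClient(source_path, port=port)
        return client.functions_with_switches()
```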