symbexcel is a symbolic deobfuscator for XL4 macros, currently developed by Nicola Ruaro and Fabio Pagani.
Among a number of other things, symbexcel:
- Helps malware analysts reverse-engineer complex XL4 malware
- Automatically extracts Indicators of Compromise (IOCs) to improve detection of malicious Excel documents
This tool draws some concepts from angr, and is based on the excellent XLMMacroDeobfuscator by DissectMalware. Big kudos to him!
- Download symbexcel:
git clone https://github.com/ucsb-seclab/symbexcel && cd symbexcel
- Create a virtual environment (recommended but not required)
mkvirtualenv symbexcel
workon symbexcel
- Install symbexcel and its dependencies:
pip install -e .
- Start the analysis of a malicious XL4 sample:
python run.py --file /path/to/malicious/excel.xls --iocs
$ python run.py -h
usage: run.py [-h] -f FILE [-d] [--iocs] [--breakpoints BREAKPOINTS [BREAKPOINTS ...]] [--checkpoint CHECKPOINT] [--restore RESTORE] [-i] [--cfg] [-t TIMEOUT] [--com] [--nocache]
Required arguments:
-f FILE, --file FILE Path of the malicious sample
Optional arguments:
-d, --debug Enable debug output
--iocs Print Indicators of Compromise (IOCs)
--breakpoints BREAKPOINTS [BREAKPOINTS ...]
Set a breakpoint at a specific instruction count
--checkpoint CHECKPOINT
Create a checkpoint at a specific instruction count
--restore RESTORE Restore a checkpoint
-i, --interactive Drop an IPython shell after the execution
--cfg Save the CFG to /tmp/<sample name>.dot
-t TIMEOUT, --timeout TIMEOUT
Timeout value
COM specific arguments:
--com Use COM server to process a sample
--nocache Force the COM server to process the sample
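The options above can also be driven from a script when several samples need to be processed. The following is a minimal sketch, not part of symbexcel: build_command and analyze_directory are hypothetical helpers, and it only uses flags listed in the help output. It assumes it is started from the repository root, where run.py lives.

```python
import subprocess
import sys
from pathlib import Path


def build_command(sample, iocs=True, timeout=None, use_com=False):
    """Build the argument list for one run.py invocation (hypothetical helper)."""
    cmd = [sys.executable, "run.py", "--file", str(sample)]
    if iocs:
        cmd.append("--iocs")
    if timeout is not None:
        # The value is passed through to --timeout unchanged
        cmd += ["--timeout", str(timeout)]
    if use_com:
        cmd.append("--com")
    return cmd


def analyze_directory(directory, **options):
    """Run run.py once per .xls file found in `directory`."""
    for sample in sorted(Path(directory).glob("*.xls")):
        # check=False: keep going even if one sample fails
        subprocess.run(build_command(sample, **options), check=False)
```

Calling analyze_directory("/path/to/samples", iocs=True) would then launch one analysis per sample, one after the other.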
symbexcel can use either xlrd2 or Office VBA to parse and extract the content of Excel 4 macrosheets. The VBA APIs are exposed through a COM server, and Python code can interact with them using the pywin32 package.
You can find all the information on how to set up the symbexcel COM server in the symbexcel-server repository. Once the server is up and running:
- Add the server IP address in the HOST variable of symbexcel/excel_wrapper/com_config.env.
- Add the option --com to the command line of symbexcel.
You can also use this project as a Python library (import symbexcel) in your own projects.
You can find some good examples of this in the tests folder.
Using this project as a library allows your code to single-step (or n-step) the simulation manager, use the find argument of SimulationManager.run() to specify a search function, etc.
from symbexcel import SimulationManager
from symbexcel.excel_wrapper import parse_excel_doc

# Parse the sample and create a simulation manager for it
excel_doc = parse_excel_doc('tests/bins/test_symbolic.xls')
simgr = SimulationManager(excel_doc)

# Execute a single step, then run with a search function that looks for =ALERT
simgr.step(n=1)
simgr.run(find=lambda s: '=ALERT' in s.formula)
print(simgr.one_found.formula)
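When the search condition grows beyond a one-line lambda, a small predicate factory keeps it readable. This is a sketch under one assumption taken from the example above: each state exposes its current formula as a string in a formula attribute. The name formula_contains is hypothetical and not part of symbexcel.

```python
from types import SimpleNamespace


def formula_contains(*keywords):
    """Return a predicate matching states whose formula contains any keyword."""
    def predicate(state):
        # Treat a missing or empty formula as "no match"
        formula = state.formula or ""
        return any(keyword in formula for keyword in keywords)
    return predicate


# Stand-in object for a quick check without loading a sample
fake_state = SimpleNamespace(formula='=ALERT("hello")')
matches_alert = formula_contains("=ALERT")(fake_state)
```

The returned function could then be passed as the search function, for example simgr.run(find=formula_contains('=ALERT')).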
You can use the Dockerfile and docker-compose.yml from this repo to create a Docker container and run the create_clusters script on a set of malware samples.
The folders specified in the input and output environment variables will be mounted as /input and /output in the container. You can pass any arguments for the create_clusters script in the args environment variable.
input=/data/xl4_dataset/ output=/data/symbexcel/docker_clustering args="--input /input --output /output --jobs 96 --timeout 1200 --debug --logfile" docker-compose up &> /data/symbexcel/docker_clustering_log &
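The same invocation can be prepared from Python, which avoids quoting mistakes in the args variable. This is only a sketch: compose_environment and run_clustering are hypothetical helpers, and the sketch assumes docker-compose is on the PATH and that it is run from the directory containing docker-compose.yml. Unlike the shell command above, it waits for docker-compose to finish instead of running in the background.

```python
import os
import shlex
import subprocess


def compose_environment(input_dir, output_dir, script_args):
    """Return a copy of the current environment with input, output and args set."""
    env = dict(os.environ)
    env["input"] = str(input_dir)
    env["output"] = str(output_dir)
    # Join the create_clusters arguments into one shell-quoted string
    env["args"] = shlex.join(script_args)
    return env


def run_clustering(input_dir, output_dir, script_args, log_path):
    """Run docker-compose up, writing stdout and stderr to log_path."""
    env = compose_environment(input_dir, output_dir, script_args)
    with open(log_path, "wb") as log:
        return subprocess.run(
            ["docker-compose", "up"],
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,
        )
```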
After installing symbexcel, you can run all tests with cd tests && pytest.
Alternatively, you can manually execute any single test, e.g. cd tests && python test_file_formats.py.
Creating new tests should be straightforward by looking at the existing test routines.
There's a repository from Lastline at https://github.com/Lastline-Inc/xl4samples with some public malicious samples. Download and run them at your own risk!