LastPyMile: Identify the differences between build artifacts of PyPI packages and the respective source code repository
The paper has been published in the proceedings of ESEC/FSE 2021.
The figure below is an overview of the LastPyMile workflow and internal components:
LastPyMile extends the current package scanning techniques for malware injections.
The tool analyzes a package from the PyPI repository by:
- Identifying the discrepancy (files and lines) between the source code and the package's artifact
- Scanning the discrepancy using Yara rules (MalwareCheck patterns) and AST code analysis (Bandit4mal patterns).
As such, LastPyMile aims to detect malicious packages in package owner hijacking and typosquatting/combosquatting attacks (see Ohm et al., Vu et al.). In these attacks, malicious code that does not exist in the source code repository is injected into a package's artifact.
Compared to the existing scanning tools employed by PyPI, LastPyMile reduces the number of alerts produced by a malware checking tool to a number that a human can check. It also removes all the alerts from benign packages and therefore allows a clear distinction between benign and malicious packages.
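The discrepancy-identification step described above can be sketched as follows. This is a minimal illustration, not LastPyMile's implementation: it assumes the artifact has already been unpacked and the repository already checked out, and it compares the artifact against a single checkout only. The names `file_hashes` and `find_discrepancy` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hashes(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def find_discrepancy(artifact_dir: Path, repo_dir: Path) -> dict[str, list[str]]:
    """Return artifact files, and their lines, that have no match in the repository."""
    repo = file_hashes(repo_dir)
    discrepancy: dict[str, list[str]] = {}
    for rel_path, digest in file_hashes(artifact_dir).items():
        if repo.get(rel_path) == digest:
            continue  # an identical file exists in the repository
        artifact_lines = (artifact_dir / rel_path).read_text(errors="replace").splitlines()
        repo_file = repo_dir / rel_path
        # A file missing from the repository contributes all of its lines.
        repo_lines = (
            set(repo_file.read_text(errors="replace").splitlines())
            if repo_file.is_file()
            else set()
        )
        discrepancy[rel_path] = [ln for ln in artifact_lines if ln not in repo_lines]
    return discrepancy
```

Only the files and lines returned by `find_discrepancy` would then need to be handed to a scanner, which is what keeps the number of alerts small.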
LastPyMile was originally developed by SAP Security Research and the Security Group at the University of Trento.
The tool is best described in the following scientific papers. Please cite them if you use the tool in your research work:
- Duc-Ly Vu, Ivan Pashchenko, Fabio Massacci, Henrik Plate, Antonino Sabetta, Towards Using Source Code Repositories to Identify Software Supply Chain Attacks, ACM CCS 2020.
- Duc-Ly Vu, Ivan Pashchenko, Fabio Massacci, Henrik Plate, Antonino Sabetta, LastPyMile: identifying the discrepancy between sources and packages, ESEC/FSE 2021.
- Identify the GitHub URL of a PyPI package
- Identify the differences between build artifacts of software packages and the respective source code repository
- Scan the differences using Yara rules and bandit4mal
- Process a repository and artifact in parallel
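As an illustration of the first feature, one plausible way to find a package's GitHub URL is to look through the `home_page` and `project_urls` fields of its PyPI metadata. The sketch below assumes that approach; the technique LastPyMile actually uses may differ, and the function names are hypothetical. `fetch_pypi_info` relies on the public PyPI JSON API and needs network access.

```python
import json
import re
from typing import Optional
from urllib.request import urlopen

# Matches the owner and repository parts of a GitHub URL.
GITHUB_REPO = re.compile(r"https?://(?:www\.)?github\.com/([\w.-]+)/([\w.-]+)")


def github_url_from_metadata(info: dict) -> Optional[str]:
    """Return the first GitHub repository URL found in a PyPI `info` mapping."""
    candidates = list((info.get("project_urls") or {}).values())
    candidates.append(info.get("home_page") or "")
    for url in candidates:
        match = GITHUB_REPO.search(url or "")
        if match:
            owner, repo = match.groups()
            if repo.endswith(".git"):
                repo = repo[:-4]  # normalize clone URLs
            return f"https://github.com/{owner}/{repo}"
    return None


def fetch_pypi_info(package: str) -> dict:
    """Download the `info` section of a package's metadata from the PyPI JSON API."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)["info"]
```

A call such as `github_url_from_metadata(fetch_pypi_info("requests"))` would combine the two steps. Packages whose metadata lists no GitHub link return `None`.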
Requires Python 3.9.
- At the root directory, run:
poetry install
to install package dependencies. This will also install pytest for testing the project.
- At the root directory, run:
poetry shell
to activate the environment.
bandit4mal is built using Python 2 to scan both Python 2 and Python 3 code, so please use python2 without any virtual environment when installing bandit4mal. We use bandit4mal to scan the discrepancy and report the alerts associated with it. bandit4mal requires pbr>=2.0.0.
- Go to the tools directory and run:
git clone https://github.com/lyvd/bandit4mal
- Install bandit4mal by running this command:
sudo python2 setup.py install
- The bandit program will be installed at the path /usr/local/bin/bandit (macOS and Ubuntu).
To list all available options:
python lastpymile.py -h
To scan a package:
python lastpymile.py <package_name>[:<package_version>]
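The argument format above, a package name optionally followed by a colon and a version, can be illustrated with a small parser. This is a hypothetical helper written for this README, not the code used inside `lastpymile.py`.

```python
from typing import Optional, Tuple


def parse_package_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split '<package_name>[:<package_version>]' into (name, version or None)."""
    name, _, version = spec.partition(":")
    name = name.strip()
    version = version.strip()
    if not name:
        raise ValueError(f"missing package name in {spec!r}")
    return name, version or None
```

For example, `parse_package_spec("requests:2.25.1")` yields the name and the pinned version, while `parse_package_spec("requests")` yields the name with no version.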
- Binary distributions (e.g., .exe, .dmg) are not supported
- Packages that are not hosted on GitHub are not supported yet.
- Improve the techniques for finding the GitHub URL of a PyPI package. We are working to integrate py2src into LastPyMile.
- Update the API documentation in the docs directory
Contact me at [email protected] or Twitter @vuly16
Open a pull request at the AssureMOSS LastPyMile repository.
This work is partly funded by the EU under the H2020 research projects SPARTA (Grant No. 830892), AssureMOSS (Grant No. 952647), and CyberSec4Europe (Grant No. 830929).