A reimplementation of LastPyMile: A Python-based library to Identify the differences between build artifacts of PyPI packages and the respective source code repository

lyvd/lastpymile

 
 

lastpymile

LastPyMile: Identify the differences between build artifacts of PyPI packages and the respective source code repository

The paper was published in the proceedings of ESEC/FSE 2021.

The figure below is an overview of the LastPyMile workflow and internal components:

LastPyMile extends the current package scanning techniques for malware injections. The tool analyzes a package from the PyPI repository by:

  1. Identifying the discrepancy (files and lines) between the source code and the package's artifact
  2. Scanning the discrepancy using Yara rules (MalwareCheck patterns) and AST code analysis (Bandit4mal patterns).
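The first step above can be illustrated with a minimal sketch. It is not LastPyMile's actual implementation: the function name `find_discrepancy`, the in-memory `path -> content` dictionaries, and the choice of SHA-256 over stripped lines are all assumptions made for illustration. The idea is to hash every file and every line found in the repository, then report artifact lines whose hashes never appear there.

```python
import hashlib
from typing import Dict, List


def _digest(text: str) -> str:
    """Return a stable hash of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_discrepancy(
    repo_files: Dict[str, str], artifact_files: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each artifact path to the lines that do not occur in the repository.

    Both arguments are hypothetical inputs: relative path -> file content.
    """
    # Hashes of whole files and of individual (stripped) lines in the repository.
    repo_file_hashes = {_digest(content) for content in repo_files.values()}
    repo_line_hashes = {
        _digest(line.strip())
        for content in repo_files.values()
        for line in content.splitlines()
    }

    discrepancy: Dict[str, List[str]] = {}
    for path, content in artifact_files.items():
        # An artifact file identical to some repository file is not suspicious.
        if _digest(content) in repo_file_hashes:
            continue
        # Otherwise keep only the non-blank lines unknown to the repository.
        extra = [
            line
            for line in content.splitlines()
            if line.strip() and _digest(line.strip()) not in repo_line_hashes
        ]
        if extra:
            discrepancy[path] = extra
    return discrepancy


if __name__ == "__main__":
    repo = {"pkg/__init__.py": "import os\n\ndef run():\n    return 1\n"}
    artifact = {
        "pkg/__init__.py": "import os\n\ndef run():\n    return 1\n"
        "os.system('curl http://example.invalid')\n"
    }
    print(find_discrepancy(repo, artifact))
```

Only the lines reported by such a comparison would then need to be passed to the scanning step, which is what keeps the number of alerts small.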

As such, LastPyMile aims to detect malicious packages produced by package owner hijacking and typosquatting/combosquatting attacks (see Ohm et al., Vu et al.). In these attacks, malicious code that does not exist in the source code repository is injected into a package's artifact.

In comparison to the existing scanning tools employed by PyPI, LastPyMile reduces the number of alerts produced by a malware checking tool to a number that a human can check. It also removes all the alerts from benign packages and therefore allows a clear distinction between benign and malicious packages.

History

LastPyMile was originally developed by SAP Security Research and the Security Group at the University of Trento.

The tool is best described in the following scientific papers; please cite them if you use the tool in your research work:

Features

  • Identify the GitHub URL of a PyPI package
  • Identify the differences between build artifacts of software packages and the respective source code repository
  • Scan the differences using Yara rules and bandit4mal
  • Process a repository and artifact in parallel
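As an illustration of the first feature, the sketch below looks for a GitHub repository link in a package's metadata. It assumes the input is the `info` object returned by the PyPI JSON API (`https://pypi.org/pypi/<package_name>/json`), with its `project_urls` and `home_page` fields. The helper name `github_url_from_metadata` is hypothetical, and this may differ from how LastPyMile resolves the URL.

```python
import re
from typing import Optional

# Matches https://github.com/<owner>/<repo>, ignoring any trailing path.
_GITHUB_RE = re.compile(
    r"https?://(?:www\.)?github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE
)


def github_url_from_metadata(info: dict) -> Optional[str]:
    """Return a normalized GitHub repository URL found in PyPI metadata, if any."""
    # Collect candidate URLs: the declared project URLs first, then the home page.
    candidates = list((info.get("project_urls") or {}).values())
    candidates.append(info.get("home_page") or "")

    for url in candidates:
        match = _GITHUB_RE.search(url or "")
        if match:
            owner, repo = match.groups()
            # Drop a trailing ".git" so clone URLs and web URLs normalize alike.
            if repo.endswith(".git"):
                repo = repo[:-4]
            return f"https://github.com/{owner}/{repo}"
    return None


if __name__ == "__main__":
    example_info = {
        "project_urls": {"Source": "https://github.com/psf/requests/issues"},
        "home_page": "https://requests.readthedocs.io",
    }
    print(github_url_from_metadata(example_info))
```

A metadata-only lookup like this fails when a package declares no repository link or an incorrect one.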

Installation

Requires Python 3.9

  • At the root directory, run: poetry install to install the package dependencies. This will also install pytest for testing the project.
  • At the root directory, run: poetry shell to activate the environment

Integrate bandit4mal into LastPyMile

bandit4mal is built with Python 2 and can scan both Python 2 and Python 3 code, so please use python2 without any virtual environment when installing it. We use bandit4mal to scan the discrepancy and report the alerts associated with it. bandit4mal requires pbr>=2.0.0

  • Go to the tools directory and run: git clone https://github.com/lyvd/bandit4mal
  • Install bandit4mal by running: sudo python2 setup.py install
  • The bandit program will be installed at the path /usr/local/bin/bandit (macOS and Ubuntu)

Usage

To list all available options:

python lastpymile.py -h

To scan a package:

python lastpymile.py <package_name>[:<package_version>]
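The `<package_name>[:<package_version>]` argument format can be read as a name with an optional version after a colon. The sketch below shows one way to split such an argument. `parse_package_spec` is a hypothetical helper written for illustration and is not taken from lastpymile.py.

```python
from typing import Optional, Tuple


def parse_package_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name[:version]' into (name, version); version is None if omitted."""
    name, _, version = spec.strip().partition(":")
    if not name:
        raise ValueError(f"missing package name in {spec!r}")
    # An empty version (no colon, or nothing after it) means none was given.
    return name, (version or None)


if __name__ == "__main__":
    print(parse_package_spec("requests:2.25.1"))
    print(parse_package_spec("requests"))
```

In this sketch, omitting the version yields `None`; how the tool itself treats a missing version is not stated here.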

Limitations

  • Binary distributions (e.g., .exe, .dmg) are not supported
  • Packages that are not hosted on GitHub are not supported yet.

Known Issues

Todo (upcoming changes)

  • Improve the techniques for finding the GitHub URL of a PyPI package. We are working to integrate py2src into LastPyMile.
  • Update the API documentation in the docs directory

How to obtain support

Contact me at [email protected] or Twitter @vuly16

Contributing

Open a pull request at the AssureMOSS LastPyMile repository.

Acknowledgement

This work is partly funded by the EU under the H2020 research projects SPARTA (Grant No. 830892), AssureMOSS (Grant No. 952647) and CyberSec4Europe (Grant No. 830929).
