From ac2b239a909ef1b1070af3d8d984775b9f9857f6 Mon Sep 17 00:00:00 2001
From: Jennifer Cwagenberg
Date: Wed, 17 Apr 2024 16:53:00 -0500
Subject: [PATCH 1/5] docs(markdown): :memo: Run markdownlint to fix formatting

---
 README.md | 58 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index 01a6069..c378869 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,10 @@
 [![Supported Versions](https://img.shields.io/pypi/pyversions/modelscan.svg)](https://pypi.org/project/modelscan)
 [![pypi Version](https://img.shields.io/pypi/v/modelscan)](https://pypi.org/project/modelscan)
 [![License: Apache 2.0](https://img.shields.io/crates/l/apa)](https://opensource.org/license/apache-2-0/)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
+
 # ModelScan: Protection Against Model Serialization Attacks
+
 Machine Learning (ML) models are shared publicly over the internet, within teams and across teams. The rise of Foundation Models have resulted in public ML models being increasingly consumed for further training/fine tuning. ML Models are increasingly used to make critical decisions and power mission-critical applications.
 Despite this, models are not scanned with the rigor of a PDF file in your inbox.
 
@@ -15,9 +18,9 @@ This needs to change, and proper tooling is the first step.
 
 ![ModelScan Preview](/imgs/modelscan-unsafe-model.gif)
 
-ModelScan is an open source project that scans models to determine if they contain
-unsafe code. It is the first model scanning tool to support multiple model formats.
-ModelScan currently supports: H5, Pickle, and SavedModel formats. This protects you
+ModelScan is an open source project that scans models to determine if they contain
+unsafe code. It is the first model scanning tool to support multiple model formats.
+ModelScan currently supports: H5, Pickle, and SavedModel formats. This protects you
 when using PyTorch, TensorFlow, Keras, Sklearn, XGBoost, with more on the way.
 
 ## TL;DR
@@ -38,9 +41,9 @@ modelscan -p /path/to/model_file.pkl
 
 Models are often created from automated pipelines, others may come from a data scientist’s laptop. In either case the model needs to move from one machine to another before it is used. That process of saving a model to disk is called serialization.
 
-A **Model Serialization Attack** is where malicious code is added to the contents of a model during serialization(saving) before distribution — a modern version of the Trojan Horse.
+A **Model Serialization Attack** is where malicious code is added to the contents of a model during serialization(saving) before distribution — a modern version of the Trojan Horse.
 
-The attack functions by exploiting the saving and loading process of models. When you load a model with `model = torch.load(PATH)`, PyTorch opens the contents of the file and begins to running the code within. The second you load the model the exploit has executed.
+The attack functions by exploiting the saving and loading process of models. When you load a model with `model = torch.load(PATH)`, PyTorch opens the contents of the file and begins to running the code within. The second you load the model the exploit has executed.
 
 A **Model Serialization Attack** can be used to execute:
 
@@ -55,19 +58,19 @@ These attacks are incredibly simple to execute and you can view working examples
 
 ### How ModelScan Works
 
-If loading a model with your machine learning framework automatically executes the attack,
+If loading a model with your machine learning framework automatically executes the attack,
 how does ModelScan check the content without loading the malicious code?
 
-Simple, it reads the content of the file one byte at a time just like a string, looking for
+Simple, it reads the content of the file one byte at a time just like a string, looking for
 code signatures that are unsafe. This makes it incredibly fast, scanning models in the time it
 takes for your computer to process the total filesize from disk(seconds in most cases). It also secure.
 
 ModelScan ranks the unsafe code as:
 
-* CRITICAL
-* HIGH
-* MEDIUM
-* LOW
+- CRITICAL
+- HIGH
+- MEDIUM
+- LOW
 
 ![ModelScan Flow Chart](/imgs/model_scan_flow_chart.png)
 
@@ -78,7 +81,7 @@ it opens you up for attack. Use your discretion to determine if that is appropri
 
 ### What Models and Frameworks Are Supported?
 
-This will be expanding continually, so look out for changes in our release notes.
+This will be expanding continually, so look out for changes in our release notes.
 
 At present, ModelScan supports any Pickle derived format and many others:
 
@@ -90,7 +93,8 @@ At present, ModelScan supports any Pickle derived format and many others:
 | | [keras.models.save(save_format= 'keras')](https://www.tensorflow.org/guide/keras/serialization_and_saving) | Keras V3 (Hierarchical Data Format) | Yes |
 | Classic ML Libraries (Sklearn, XGBoost etc.) | pickle.dump(), dill.dump(), joblib.dump(), cloudpickle.dump() | Pickle, Cloudpickle, Dill, Joblib | Yes |
 
-### Installation
+### Installation
+
 ModelScan is installed on your systems as a Python package(Python 3.8 to 3.11 supported). As shown from above you can install
 it by running this in your terminal:
 
@@ -106,6 +110,7 @@ modelscan = ">=0.1.1"
 ```
 
 Scanners for Tensorflow or HD5 formatted models require installation with extras:
+
 ```bash
 pip install 'modelscan[ tensorflow, h5py ]'
 ```
@@ -114,10 +119,10 @@ pip install 'modelscan[ tensorflow, h5py ]'
 
 ModelScan supports the following arguments via the CLI:
 
-| Usage | Argument | Explanation |
+| Usage | Argument | Explanation |
 |----------------------------------------------------------------------------------|------------------|---------------------------------------------------------|
-| ```modelscan -h ``` | -h or --help | View usage help |
-| ```modelscan -v ``` | -v or --version | View version information |
+| ```modelscan -h``` | -h or --help | View usage help |
+| ```modelscan -v``` | -v or --version | View version information |
 | ```modelscan -p /path/to/model_file``` | -p or --path | Scan a locally stored model |
 | ```modelscan -p /path/to/model_file --settings-file ./modelscan-settings.toml``` | --settings-file | Scan a locally stored model using custom configurations |
 | ```modelscan create-settings-file``` | -l or --location | Create a configurable settings file |
@@ -125,11 +130,12 @@ ModelScan supports the following arguments via the CLI:
 | ```modelscan -r reporting-format -o file-name``` | -o or --output-file | Optional file name for output report |
 | ```modelscan --show-skipped``` | --show-skipped | Print a list of files that were skipped during the scan |
 
-
 Remember models are just like any other form of digital media, you should scan content from any untrusted source before use.
 
-##### CLI Exit Codes
+#### CLI Exit Codes
+
 The CLI exit status codes are:
+
 - `0`: Scan completed successfully, no vulnerabilities found
 - `1`: Scan completed successfully, vulnerabilities found
 - `2`: Scan failed, modelscan threw an error while scanning
@@ -143,9 +149,9 @@ Once a scan has been completed you'll see output like this if an issue is found:
 ![ModelScan Scan Output](https://github.com/protectai/modelscan/raw/main/imgs/cli_output.png)
 
 Here we have a model that has an unsafe operator for both `ReadFile` and `WriteFile` in the model.
-Clearly we do not want our models reading and writing files arbitrarily. We would now reach out
+Clearly we do not want our models reading and writing files arbitrarily. We would now reach out
 to the creator of this model to determine what they expected this to do. In this particular case
-it allows an attacker to read our AWS credentials and write them to another place.
+it allows an attacker to read our AWS credentials and write them to another place.
 
 That is a firm NO for usage.
 
@@ -182,13 +188,13 @@ to learn more!
 
 ## Licensing
 
-Copyright 2023 Protect AI
+Copyright 2023 Protect AI
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    <http://www.apache.org/licenses/LICENSE-2.0>
 
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
@@ -201,9 +207,7 @@ limitations under the License.
 
 We were heavily inspired by [Matthieu Maitre](http://mmaitre314.github.io) who built [PickleScan](https://github.com/mmaitre314/picklescan). We appreciate the work and have extended it significantly with ModelScan. ModelScan is OSS’ed in the similar spirit as PickleScan.
 
-## Contributing
-
-We would love to have you contribute to our open source ModelScan project.
-If you would like to contribute, please follow the details on [Contribution page](https://github.com/protectai/modelscan/blob/main/CONTRIBUTING.md).
+## Contributing
 
-
+We would love to have you contribute to our open source ModelScan project.
+If you would like to contribute, please follow the details on [Contribution page](https://github.com/protectai/modelscan/blob/main/CONTRIBUTING.md).

From e3bd2676ae298456519564a9c232712e02791603 Mon Sep 17 00:00:00 2001
From: Jennifer Cwagenberg
Date: Wed, 17 Apr 2024 16:56:55 -0500
Subject: [PATCH 2/5] build(makefile): :art: add .Phony targets + help support

---
 Makefile | 53 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/Makefile b/Makefile
index f7d91fd..290105e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,42 +1,69 @@
+.DEFAULT_GOAL := help
 VERSION ?= $(shell dunamai from git --style pep440 --format "{base}.dev{distance}+{commit}")
 
-install-dev:
+.PHONY: install-dev
+install-dev: ## Install all dependencies including dev and test dependencies, as well as pre-commit.
 	poetry install --with dev --with test --extras "tensorflow h5py"
 	pre-commit install
 
-install:
+.PHONY: install
+install: ## Install required dependencies.
 	poetry install
 
-install-prod:
+.PHONY: install-prod
+install-prod: ## Install prod dependencies.
 	poetry install --with prod
 
-install-test:
+.PHONY: install-test
+install-test: ## Install test dependencies.
 	poetry install --with test --extras "tensorflow h5py"
 
-clean:
-	pip uninstall modelscan
+.PHONY: clean
+clean: ## Uninstall modelscan
+	python -m pip uninstall modelscan
 
-test:
-	poetry run pytest
+.PHONY: test
+test: ## Run pytests.
+	poetry run pytest --cov=modelscan tests/
 
-build:
+.PHONY: build
+build: ## Build the source and wheel achive.
 	poetry build
 
+.PHONY: build-prod
 build-prod: version
+build-prod: ## Update the version and build wheel archive.
 	poetry build
 
-version:
+.PHONY: version
+version: ## Bumps the version of the project.
 	echo "__version__ = '$(VERSION)'" > modelscan/_version.py
 	poetry version $(VERSION)
 
+.PHONY: lint
 lint: bandit mypy
+lint: ## Run all the linters.
 
-bandit:
+.PHONY: bandit
+bandit: ## Run SAST scanning.
 	poetry run bandit -c pyproject.toml -r .
 
-mypy:
+.PHONY: mypy
+mypy: ## Run type checking.
 	poetry run mypy --ignore-missing-imports --strict --check-untyped-defs .
 
-format:
+.PHONY: black
+format: ## Run black to format the code.
 	black .
+
+.PHONY: help
+help: ## List all targets and help information.
+	@grep --no-filename -E '^([a-z.A-Z_%-/]+:.*?)##' $(MAKEFILE_LIST) | sort | \
+		awk 'BEGIN {FS = ":.*?(## ?)"}; { \
+			if (length($$1) > 0) { \
+				printf " \033[36m%-30s\033[0m %s\n", $$1, $$2; \
+			} else { \
+				printf "%s\n", $$2; \
+			} \
+		}'

From 5d137c9fc35bf9f13adbf03f0937e24137021f19 Mon Sep 17 00:00:00 2001
From: Jennifer Cwagenberg
Date: Wed, 17 Apr 2024 17:11:49 -0500
Subject: [PATCH 3/5] test(codecov): :green_heart: add support for reporting code coverage when runing pytest

---
 poetry.lock    | 87 +++++++++++++++++++++++++++++++++++++++++++++++++-
 pyproject.toml |  1 +
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/poetry.lock b/poetry.lock
index 0b570ee..2ebc46c 100644
--- a/poetry.lock
+++ b/poetry.lock
@@ -394,6 +394,73 @@ files = [
     {file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"},
 ]
 
+[[package]]
+name = "coverage"
+version = "7.4.4"
+description = "Code coverage measurement for Python"
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "coverage-7.4.4-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:e0be5efd5127542ef31f165de269f77560d6cdef525fffa446de6f7e9186cfb2"},
+    {file = "coverage-7.4.4-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:ccd341521be3d1b3daeb41960ae94a5e87abe2f46f17224ba5d6f2b8398016cf"},
+    {file = "coverage-7.4.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09fa497a8ab37784fbb20ab699c246053ac294d13fc7eb40ec007a5043ec91f8"},
+    {file = "coverage-7.4.4-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:b1a93009cb80730c9bca5d6d4665494b725b6e8e157c1cb7f2db5b4b122ea562"},
+    {file = "coverage-7.4.4-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:690db6517f09336559dc0b5f55342df62370a48f5469fabf502db2c6d1cffcd2"},
+    {file = "coverage-7.4.4-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:09c3255458533cb76ef55da8cc49ffab9e33f083739c8bd4f58e79fecfe288f7"},
+    {file = "coverage-7.4.4-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:8ce1415194b4a6bd0cdcc3a1dfbf58b63f910dcb7330fe15bdff542c56949f87"},
+    {file = "coverage-7.4.4-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:b91cbc4b195444e7e258ba27ac33769c41b94967919f10037e6355e998af255c"},
+    {file = "coverage-7.4.4-cp310-cp310-win32.whl", hash = "sha256:598825b51b81c808cb6f078dcb972f96af96b078faa47af7dfcdf282835baa8d"},
+    {file = "coverage-7.4.4-cp310-cp310-win_amd64.whl", hash = "sha256:09ef9199ed6653989ebbcaacc9b62b514bb63ea2f90256e71fea3ed74bd8ff6f"},
+    {file = "coverage-7.4.4-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:0f9f50e7ef2a71e2fae92774c99170eb8304e3fdf9c8c3c7ae9bab3e7229c5cf"},
+    {file = "coverage-7.4.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:623512f8ba53c422fcfb2ce68362c97945095b864cda94a92edbaf5994201083"},
+    {file = "coverage-7.4.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0513b9508b93da4e1716744ef6ebc507aff016ba115ffe8ecff744d1322a7b63"},
+    {file = "coverage-7.4.4-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:40209e141059b9370a2657c9b15607815359ab3ef9918f0196b6fccce8d3230f"},
+    {file = "coverage-7.4.4-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8a2b2b78c78293782fd3767d53e6474582f62443d0504b1554370bde86cc8227"},
+    {file = "coverage-7.4.4-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:73bfb9c09951125d06ee473bed216e2c3742f530fc5acc1383883125de76d9cd"},
+    {file = "coverage-7.4.4-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:1f384c3cc76aeedce208643697fb3e8437604b512255de6d18dae3f27655a384"},
+    {file = "coverage-7.4.4-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:54eb8d1bf7cacfbf2a3186019bcf01d11c666bd495ed18717162f7eb1e9dd00b"},
+    {file = "coverage-7.4.4-cp311-cp311-win32.whl", hash = "sha256:cac99918c7bba15302a2d81f0312c08054a3359eaa1929c7e4b26ebe41e9b286"},
+    {file = "coverage-7.4.4-cp311-cp311-win_amd64.whl", hash = "sha256:b14706df8b2de49869ae03a5ccbc211f4041750cd4a66f698df89d44f4bd30ec"},
+    {file = "coverage-7.4.4-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:201bef2eea65e0e9c56343115ba3814e896afe6d36ffd37bab783261db430f76"},
+    {file = "coverage-7.4.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:41c9c5f3de16b903b610d09650e5e27adbfa7f500302718c9ffd1c12cf9d6818"},
+    {file = "coverage-7.4.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d898fe162d26929b5960e4e138651f7427048e72c853607f2b200909794ed978"},
+    {file = "coverage-7.4.4-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:3ea79bb50e805cd6ac058dfa3b5c8f6c040cb87fe83de10845857f5535d1db70"},
+    {file = "coverage-7.4.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ce4b94265ca988c3f8e479e741693d143026632672e3ff924f25fab50518dd51"},
+    {file = "coverage-7.4.4-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:00838a35b882694afda09f85e469c96367daa3f3f2b097d846a7216993d37f4c"},
+    {file = "coverage-7.4.4-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:fdfafb32984684eb03c2d83e1e51f64f0906b11e64482df3c5db936ce3839d48"},
+    {file = "coverage-7.4.4-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:69eb372f7e2ece89f14751fbcbe470295d73ed41ecd37ca36ed2eb47512a6ab9"},
+    {file = "coverage-7.4.4-cp312-cp312-win32.whl", hash = "sha256:137eb07173141545e07403cca94ab625cc1cc6bc4c1e97b6e3846270e7e1fea0"},
+    {file = "coverage-7.4.4-cp312-cp312-win_amd64.whl", hash = "sha256:d71eec7d83298f1af3326ce0ff1d0ea83c7cb98f72b577097f9083b20bdaf05e"},
+    {file = "coverage-7.4.4-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:d5ae728ff3b5401cc320d792866987e7e7e880e6ebd24433b70a33b643bb0384"},
+    {file = "coverage-7.4.4-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:cc4f1358cb0c78edef3ed237ef2c86056206bb8d9140e73b6b89fbcfcbdd40e1"},
+    {file = "coverage-7.4.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8130a2aa2acb8788e0b56938786c33c7c98562697bf9f4c7d6e8e5e3a0501e4a"},
+    {file = "coverage-7.4.4-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cf271892d13e43bc2b51e6908ec9a6a5094a4df1d8af0bfc360088ee6c684409"},
+    {file = "coverage-7.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a4cdc86d54b5da0df6d3d3a2f0b710949286094c3a6700c21e9015932b81447e"},
+    {file = "coverage-7.4.4-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:ae71e7ddb7a413dd60052e90528f2f65270aad4b509563af6d03d53e979feafd"},
+    {file = "coverage-7.4.4-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:38dd60d7bf242c4ed5b38e094baf6401faa114fc09e9e6632374388a404f98e7"},
+    {file = "coverage-7.4.4-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:aa5b1c1bfc28384f1f53b69a023d789f72b2e0ab1b3787aae16992a7ca21056c"},
+    {file = "coverage-7.4.4-cp38-cp38-win32.whl", hash = "sha256:dfa8fe35a0bb90382837b238fff375de15f0dcdb9ae68ff85f7a63649c98527e"},
+    {file = "coverage-7.4.4-cp38-cp38-win_amd64.whl", hash = "sha256:b2991665420a803495e0b90a79233c1433d6ed77ef282e8e152a324bbbc5e0c8"},
+    {file = "coverage-7.4.4-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:3b799445b9f7ee8bf299cfaed6f5b226c0037b74886a4e11515e569b36fe310d"},
+    {file = "coverage-7.4.4-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:b4d33f418f46362995f1e9d4f3a35a1b6322cb959c31d88ae56b0298e1c22357"},
+    {file = "coverage-7.4.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aadacf9a2f407a4688d700e4ebab33a7e2e408f2ca04dbf4aef17585389eff3e"},
+    {file = "coverage-7.4.4-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:7c95949560050d04d46b919301826525597f07b33beba6187d04fa64d47ac82e"},
+    {file = "coverage-7.4.4-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ff7687ca3d7028d8a5f0ebae95a6e4827c5616b31a4ee1192bdfde697db110d4"},
+    {file = "coverage-7.4.4-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:5fc1de20b2d4a061b3df27ab9b7c7111e9a710f10dc2b84d33a4ab25065994ec"},
+    {file = "coverage-7.4.4-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:c74880fc64d4958159fbd537a091d2a585448a8f8508bf248d72112723974cbd"},
+    {file = "coverage-7.4.4-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:742a76a12aa45b44d236815d282b03cfb1de3b4323f3e4ec933acfae08e54ade"},
+    {file = "coverage-7.4.4-cp39-cp39-win32.whl", hash = "sha256:d89d7b2974cae412400e88f35d86af72208e1ede1a541954af5d944a8ba46c57"},
+    {file = "coverage-7.4.4-cp39-cp39-win_amd64.whl", hash = "sha256:9ca28a302acb19b6af89e90f33ee3e1906961f94b54ea37de6737b7ca9d8827c"},
+    {file = "coverage-7.4.4-pp38.pp39.pp310-none-any.whl", hash = "sha256:b2c5edc4ac10a7ef6605a966c58929ec6c1bd0917fb8c15cb3363f65aa40e677"},
+    {file = "coverage-7.4.4.tar.gz", hash = "sha256:c901df83d097649e257e803be22592aedfd5182f07b3cc87d640bbb9afd50f49"},
+]
+
+[package.dependencies]
+tomli = {version = "*", optional = true, markers = "python_full_version <= \"3.11.0a6\" and extra == \"toml\""}
+
+[package.extras]
+toml = ["tomli"]
+
 [[package]]
 name = "dill"
 version = "0.3.7"
@@ -1556,6 +1623,24 @@ tomli = {version = ">=1.0.0", markers = "python_version < \"3.11\""}
 [package.extras]
 testing = ["argcomplete", "attrs (>=19.2.0)", "hypothesis (>=3.56)", "mock", "nose", "pygments (>=2.7.2)", "requests", "setuptools", "xmlschema"]
 
+[[package]]
+name = "pytest-cov"
+version = "5.0.0"
+description = "Pytest plugin for measuring coverage."
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "pytest-cov-5.0.0.tar.gz", hash = "sha256:5837b58e9f6ebd335b0f8060eecce69b662415b16dc503883a02f45dfeb14857"},
+    {file = "pytest_cov-5.0.0-py3-none-any.whl", hash = "sha256:4f0764a1219df53214206bf1feea4633c3b558a2925c8b59f144f682861ce652"},
+]
+
+[package.dependencies]
+coverage = {version = ">=5.2.1", extras = ["toml"]}
+pytest = ">=4.6"
+
+[package.extras]
+testing = ["fields", "hunter", "process-tests", "pytest-xdist", "virtualenv"]
+
 [[package]]
 name = "pyyaml"
 version = "6.0.1"
@@ -2321,4 +2406,4 @@ tensorflow = ["tensorflow", "tensorflow-macos"]
 [metadata]
 lock-version = "2.0"
 python-versions = ">=3.8,<3.13"
-content-hash = "d7b5a659c7b42f8da6ea7a509a9bcbc9c3bff552f92c9f32f4d4c844f0c28769"
+content-hash = "d878d17c244a900a1e3e96947806d207ff4cf0371a6d730d5820fb98a20d2bc2"
diff --git a/pyproject.toml b/pyproject.toml
index fe88806..69209f8 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -30,6 +30,7 @@ h5py = ["h5py"]
 
 [tool.poetry.group.test.dependencies]
 pytest = "^7.4.0"
+pytest-cov= "^5.0.0"
 bandit = { version = "1.7.8", extras = ["toml"] }
 mypy = "^1.4.1"
 requests = "^2.31.0"

From 488d7553a2ca4cdbc88d068965b7b245a8553427 Mon Sep 17 00:00:00 2001
From: Jennifer Cwagenberg
Date: Thu, 18 Apr 2024 03:38:51 -0500
Subject: [PATCH 4/5] docs(lint): :rotating_light: markdownlint the things

---
 docs/model_serialization_attacks.md | 13 +++++++------
 docs/severity_levels.md             | 15 ++++++++-------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/docs/model_serialization_attacks.md b/docs/model_serialization_attacks.md
index a1eb7a4..4135eb2 100644
--- a/docs/model_serialization_attacks.md
+++ b/docs/model_serialization_attacks.md
@@ -5,7 +5,7 @@ Machine Learning(ML) models are the foundational asset in ML powered application
 
 Models can be compromised in various ways, some are new like adversarial machine learning methods, others are common with traditional applications like denial of service attacks. While these can be a threat to safely operating an ML powered application, this document focuses on exposing the risk of Model Serialization Attacks. In a Model Serialization Attack malicious code is added to a model when it is saved, this is also called a code injection attack as well. When any user or system then loads the model for further training or inference the attack code is executed immediately, often with no visible change in behavior to users. This makes the attack a powerful vector and an easy point of entry for attacking broader machine learning components.
 
-To secure ML models, you need to understand what’s inside them and how they are stored on disk in a process called serialization.
+To secure ML models, you need to understand what’s inside them and how they are stored on disk in a process called serialization.
 
 ML models are composed of:
 
@@ -30,7 +30,7 @@ Before digging into how a Model Serialization Attack works and how to scan for t
 
 ## 1. Pickle Variants
 
-**Pickle** and its variants (cloudpickle, dill, joblib) all store objects to disk in a general purpose way. These frameworks are completely ML agnostic and store Python objects as-is.
+**Pickle** and its variants (cloudpickle, dill, joblib) all store objects to disk in a general purpose way. These frameworks are completely ML agnostic and store Python objects as-is.
 
 Pickle is the defacto library for serializing ML models for following ML frameworks:
 
@@ -47,15 +47,15 @@ Pickle is also used to store vectors/tensors only for following frameworks:
 Pickle allows for arbitrary code execution and is highly vulnerable to code injection attacks with very large attack surface. Pickle documentation makes it clear with the following warning:
 
 > **Warning:** The `pickle` module **is not secure**. Only unpickle data you trust.
->
->
+>
+>
 > It is possible to construct malicious pickle data which will **execute
 > arbitrary code during unpickling**. Never unpickle data that could have come
 > from an untrusted source, or that could have been tampered with.
->
+>
 > Consider signing data with [hmac](https://docs.python.org/3/library/hmac.html#module-hmac) if you need to ensure that it has not
 > been tampered with.
->
+>
 > Safer serialization formats such as [json](https://docs.python.org/3/library/json.html#module-json) may be more appropriate if
 > you are processing untrusted data.
 
@@ -129,6 +129,7 @@ With the exception of pickle, these formats cannot execute arbitrary code. Howev
 With an understanding of various approaches to model serialization, explore how many popular choices are vulnerable to this attack with an end to end explanation.
 
 # End to end Attack Scenario
+
 1. Internal attacker:
     The attack complexity will vary depending on the access trusted to an internal actor.
 2. External attacker:
diff --git a/docs/severity_levels.md b/docs/severity_levels.md
index f270578..4ffdbad 100644
--- a/docs/severity_levels.md
+++ b/docs/severity_levels.md
@@ -1,15 +1,16 @@
 # modelscan Severity Levels
 
-modelscan classifies potentially malicious code injection attacks in the following four severity levels.
+modelscan classifies potentially malicious code injection attacks in the following four severity levels.

+
 - **CRITICAL:** A model file that consists of unsafe operators/globals that can execute code is classified at critical severity. These operators are:
-  - exec, eval, runpy, sys, open, breakpoint, os, subprocess, socket, nt, posix
+  - exec, eval, runpy, sys, open, breakpoint, os, subprocess, socket, nt, posix

 - **HIGH:** A model file that consists of unsafe operators/globals that can not execute code but can still be exploited is classified at high severity. These operators are:
-  - webbrowser, httplib, request.api, Tensorflow ReadFile, Tensorflow WriteFile
+  - webbrowser, httplib, request.api, Tensorflow ReadFile, Tensorflow WriteFile

-- **MEDIUM:** A model file that consists of operators/globals that are neither supported by the parent ML library nor are known to modelscan are classified at medium severity.
-  - Keras Lambda layer can also be used for arbitrary code execution. In general, it is not a best practise to add a Lambda layer to a ML model that can get exploited for code injection attacks.
-  - Work in Progress: Custom operators will be classified at medium severity.
+- **MEDIUM:** A model file that consists of operators/globals that are neither supported by the parent ML library nor are known to modelscan are classified at medium severity.
+  - Keras Lambda layer can also be used for arbitrary code execution. In general, it is not a best practise to add a Lambda layer to a ML model that can get exploited for code injection attacks.
+  - Work in Progress: Custom operators will be classified at medium severity.

-- **LOW:** At the moment no operators/globals are classified at low severity level.
\ No newline at end of file
+- **LOW:** At the moment no operators/globals are classified at low severity level.

From 2c48b8f2b6878333e51b3ed11463bf639623ec9d Mon Sep 17 00:00:00 2001
From: Jennifer Cwagenberg
Date: Thu, 18 Apr 2024 04:52:31 -0500
Subject: [PATCH 5/5] build(codecov): :technologist: save cov.xml so that it can be used with code coverage tools

---
 .gitignore |  5 ++++-
 Makefile   | 10 +++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/.gitignore b/.gitignore
index 1de7eb8..c45a7da 100644
--- a/.gitignore
+++ b/.gitignore
@@ -132,4 +132,7 @@ cython_debug/
 
 # Notebook Model Downloads
 notebooks/PyTorchModels/
-pytorch-model-scan-results.json
\ No newline at end of file
+pytorch-model-scan-results.json
+
+# Code Coverage
+cov.xml
\ No newline at end of file
diff --git a/Makefile b/Makefile
index 290105e..7b068b9 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,10 @@
 .DEFAULT_GOAL := help
 VERSION ?= $(shell dunamai from git --style pep440 --format "{base}.dev{distance}+{commit}")
 
+.PHONY: env
+env: ## Display information about the current environment.
+	poetry env info
+
 .PHONY: install-dev
 install-dev: ## Install all dependencies including dev and test dependencies, as well as pre-commit.
 	poetry install --with dev --with test --extras "tensorflow h5py"
@@ -24,7 +28,11 @@ clean: ## Uninstall modelscan
 
 .PHONY: test
 test: ## Run pytests.
-	poetry run pytest --cov=modelscan tests/
+	poetry run pytest tests/
+
+.PHONY: test-cov
+test-cov: ## Run pytests with code coverage.
+	poetry run pytest --cov=modelscan --cov-report xml:cov.xml tests/
 
 .PHONY: build
 build: ## Build the source and wheel achive.