Nosey Parker v0.14.0
Docker Images
A prebuilt multiplatform Docker image for this release is available for x86_64 and ARM64 architectures:
docker pull ghcr.io/praetorian-inc/noseyparker:v0.14.0
Additions
- Running `noseyparker --version` now emits many compile-time details about the build, which can be useful for troubleshooting (#48).
- The `github` and `scan` commands now support accessing GitHub Enterprise Server instances using the new `--github-api-url URL` parameter (#53, thank you @AdnaneKhan!).
- New rules have been added:
  - Amazon Resource Name
  - AWS S3 Bucket (subdomain style)
  - AWS S3 Bucket (path style)
  - Google Cloud Storage Bucket (subdomain style)
  - Google Cloud Storage Bucket (path style)
  - HuggingFace User Access Token (#54, thank you @AdnaneKhan!)
- Rules are now required to have a globally unique identifier (#62).
- Two new advanced global command-line parameters have been exposed:
  - `--rlimit-nofile LIMIT` to control the maximum number of open file descriptors
  - `--enable-backtraces BOOL` to control whether backtraces are printed upon panic
- The snippet length for matches found by the `scan` command can now be controlled with the new `--snippet-length BYTES` parameter.
- The Git repository cloning behavior in the `scan` command can now be controlled with the new `--git-clone-mode {mirror,bare}` parameter.
- The `scan` command now collects additional metadata about blobs. This metadata includes size in bytes and a guessed MIME type based on filename extension. Optionally, if the non-default `libmagic` Cargo feature is enabled, the MIME type and charset are guessed by passing the content of the blob through `libmagic` (the guts of the `file` command-line program).

  By default, all this additional metadata is recorded into the datastore for each blob in which matches are found. This can be more precisely controlled using the new `--blob-metadata={all,matching,none}` parameter.

  This newly collected metadata is included in the output of the `report` command.
- The `scan` command now collects additional metadata about blobs found within Git repositories. Specifically, for each blob found in Git repository history, the set of commits where it was introduced and the accompanying pathname for the blob are collected (#16). This is enabled by default, but can be controlled using the new `--git-blob-provenance={first-seen,minimal}` parameter.

  This newly collected metadata is included in the output of the `report` command.
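The new `scan` parameters listed above can be combined in a single invocation. The following is a minimal sketch of a hypothetical Python helper, `build_scan_args`, that assembles such a command line and rejects values outside the choices given in these notes. The helper name, its keyword arguments, and the use of `--datastore` followed by a positional input path are assumptions made for illustration; check `noseyparker scan --help` for the authoritative syntax.

```python
import shlex

# Allowed values, copied from the parameter descriptions in these notes.
GIT_CLONE_MODES = ("mirror", "bare")
BLOB_METADATA_MODES = ("all", "matching", "none")
GIT_BLOB_PROVENANCE_MODES = ("first-seen", "minimal")


def _check(flag, value, allowed):
    """Raise ValueError if value is not one of the allowed choices."""
    if value not in allowed:
        raise ValueError(f"{flag} must be one of {allowed}, got {value!r}")


def build_scan_args(datastore, target, *, snippet_length=None,
                    git_clone_mode=None, blob_metadata=None,
                    git_blob_provenance=None):
    """Build an argv list for `noseyparker scan` (hypothetical helper).

    Options left as None are omitted, so the tool's own defaults apply.
    The `--datastore` flag and positional target are assumptions.
    """
    args = ["noseyparker", "scan", "--datastore", datastore]
    if snippet_length is not None:
        if snippet_length < 0:
            raise ValueError("snippet_length must be non-negative")
        args += ["--snippet-length", str(snippet_length)]
    if git_clone_mode is not None:
        _check("--git-clone-mode", git_clone_mode, GIT_CLONE_MODES)
        args += ["--git-clone-mode", git_clone_mode]
    if blob_metadata is not None:
        _check("--blob-metadata", blob_metadata, BLOB_METADATA_MODES)
        args.append(f"--blob-metadata={blob_metadata}")
    if git_blob_provenance is not None:
        _check("--git-blob-provenance", git_blob_provenance,
               GIT_BLOB_PROVENANCE_MODES)
        args.append(f"--git-blob-provenance={git_blob_provenance}")
    args.append(target)
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run to execute it.
    argv = build_scan_args("np.db", "path/to/repo",
                           snippet_length=512, git_clone_mode="mirror")
    print(shlex.join(argv))
```

Leaving an option unset keeps the sketch from hard-coding defaults that may change between releases.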
Changes
- The datastore schema has been changed in an incompatible way such that migrating existing datastores to the new version is not possible. This was necessary to support the significantly increased metadata that is now collected when scanning. Datastores from earlier releases of Nosey Parker cannot be used with this release; instead, the inputs will have to be rescanned with a new datastore.
- The JSON and JSONL output formats for the `report` command have changed slightly. In particular, the `.matches[].provenance` field is now an array of objects instead of a single object, making it possible to handle situations where a blob is discovered multiple ways. The `provenance` objects have some renamed fields, and contain significantly more metadata than before.
- Existing rules were modified to reduce both false positives and false negatives:
  - Generic Password (double quoted)
  - Generic Password (single quoted)
- The default size of match snippets has been increased from 128 to 256 bytes on each side of the match. This typically gives 4-7 lines of context before and after each match.
- When a Git repository is cloned, the default behavior is now to match `git clone --bare` instead of `git clone --mirror`. This new default behavior results in cloning potentially less content, but avoids cloning content from forks of repositories hosted on GitHub.
- The command-line help has been refined for clarity.
- Scanning performance has been improved on particular workloads by as much as 2x by recording matches to the datastore in larger batches. This is particularly relevant to heavy multithreaded scanning workloads where the inputs have many matches.
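Scripts that consume `report` output may need adjusting for the `.matches[].provenance` change noted in the list above. The following is a minimal sketch, assuming a consumer that wants to accept JSON produced by both earlier releases and this one. The helper name `provenance_entries` is hypothetical, and the key inside the sample provenance objects is a placeholder, since the renamed fields are not enumerated in these notes.

```python
import json


def provenance_entries(match):
    """Return the provenance of a match as a list of objects.

    Accepts both the older shape (a single object) and the newer shape
    (an array of objects). Hypothetical helper for illustration only.
    """
    prov = match.get("provenance")
    if prov is None:
        return []
    if isinstance(prov, list):
        return prov
    # Older shape: wrap the single object so callers always iterate a list.
    return [prov]


if __name__ == "__main__":
    # "example_key" is a placeholder, not a real Nosey Parker field name.
    old_style = json.loads('{"provenance": {"example_key": "a"}}')
    new_style = json.loads(
        '{"provenance": [{"example_key": "a"}, {"example_key": "b"}]}'
    )
    for match in (old_style, new_style):
        for entry in provenance_entries(match):
            print(entry)
```

Normalizing at the point of reading keeps the rest of a script independent of which release produced the report.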
Fixes
- Python is no longer required as a build-time dependency for `vectorscan-sys`.
- A typo in the Okta API Key rule that caused it to truncate the secret has been fixed.
- The `scan` command now correctly reports the number of newly seen matches when reusing an existing datastore.