Semgrep (semantic grep) is an open-source tool for finding patterns in code. It's useful for preventing the use of known anti-patterns in a codebase or for enforcing the correct use of secure-by-default frameworks (e.g. always using a project's sanitization method on user-provided data).
Semgrep is fast and powerful; its grep-esque patterns are lifted into AST matchers. Unlike regexes, these patterns aren't affected by whitespace, comments, newlines, the order of keyword arguments, variable renaming, and other language nuances.
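To make the whitespace and comment claim concrete, here is a small sketch that uses Python's standard `ast` module rather than Semgrep itself. Semgrep's matcher is a separate implementation, and the helper name `same_structure` is made up for illustration. Two snippets that differ only in layout and comments parse to the same tree, while a plain text comparison treats them as different.

```python
import ast


def same_structure(src_a: str, src_b: str) -> bool:
    """Return True when both snippets parse to identical ASTs."""
    # ast.dump() leaves out line and column attributes by default,
    # so layout differences do not affect the comparison.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


compact = "result = sanitize(user_input, strict=True)"
spread_out = """
result = sanitize(   # comment in the middle of the call
    user_input,
    strict = True,
)
"""

print(compact == spread_out.strip())        # textual comparison
print(same_structure(compact, spread_out))  # structural comparison
```

This only demonstrates insensitivity to whitespace, comments, and newlines; keyword-argument order and variable renaming need the richer matching that Semgrep provides.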
Currently, the supported languages are: C, Go, Java, JavaScript, and Python.
There are two types of rules in Semgrep:
- Simple rules - expressed with a single `pattern`.
- Advanced rules - expressed with multiple patterns, like: X must be true AND Y must be too, or X but NOT Y, or X must occur inside a block of code that Y matches. These patterns are composed with the `patterns` keyword.
In salus.yaml, both simple and advanced rules can be specified with a path to a Semgrep YAML config file. In addition, simple rules can be specified directly in salus.yaml.
In salus.yaml, you can specify a set of Semgrep rules with a path to a Semgrep config file. You must specify
- `config` - a full Semgrep config file/directory. This config can either be nested inside the repo, or start with `/` if the path also contains `semgrep`.
- Either `required: true` or `forbidden: true`
  - If a found pattern is forbidden or if a not found pattern is required, then the scanner will fail and the `message` will be shown to the developer in the report.
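The pass/fail rule described above can be written out as a tiny truth function. This is only a sketch of the logic as stated in this section, not code taken from Salus; the function name `match_fails` and its parameters are hypothetical.

```python
def match_fails(found: bool, *, forbidden: bool = False, required: bool = False) -> bool:
    """Hypothetical helper: True when a match entry should fail the scan."""
    # Each entry is expected to set exactly one of the two flags.
    if forbidden == required:
        raise ValueError("set exactly one of forbidden or required")
    # Fail when a forbidden pattern is found, or a required pattern is missing.
    return (forbidden and found) or (required and not found)


print(match_fails(True, forbidden=True))   # forbidden pattern was found
print(match_fails(True, required=True))    # required pattern was found
```

In the failing cases, the entry's `message` is what the developer would see in the report.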
In addition, you can optionally specify
- `exclude` - Skip any file or directory that matches this pattern. `--exclude='*.py'` will ignore the following: foo.py, src/foo.py, foo.py/bar.sh. `--exclude='tests'` will ignore tests/foo.py as well as a/b/tests/c/foo.py. Can add multiple times.
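The `exclude` examples above are consistent with matching the pattern against each component of a path. The sketch below approximates that behavior with Python's `fnmatch` module; it is an assumption made for illustration, not Semgrep's actual implementation, and `is_excluded` is a made-up name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Sequence


def is_excluded(path: str, patterns: Sequence[str]) -> bool:
    """Approximation: a path is skipped if any component matches any pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


for candidate in ["foo.py", "src/foo.py", "foo.py/bar.sh", "src/main.go"]:
    print(candidate, is_excluded(candidate, ["*.py"]))

for candidate in ["tests/foo.py", "a/b/tests/c/foo.py", "src/foo.py"]:
    print(candidate, is_excluded(candidate, ["tests"]))
```

Passing several patterns in the list corresponds to adding `exclude` entries multiple times.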
Here is an example semgrep section of a salus.yaml.
scanner_configs:
  Semgrep:
    matches:
      - config: semgrep_config_1.yaml
        forbidden: true
      - config: semgrep_config_2.yaml
        forbidden: true
        exclude:
          - tests
Example semgrep_config_1.yaml. The rule says find all patterns of the form `$X == $X`, but exclude `0 == 0`.
rules:
  - id: eqeq-always-true
    patterns:
      - pattern: $X == $X
      - pattern-not: 0 == 0
    message: "$X == $X is always true"
    languages: [python]
    severity: ERROR
Keywords in this file:
- `id` - Unique, descriptive identifier, cannot contain whitespaces (required)
- `patterns` or `pattern` - patterns or pattern or pattern-regex (required)
- `message` - Message if rule (forbidden and found) or (required and not found) (optional)
- `languages` - Any of: c, go, java, javascript, or python (required)
- `severity` - One of: WARNING, ERROR (required)
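To illustrate what the `eqeq-always-true` rule is looking for, the sketch below emulates it on Python source with the standard `ast` module. It is a rough stand-in written for this explanation, not how Semgrep matches internally; the function `find_eqeq` and the sample code are made up.

```python
import ast
from typing import List


def find_eqeq(source: str) -> List[int]:
    """Return line numbers of `X == X` comparisons, skipping `0 == 0`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only simple two-operand `==` comparisons are considered here.
        if not (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], ast.Eq)):
            continue
        left, right = node.left, node.comparators[0]
        # Both sides must be the same expression (plays the role of `$X == $X`).
        if ast.dump(left) != ast.dump(right):
            continue
        # Skip the literal integer zero (plays the role of `pattern-not: 0 == 0`).
        if isinstance(left, ast.Constant) and type(left.value) is int and left.value == 0:
            continue
        hits.append(node.lineno)
    return sorted(hits)


sample = """
if a == a:
    pass
if 0 == 0:
    pass
if user.id == user.id:  # same attribute on both sides
    pass
if a == b:
    pass
"""

print(find_eqeq(sample))
```

The comparisons `a == a` and `user.id == user.id` are reported, while `0 == 0` and `a == b` are not.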
Simple rules that can be expressed with a single `pattern` can be directly specified in salus.yaml.
Each simple rule in salus.yaml must include
- `pattern` - the single pattern
- `forbidden: true` or `required: true`
- `language` - Any of: c, go, java, javascript, or python
- `sub-dir` - this pattern will apply only to the sub-dir listed. This should be a valid sub-directory under the directory defined by "directories" under the "recursion" config
The user can optionally provide
- `exclude` - Skip any file or directory that matches this pattern. `--exclude='*.py'` will ignore the following: foo.py, src/foo.py, foo.py/bar.sh. `--exclude='tests'` will ignore tests/foo.py as well as a/b/tests/c/foo.py. Can add multiple times.
- `message` - Message if rule (forbidden and found) or (required and not found)
Example:
scanner_configs:
  Semgrep:
    matches:
      - pattern: $X == $X
        message: Useless equality check
        language: python
        forbidden: true
        exclude:
          - tests
      - pattern: $X.unsanitize(...)
        message: Don't call `unsanitize()` methods without careful review
        language: js
        forbidden: true
        exclude:
          - node_modules
      - pattern: $LOG_ENDPOINT = os.getenv("LOGGER_ENDPOINT", ...)
        message: All files need to get the dynamic logger. Please don't hardcode this.
        language: python
        required: true
Please see semgrep's documentation on how to use an inline comment to allowlist findings.
You can also whitelist all findings for specific ids in the salus config, like
scanner_configs:
  Semgrep:
    exceptions:
      - advisory_id: myid1
        changed_by: engineer1
        notes: false positive because ...
      - advisory_id: myid2
        changed_by: engineer2
        notes: false positive because ...
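Conceptually, the `exceptions` list removes every finding whose rule id matches one of the listed `advisory_id` values. The sketch below shows that filtering under assumed data shapes: the dictionaries and the `apply_exceptions` function are hypothetical and are not Salus's internal representation.

```python
from typing import Dict, List


def apply_exceptions(findings: List[Dict[str, str]],
                     exceptions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop findings whose rule id is listed as an advisory_id exception."""
    allowed_ids = {entry["advisory_id"] for entry in exceptions}
    return [finding for finding in findings if finding["id"] not in allowed_ids]


# Hypothetical data mirroring the config above.
exceptions = [
    {"advisory_id": "myid1", "changed_by": "engineer1", "notes": "false positive because ..."},
    {"advisory_id": "myid2", "changed_by": "engineer2", "notes": "false positive because ..."},
]
findings = [
    {"id": "myid1", "file": "app.py"},
    {"id": "eqeq-always-true", "file": "util.py"},
]

print(apply_exceptions(findings, exceptions))
```

Only the `eqeq-always-true` finding remains, because `myid1` is covered by an exception.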
- There may be parser-related issues from Semgrep
  - Parser-related issues will be displayed as warnings and do not cause salus to fail.
  - Salus will still show semgrep results from files that do not have parser issues.
- Salus semgrep currently does not support scanning against pre-built rules.
  - But we plan to support this in the near future!