Scripts for scraping Sourcegraph search results. scripts/json-to-raw-url.sh
extracts raw GitHub file URLs from src-cli output, and scripts/github_downloader.py
downloads those files from GitHub.
$ src search -stream -json '${{github.event.comment.body}} file:.github/workflows COUNT:100000' | ./scripts/json-to-raw-url.sh | python3 scripts/github_downloader.py
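To illustrate the URL-extraction step in the middle of that pipeline, here is a minimal Python sketch of how one search result could be turned into a raw GitHub URL. It is not the code in scripts/json-to-raw-url.sh. It assumes each line of src-cli's JSON stream is an object with "repository" (such as github.com/owner/repo), "commit", and "path" fields; those field names and the helper names raw_url and raw_urls are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, Optional
from urllib.parse import quote

GITHUB_PREFIX = "github.com/"


def raw_url(event: dict) -> Optional[str]:
    """Build a raw.githubusercontent.com URL from one search result.

    Assumes the result carries "repository", "commit", and "path" keys
    (an assumption about the stream format, not a documented guarantee).
    Returns None for results that are not GitHub files.
    """
    repo = event.get("repository") or ""
    commit = event.get("commit")
    path = event.get("path")
    if not (repo.startswith(GITHUB_PREFIX) and commit and path):
        return None
    owner_repo = repo[len(GITHUB_PREFIX):]
    # quote() leaves "/" intact but escapes spaces and other unsafe characters.
    return f"https://raw.githubusercontent.com/{owner_repo}/{commit}/{quote(path)}"


def raw_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield one raw URL per usable JSON line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(event, dict):
            url = raw_url(event)
            if url:
                yield url
```

Feeding sys.stdin to raw_urls and printing each result would give one URL per line, which is the shape a downloader reading from standard input would expect.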
This pipeline lets security researchers run static analysis tools across a large set of GitHub repositories found through Sourcegraph. For example, to run Semgrep on the downloaded files:
$ semgrep --config "p/github-actions" out
The output will include full repository file paths, allowing us to easily identify the vulnerable repositories.
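As a rough sketch of how those file paths could be reduced to a list of repositories, the Python below groups reported paths by owner and repository. It assumes the downloaded files are laid out as out/<owner>/<repo>/<path in repo>; that layout and the helper names repo_from_path and vulnerable_repos are assumptions for illustration, so adjust them to match what the downloader actually writes.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def repo_from_path(path: str, root: str = "out") -> Optional[str]:
    """Return "owner/repo" for a path shaped like out/<owner>/<repo>/...

    The directory layout is an assumption, not something the tools guarantee.
    Returns None when the path is too short to contain owner, repo, and file.
    """
    parts = PurePosixPath(path).parts
    if root in parts:
        # Drop everything up to and including the download root.
        parts = parts[parts.index(root) + 1:]
    if len(parts) < 3:
        return None
    return f"{parts[0]}/{parts[1]}"


def vulnerable_repos(paths: Iterable[str], root: str = "out") -> List[str]:
    """Collapse a list of reported file paths into sorted unique repositories."""
    repos = {repo_from_path(p, root) for p in paths}
    return sorted(r for r in repos if r)
```

The paths could come from the file paths Semgrep prints for each finding; passing them to vulnerable_repos gives one entry per affected repository.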
Installation:
$ git clone https://github.com/KarimPwnz/sourcegraph-scripts.git
$ cd sourcegraph-scripts
$ pip install -r requirements.txt