Merge pull request #12 from the-markup/cli-bin
Add blacklight-query bin
dphiffer authored Sep 25, 2024
2 parents 074526e + 8c8b9fd commit 7202a11
Showing 4 changed files with 49 additions and 9 deletions.
12 changes: 9 additions & 3 deletions README.md
@@ -11,12 +11,11 @@ A command-line tool to fetch [Blacklight](https://themarkup.org/series/blackligh
 
 - `nvm use`
 - `npm install`
-- Create `urls.txt` file, with newline-separated absolute URLs to scan
-- `npm run main`
+- `./blacklight-query urls.txt` where `urls.txt` has newline-separated absolute URLs to scan
 
 ## Inputs
 
-Write all URLs you wish to scan as **absolute URLs** (including protocol, domain, and path) in a file named `urls.txt` in the root directory. Separate urls by newline.
+Write all URLs you wish to scan as **absolute URLs** (including protocol, domain, and path). Separate each URL with a newline.
 
 ### Sample `urls.txt` file
 
@@ -25,6 +24,13 @@ https://www.themarkup.org
 https://www.calmatters.org
 ```
 
+### You can use pipes
+
+You can also pipe your list of URLs.
+
+- `echo "https://themarkup.org/" | ./blacklight-query`
+- `./blacklight-query < urls.txt`
+
 ### Collector Options
 
 All of the [`blacklight-collector`](https://github.com/the-markup/blacklight-collector?tab=readme-ov-file#collector-configuration) options can be specified using this tool, by editing the `config` object in `main.ts`.
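
Note on the input format: the README's Inputs section asks for newline-separated absolute URLs, whether they arrive in a file or through a pipe. The sketch below shows one way such a list could be checked before scanning. It is not taken from this repository; `parseUrlList` and `ParsedUrlList` are hypothetical names, and treating only `http:` and `https:` as scannable is an assumption.

```typescript
// Hypothetical helper, not part of blacklight-query: splits a
// newline-separated list and separates absolute http(s) URLs from the rest.
interface ParsedUrlList {
  valid: string[];
  invalid: string[];
}

function parseUrlList(text: string): ParsedUrlList {
  const result: ParsedUrlList = { valid: [], invalid: [] };
  // Accept Unix (\n), Windows (\r\n), and old Mac (\r) line endings.
  for (const rawLine of text.split(/\r?\n|\r/)) {
    const line = rawLine.trim();
    if (line === "") continue; // ignore blank lines
    try {
      // The URL constructor throws on relative or malformed input.
      const { protocol } = new URL(line);
      if (protocol === "http:" || protocol === "https:") {
        result.valid.push(line);
      } else {
        result.invalid.push(line); // assumption: only web URLs are scannable
      }
    } catch {
      result.invalid.push(line);
    }
  }
  return result;
}

const sample =
  "https://www.themarkup.org\r\nhttps://www.calmatters.org\n\nthemarkup.org\n";
const parsed = parseUrlList(sample);
console.log(parsed.valid); // the two https:// entries
console.log(parsed.invalid); // "themarkup.org", which lacks a protocol
```

Reporting the rejected lines instead of silently dropping them makes it easier to spot an entry that is missing its protocol.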
28 changes: 28 additions & 0 deletions blacklight-query
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+
+set -o errexit
+set -o pipefail
+set -o nounset
+
+dir=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+
+if [ -t 0 ]; then
+    # URLs are passed as an argument value
+    if (( $# != 1 )) ; then
+        echo "Usage: blacklight-query urls.txt"
+        echo " echo \"https://themarkup.org\" | blacklight-query"
+        echo " blacklight-query < urls.txt"
+        echo
+        echo "Please provide a list of URLs, where each URL is on its own line."
+        exit 1
+    fi
+    "$dir/node_modules/.bin/ts-node" --project "$dir/tsconfig.json" "$dir/src/main.ts" $1
+else
+    # URLs are piped to stdin
+    time=$(date +%s)
+    while read -r line ; do
+        echo $line >> "$dir/.urls-$time.txt"
+    done
+    "$dir/node_modules/.bin/ts-node" --project "$dir/tsconfig.json" "$dir/src/main.ts" "$dir/.urls-$time.txt"
+    rm "$dir/.urls-$time.txt"
+fi
3 changes: 2 additions & 1 deletion package.json
@@ -2,7 +2,8 @@
   "name": "@themarkup/blacklight-query",
   "version": "1.0.0",
   "description": "A simple tool to generate Blacklight-Collector scans of a list of urls",
-  "main": "build/index.js",
+  "main": "src/main.ts",
+  "bin": "./blacklight-query",
   "funding": {
     "type": "individual",
     "url": "https://themarkup.org/donate"
15 changes: 10 additions & 5 deletions src/main.ts
@@ -6,12 +6,17 @@ import { collect } from "@themarkup/blacklight-collector";
 import { reportFailures } from "./utils";
 
 // Gather URLs from input file
-const urlsPath = join(__dirname, '../urls.txt');
+const urlsFile = process.argv[2];
+let urlsPath;
+if (urlsFile[0] == '/' || urlsFile[0] == '~') {
+  urlsPath = urlsFile;
+} else {
+  urlsPath = join(process.cwd(), urlsFile);
+}
+
 if (!fs.existsSync(urlsPath)) {
-  console.log(
-    "Please create a file named 'urls.txt', containing a newline-separated list of urls to scan."
-  );
-  exit();
+  console.log(`Could not find ${urlsPath}.`);
+  exit(1);
 }
 const urls = fs.readFileSync(urlsPath, "utf8");
 const urlsList = urls.trim().split(/\r?\n|\r|\n/g);
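
Note on path handling: two edge cases in the new lookup are worth knowing about. If `src/main.ts` is run without an argument, `process.argv[2]` is `undefined` and `urlsFile[0]` throws a `TypeError`; the `blacklight-query` wrapper guards against this only in its argument branch. Node also does not expand `~`, so a literal `~/urls.txt` (for example, when quoted in the shell) will not be found by `fs.existsSync`. The following is a minimal sketch of one way to cover both cases, not the code in this commit; `resolveUrlsPath` and its parameters are hypothetical names.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper, not the code in src/main.ts.
function resolveUrlsPath(
  urlsFile: string | undefined,
  cwd: string,
  homeDir: string
): string {
  if (urlsFile === undefined || urlsFile === "") {
    throw new Error("Usage: blacklight-query urls.txt");
  }
  // A shell normally expands "~" before the program starts, so this branch
  // only matters when the tilde arrives literally. "~user" is not handled.
  if (urlsFile === "~" || urlsFile.slice(0, 2) === "~/") {
    return path.join(homeDir, urlsFile.slice(1));
  }
  // path.resolve leaves an absolute path as it is (after normalizing it)
  // and resolves a relative one against cwd.
  return path.resolve(cwd, urlsFile);
}

console.log(resolveUrlsPath("urls.txt", process.cwd(), os.homedir()));
```

Passing `cwd` and `homeDir` as parameters, rather than reading them inside the function, keeps the helper easy to test.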