Commit

Merge pull request #8 from jdub233/release/0.0.2
Release/0.0.2
jdub233 authored Jun 18, 2020
2 parents 707c935 + 1e3ae0d commit ceb22be
Showing 7 changed files with 381 additions and 219 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,3 +1,3 @@
 node_modules
 .serverless
-config.yml
+config*.yml
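The widened pattern presumably keeps variant config files out of version control (a per-stage `config-dev.yml` is used below purely as an illustration). In gitignore syntax `*` matches any run of characters except `/`, so `config*.yml` also matches `config.example.yml`. Because that file is presumably already tracked, git keeps tracking it, but a brand-new copy would need `git add -f`. The minimal sketch below translates a single-`*` pattern into a `RegExp` and checks it against file basenames. It ignores the rest of gitignore syntax (`**`, `?`, character classes, `!` negation).

```javascript
// Minimal sketch: translate a gitignore-style pattern that uses only `*`
// (any run of characters except `/`) into an anchored RegExp for basenames.
// It deliberately ignores `**`, `?`, character classes and `!` negation.
function starPatternToRegExp(pattern) {
  const body = pattern
    .split('*')
    // Escape regex metacharacters in the literal pieces between the stars.
    .map((piece) => piece.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('[^/]*');
  return new RegExp(`^${body}$`);
}

const ignored = starPatternToRegExp('config*.yml');
for (const name of ['config.yml', 'config-dev.yml', 'config.example.yml', 'serverless.yml']) {
  console.log(`${name}: ${ignored.test(name) ? 'ignored' : 'not ignored'}`);
}
```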
12 changes: 8 additions & 4 deletions README.md
@@ -8,12 +8,16 @@ For example, capturing the home page at `www.bu.edu` will also pull in and relin
 
 The lambda is triggered by a secure API gateway interface using an API key.
 
+The function includes error checking, and will cancel downloads on any assets that do not return a status code 200 (OK). If the root page capture fails, the entire capture is cancelled and no changes will be made to the current contents of the S3 bucket.
 
 ## How to configure
 
-The capture URL, S3 bucket name, and S3 bucket path are configurable by setting values in a `config.yml` file.
+The capture URL, S3 bucket name, and S3 bucket path are configurable by setting values in a `config.yml` file. Also, the captured assets can be stored in a separate subdirectory, if one is specified in the config.
 
-- `CAPTURE_URL` sets the URL to the page to be captured
-- `S3_BUCKET_NAME` and `S3_PATH` set the S3 destination for the captured static files
+- `CAPTURE_URL` sets the URL to the page to be captured.
+- `S3_BUCKET_NAME` sets the destination S3 bucket for the captured static files.
+- `S3_PATH` sets a path within the bucket for the capture directory. If blank, the root of the bucket will be used.
+- `SUBDIR_PREFIX` sets the name of the sub-directory used to store the assets (use the directory name only, no trailing slash). If blank, assets will be stored at the root.
 
 When installing the Lambda, copy the `config.example.yml` to a `config.yml` file and customize the values. Once installed, they are also available as environment variables in the running Lambda and can be further adjusted from there.
 
@@ -98,4 +102,4 @@ The CloudFormation stack can also be removed using the serverless cli:
 
 ```bash
 serverless remove
-```
+```
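The README's settings reach the running Lambda as environment variables. The sketch below shows one way to read and normalise them. It is not the handler's code, most of which is outside this diff. The `readCaptureConfig` name, the error thrown for missing required values, and the slash trimming are all assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from handler.js: gather the README settings
// from an environment object (defaults to process.env) and normalise them.
function readCaptureConfig(env = process.env) {
  const captureUrl = env.CAPTURE_URL;
  const bucket = env.S3_BUCKET_NAME;
  // Assumption: fail fast when a required value is missing.
  if (!captureUrl || !bucket) {
    throw new Error('CAPTURE_URL and S3_BUCKET_NAME must both be set');
  }
  // A blank S3_PATH means the bucket root; strip leading/trailing slashes.
  const s3Path = (env.S3_PATH || '').replace(/^\/+|\/+$/g, '');
  // SUBDIR_PREFIX should be a bare directory name; tolerate a trailing slash anyway.
  const subdirPrefix = (env.SUBDIR_PREFIX || '').replace(/\/+$/, '');
  return { captureUrl, bucket, s3Path, subdirPrefix };
}

const config = readCaptureConfig({
  CAPTURE_URL: 'https://www.bu.edu/',
  S3_BUCKET_NAME: 'example-bucket', // placeholder bucket name
  S3_PATH: '/capture/',
  SUBDIR_PREFIX: 'assets/',
});
console.log(config);
```

Passing the environment in as a parameter keeps the helper testable without touching `process.env`.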
3 changes: 3 additions & 0 deletions handler.js
@@ -3,6 +3,8 @@ const s3 = require('s3-node-client');
 const del = require('del');
 const Url = require('url-parse');
 
+const ValidatePlugin = require('./validatePlugin');
+
 const captureURL = new Url(process.env.CAPTURE_URL);
 
 // Allow for a prefix to the subdirectory, and add a slash if it is set.
@@ -20,6 +22,7 @@ const scrapeOptions = {
     {directory: `${subDirPrefix}css`, extensions: ['.css']},
     {directory: `${subDirPrefix}font`, extensions: ['.woff', '.woff2', '.ttf', '.eot']},
   ],
+  plugins: [ new ValidatePlugin() ],
 };
 
 const client = s3.createClient();
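This diff shows only the tail of `scrapeOptions`. The sketch below builds the whole object, assuming the `urls`, `directory`, `subdirectories` and `plugins` options of website-scraper 4.x. It shows where the new plugin slots in and how the slash-suffixed prefix described in the handler comment produces the asset directories. The `buildScrapeOptions` name and the `/tmp/capture` directory are hypothetical. The real subdirectory list also has more entries than the two visible here.

```javascript
// Hypothetical sketch of the options object passed to website-scraper's scrape().
// Only the two subdirectory entries visible in the diff are reproduced here.
function buildScrapeOptions(captureUrl, subdirPrefixSetting, plugins = []) {
  // Per the handler comment: add a slash only when a prefix is configured.
  const subDirPrefix = subdirPrefixSetting ? `${subdirPrefixSetting}/` : '';
  return {
    urls: [captureUrl],
    directory: '/tmp/capture', // assumption: the Lambda filesystem is writable only under /tmp
    subdirectories: [
      { directory: `${subDirPrefix}css`, extensions: ['.css'] },
      { directory: `${subDirPrefix}font`, extensions: ['.woff', '.woff2', '.ttf', '.eot'] },
    ],
    plugins, // handler.js passes [ new ValidatePlugin() ] here
  };
}

console.log(buildScrapeOptions('https://www.bu.edu/', 'assets').subdirectories[0].directory); // assets/css
console.log(buildScrapeOptions('https://www.bu.edu/', '').subdirectories[0].directory); // css
```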
4 changes: 2 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
 {
   "name": "bu-page-capture-s3",
-  "version": "0.0.1",
+  "version": "0.0.2",
   "description": "Capture a page an associated assets and push it to object storage",
   "main": "index.js",
   "author": "[email protected]",
@@ -10,7 +10,7 @@
     "log": "sls logs --function capture"
   },
   "dependencies": {
-    "del": "^4.1.1",
+    "del": "^5.1.0",
     "s3-node-client": "^4.4.4",
     "url-parse": "^1.4.7",
     "website-scraper": "^4.0.1"
2 changes: 1 addition & 1 deletion serverless.yml
@@ -4,7 +4,7 @@ frameworkVersion: ">=1.1.0 <2.0.0"
 
 provider:
   name: aws
-  runtime: nodejs8.10
+  runtime: nodejs12.x
   stage: prod
   region: us-east-1
   iamRoleStatements:
20 changes: 20 additions & 0 deletions validatePlugin.js
@@ -0,0 +1,20 @@
+module.exports = class ValidatePlugin {
+  apply(registerAction) {
+    registerAction('error', async ({error}) => {console.error(error)});
+    registerAction('onResourceError', ({resource, error}) => console.log(`Resource ${resource.url} has error ${error}`));
+    registerAction('afterResponse', async ({response}) => {
+      if (response.statusCode !== 200) {
+        // Don't capture bad assets. Also cancels the upload phase if the root page has a bad status code.
+        console.log( `A bad status code ${response.statusCode} was encountered, cancelling asset capture` );
+        return null;
+      } else {
+        return {
+          body: response.body,
+          metadata: {
+            headers: response.headers,
+          }
+        };
+      }
+    });
+  }
+}
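A website-scraper plugin interacts with the scraper only through the `registerAction(name, handler)` callback passed to `apply()`. This means its handlers can be unit-tested without network access by passing a stub that records them. The `collectActions` helper below is a hypothetical test utility, not part of website-scraper. It keeps only the last handler per action name, which is sufficient for `ValidatePlugin` because that plugin registers each action once.

```javascript
// Hypothetical test helper: run a plugin's apply() with a stub registerAction
// and return the registered handlers keyed by action name.
function collectActions(plugin) {
  const actions = {};
  plugin.apply((name, handler) => {
    // Keeps the last handler per name; ValidatePlugin registers each action once.
    actions[name] = handler;
  });
  return actions;
}

// Example usage against validatePlugin.js (run from the project root):
// const ValidatePlugin = require('./validatePlugin');
// const { afterResponse } = collectActions(new ValidatePlugin());
// afterResponse({ response: { statusCode: 404, body: '', headers: {} } })
//   .then((result) => console.log(result)); // logs the warning, then null

module.exports = { collectActions };
```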