update env vars
update template and README

zip index.py and point template to zip

bad bucket name! bad

latest?

Update README.md

Change link to newest cf template file on s3

README, template, example json, linting (sorry)

remove examples. oops

new image in readme

env vars

readme

update cloudformation template link

update task.json and sample_dataset in examples

update docker image name

update task.json

remove stack specific info in task.json. Update readme
whunter committed Mar 5, 2024
1 parent 20eab1a commit 9b4b489
Showing 7 changed files with 190 additions and 176 deletions.
102 changes: 60 additions & 42 deletions README.md
@@ -1,62 +1,75 @@
# aws-batch-iiif-generator

## Publication
-* [Code4Lib Journal - Scaling IIIF Image Tiling in the Cloud](https://journal.code4lib.org/articles/14933)
+- [Code4Lib Journal - Scaling IIIF Image Tiling in the Cloud](https://journal.code4lib.org/articles/14933)

## Workflow

![Overview](images/overview.png "Overview")

1. Upload task file to the batch bucket
-2. Batch bucket trigger a lambda function
-3. Lambda function read the content in the task file and submit a batch job
-4. Each batch job generates tiles and manifests from the original image and upload to the target S3 bucket
+2. The batch bucket trigger launches an instance of a Lambda function
+3. The Lambda function reads the content of the task file and submits a batch job
+4. Each batch job generates tiles and manifests from the original image and uploads the generated derivatives to the target S3 bucket

![Batch Job](images/batch_job.png "Batch Job")

1. Pull raw original files from the S3 bucket
2. Generate tiles and manifests
3. Upload to target S3 bucket

### Deploy aws-batch-iiif-generator using CloudFormation stack

#### Step 1: Launch CloudFormation stack
-[![Launch Stack](https://cdn.rawgit.com/buildkite/cloudformation-launch-stack-button-svg/master/launch-stack.svg)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?&templateURL=https://vtdlp-dev-cf.s3.amazonaws.com/awsiiifs3batch.template)
+[![Launch Stack](https://cdn.rawgit.com/buildkite/cloudformation-launch-stack-button-svg/master/launch-stack.svg)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?&templateURL=https://vtlib-cf-template.s3.amazonaws.com/prod/cf-templates/aws-batch-iiif-generator/20240227/awsiiifs3batch.template)

-Click *Next* to continue
+Click _Next_ to continue

#### Step 2: Specify stack details

Note: It's a good idea to namespace these resources to prevent collisions (for example, prepend the stack name to resource names; don't prepend anything to `DockerImage`)

| Name | Description |
-|----------|-------------|
+| ------------------- | ---------------------------------------------------------- |
| Stack name | any valid name |
| BatchRepositoryName | any valid name for Batch process repository |
-| DockerImage | any valid Docker image. E.g. yinlinchen/vtl:iiifs3_v3 |
+| DockerImage | any valid Docker image. E.g. wlhunter/iiif_s3_tiling:latest |
| JDName | any valid name for Job definition |
| JQName | any valid name for Job queue |
| LambdaFunctionName | any valid name for Lambda function |
| LambdaRoleName | any valid name for Lambda role |
| S3BucketName | any valid name for S3 bucket |

#### Step 3: Configure stack options

Leave it as is and click **Next**

#### Step 4: Review

Make sure all checkboxes under the Capabilities section are **CHECKED**

-Click *Create stack*
+Click _Create stack_

### Deploy aws-batch-iiif-generator using AWS CLI

Run the following in your shell to deploy the application to AWS:

```bash
aws cloudformation create-stack --stack-name awsiiifs3batch --template-body file://awsiiifs3batch.template --capabilities CAPABILITY_NAMED_IAM
```

See [CloudFormation: create-stack](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/create-stack.html) for the `--parameters` option
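For reference, the same deployment can be scripted with boto3. The sketch below is not part of the repository; it only illustrates the shape CloudFormation expects for the `--parameters` values (the bucket name used here is a placeholder).

```python
# Illustration: building CloudFormation parameter structures in Python.
# The parameter keys come from the stack-details table above; the values
# (and the bucket name) are placeholders, not values from this repo.


def to_cf_parameters(params: dict) -> list:
    """Convert a plain dict into the ParameterKey/ParameterValue
    structures that CloudFormation's create_stack call expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in params.items()
    ]


if __name__ == "__main__":
    parameters = to_cf_parameters({
        "DockerImage": "wlhunter/iiif_s3_tiling:latest",
        "S3BucketName": "my-tiling-batch-bucket",  # placeholder
    })
    print(parameters)
    # With AWS credentials configured, the stack could then be created via:
    # import boto3
    # boto3.client("cloudformation").create_stack(
    #     StackName="awsiiifs3batch",
    #     TemplateBody=open("awsiiifs3batch.template").read(),
    #     Parameters=parameters,
    #     Capabilities=["CAPABILITY_NAMED_IAM"],
    # )
```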

### Usage
-* Prepare [task.json](examples/task.json)
-* Prepare [dataset](examples/sample_dataset.zip) and upload to S3 `SRC_BUCKET` bucket
-* Upload [task.json](examples/task.json) to the S3 bucket created after the deployment.
-* Go to `AWS_BUCKET_NAME` to see the end results for generated IIIF tiles and manifests.
-* Test manifests in [Mirador](https://projectmirador.org/demo/) (Note: you need to configure S3 access permission and CORS settings)
-* See our [Live Demo](https://d2fmsr62h737j1.cloudfront.net/index.html)
+- Prepare [task.json](examples/task.json)
+- Prepare the [dataset](examples/sample_dataset.zip) and upload it to the S3 `AWS_SRC_BUCKET` bucket
+- Edit the `jobQueue` and `jobDefinition` values in [task.json](examples/task.json) to match the resource names specified during stack creation.
+- Upload [task.json](examples/task.json) to the S3 bucket created by the deployment.
+- Go to `AWS_DEST_BUCKET` to see the generated IIIF tiles and manifests.
+- Test manifests in [Mirador](https://projectmirador.org/demo/) (note: you need to configure S3 access permissions and CORS settings)
+- See our [Live Demo](https://d2fmsr62h737j1.cloudfront.net/index.html)

### Cleanup

@@ -67,38 +80,43 @@ aws cloudformation delete-stack --stack-name stackname
```

## Batch Configuration
-* Compute Environment: Type: `EC2`, MinvCpus: `0`, MaxvCpus: `128`, InstanceTypes: `optimal`
-* Job Definition: Type: `container`, Image: `DockerImage`, Vcpus: `2`, Memory: `2000`
-* Job Queue: Priority: `10`
+- Compute Environment: Type: `EC2`, MinvCpus: `0`, MaxvCpus: `128`, InstanceTypes: `optimal`
+- Job Definition: Type: `container`, Image: `DockerImage`, Vcpus: `2`, Memory: `2000`
+- Job Queue: Priority: `10`

## S3
-* SRC_BUCKET: For raw images and CSV files to be processed
-  * Raw image files
-  * CSV files
-* AWS_BUCKET_NAME: For saving tiles and manifests files
+- AWS_SRC_BUCKET: For raw images and CSV files to be processed
+  - Raw image files
+  - CSV files
+- AWS_DEST_BUCKET: For saving tiles and manifest files

## Lambda function
-* [index.py](src/index.py): Submit a batch job when a task file is upload to a S3 bucket
+- [index.py](src/index.py): Submits a batch job when a task file is uploaded to an S3 bucket
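To make the trigger concrete, here is a minimal sketch of what a handler like index.py plausibly does: read the uploaded task file from the S3 event and turn it into a Batch `submit_job` call. The field names follow the task.json example in this repo, but the handler itself is an illustration, not the repository's actual code.

```python
# Sketch of an S3-triggered Lambda that submits a Batch job from a task file.
# This is an illustration of the pattern, not the repo's actual index.py.
import json


def task_to_submit_job_args(task: dict) -> dict:
    """Map a parsed task.json onto the kwargs of Batch's submit_job."""
    return {
        "jobName": task["jobName"],
        "jobQueue": task["jobQueue"],
        "jobDefinition": task["jobDefinition"],
        "containerOverrides": {
            "command": [task["command"]],
            # env vars (AWS_SRC_BUCKET, DEST_PREFIX, ...) pass through as-is
            "environment": task.get("environment", []),
        },
    }


def lambda_handler(event, context):
    # boto3 is available in the Lambda runtime; imported here so the
    # pure helper above can be used (and tested) without AWS access.
    import boto3

    s3 = boto3.client("s3")
    batch = boto3.client("batch")
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"],
                        Key=record["object"]["key"])
    task = json.loads(obj["Body"].read())
    return batch.submit_job(**task_to_submit_job_args(task))
```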

## Task File
-* example: [task.json](examples/task.json)
+- example: [task.json](examples/task.json)

-| Name | Description |
-|----------|-------------|
-| jobName | Batch job name |
-| jobQueue | Batch job queue name |
-| jobDefinition | Batch job definition name |
-| command | "./createiiif.sh" |
-| AWS_REGION | AWS region, e.g. us-east-1 |
-| SRC_BUCKET | S3 bucket which stores the images need to be processed. (Source S3 bucket) |
-| AWS_BUCKET_NAME | S3 bucket which stores the generated tile images and manifests files. (Target S3 bucket) |
-| ACCESS_DIR | Path to the image folder under the SRC_BUCKET |
-| CSV_NAME | A CSV file with title and description of the images |
-| CSV_PATH | Path to the csv folder under the SRC_BUCKET |
-| DEST_BUCKET | Folder to store the generated tile images and manifests files inside `AWS_BUCKET_NAME` |
-| DEST_URL | Root URL for accessing the manifests e.g. https://s3.amazonaws.com/AWS_BUCKET_NAME |
-| UPLOAD_BOOL | Default is `false`. Set it to `true` if you want to use upload_to_s3 [iiifS3](https://github.com/cmoa/iiif_s3) feature and use your customized docker image. |
+| Name | Description |
+| --------------------- | ----------- |
+| jobName | Batch job name |
+| jobQueue | Batch job queue name |
+| jobDefinition | Batch job definition name |
+| command | "./createiiif.sh" |
+| AWS_REGION | AWS region, e.g. us-east-1 |
+| COLLECTION_IDENTIFIER | From the collection metadata CSV |
+| AWS_SRC_BUCKET | S3 bucket which stores the images that need to be processed (source S3 bucket) |
+| AWS_DEST_BUCKET | S3 bucket which stores the generated tile images and manifest files (target S3 bucket) |
+| ACCESS_DIR | Path to the image folder in `AWS_SRC_BUCKET` |
+| DEST_PREFIX | Path to the directory that contains your collection directory in `AWS_DEST_BUCKET` (generally your "collection category"; does not include COLLECTION_IDENTIFIER at the end of the path) |
+| DEST_URL | Root URL for accessing the manifests, e.g. https://cloudfront.amazonaws.com/... |
+| CSV_NAME | A CSV file with title and description of the images |
+| CSV_PATH | Path to the CSV folder under `AWS_SRC_BUCKET` |
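The destination variables above compose into the location where a collection's output lands. The join rule sketched below is an assumption for illustration (the exact layout is determined by the tiling container), using the example values from this commit's task.json:

```python
# Illustration only: one plausible way DEST_URL, DEST_PREFIX, and
# COLLECTION_IDENTIFIER combine into the root URL for a collection's
# manifests. Treat the join rule as an assumption, not documented behavior.


def manifest_root(dest_url: str, dest_prefix: str, collection_id: str) -> str:
    """Join the three destination settings into a single root URL,
    normalizing stray slashes at the boundaries."""
    parts = [dest_url.rstrip("/"), dest_prefix.strip("/"), collection_id.strip("/")]
    return "/".join(parts)


if __name__ == "__main__":
    # Example values taken from examples/task.json in this commit.
    print(manifest_root(
        "https://d21nnzi4oh5qvs.cloudfront.net",
        "iawa/Women_of_Design",
        "Ms2016_012_Womens_Development_Corp_Box2",
    ))
```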

## IIIF S3 Docker image
-* [iiif_s3_docker](https://github.com/vt-digital-libraries-platform/iiif_s3_docker)
-* Image at Docker Hub: [yinlinchen/vtl:iiifs3_v3](https://cloud.docker.com/repository/docker/yinlinchen/vtl/tags)
+- [iiif_s3_docker](https://github.com/vt-digital-libraries-platform/iiif_s3_docker)
+- Image at Docker Hub: [wlhunter/iiif_s3_tiling:0.0.1](https://hub.docker.com/repository/docker/wlhunter/iiif_s3_tiling/)
10 changes: 5 additions & 5 deletions awsiiifs3batch.template
@@ -3,7 +3,7 @@ Description: Orchestrating an Application Process with AWS Batch using CloudForm
Parameters:
DockerImage:
Description: Docker image or a repository from a registry
-Default: yinlinchen/vtl:iiifs3_v3
+Default: wlhunter/iiif_s3_tiling:latest
Type: String
JDName:
Description: Job definition name
@@ -27,7 +27,7 @@ Parameters:
Type: String
S3BucketName:
Description: S3 bucket name
-Default: batch-processing-job
+Default: vtdlp-tiling-batch-deposit
Type: String
Resources:
VPC:
@@ -203,16 +203,16 @@ Resources:
Description: Python Function Handler that would be triggered BY s3 events TO
the aws batch
Handler: index.lambda_handler
-Runtime: python3.6
+Runtime: python3.9
MemorySize: 128
Timeout: 30
Role:
Fn::GetAtt:
- LambdaExecutionRole
- Arn
Code:
-S3Bucket: iiif-code
-S3Key: 1be832cac16b3fb8317111059c1172ed
+S3Bucket: vtlib-cf-template
+S3Key: dev/lambda-scripts/aws-batch-iiif-generator/20240118/e8d3b3d262ee42e5b3986d95645528ca
BatchProcessRepository:
Type: AWS::ECR::Repository
Properties:
Binary file modified examples/sample_dataset.zip
68 changes: 25 additions & 43 deletions examples/task.json
@@ -1,44 +1,26 @@
{
-"jobName":"job1",
-"jobQueue":"IIIFS3JobQueue",
-"jobDefinition":"IIIFS3JobDefinition:1",
-"command":"./createiiif.sh",
-"environment":[
-{
-"name":"UPLOAD_BOOL",
-"value":"false"
-},
-{
-"name":"AWS_REGION",
-"value":"us-east-1"
-},
-{
-"name":"SRC_BUCKET",
-"value":"iawa-sample-data"
-},
-{
-"name":"AWS_BUCKET_NAME",
-"value":"iawa-target-data"
-},
-{
-"name":"CSV_NAME",
-"value":"Ms2016_012_Box2.csv"
-},
-{
-"name":"ACCESS_DIR",
-"value":"Women_of_Design/Ms2016_012_Womens_Development_Corp/Box2/Box2_Folder19_Synopsis/Access/"
-},
-{
-"name":"CSV_PATH",
-"value":"Women_of_Design/Ms2016_012_Womens_Development_Corp/CSV_to_upload/CSV_spreadsheets/"
-},
-{
-"name":"DEST_BUCKET",
-"value":"iiifs3"
-},
-{
-"name":"DEST_URL",
-"value":"https://s3.amazonaws.com/iawa-target-data"
-}
-]
-}
+"jobName": "job1",
+"jobQueue": "IIIFS3JobQueue",
+"jobDefinition": "IIIFS3JobDefinition",
+"command": "./createiiif.sh",
+"environment": [
+{ "name": "AWS_REGION", "value": "us-east-1" },
+{
+"name": "COLLECTION_IDENTIFIER",
+"value": "Ms2016_012_Womens_Development_Corp_Box2"
+},
+{
+"name": "ACCESS_DIR",
+"value": "Women_of_Design/Ms2016_012_Womens_Development_Corp_Box2/Box2_Folder19_Synopsis/Access"
+},
+{ "name": "AWS_SRC_BUCKET", "value": "ingest-dev-vtlib-store" },
+{ "name": "AWS_DEST_BUCKET", "value": "ingest-dev.img.cloud.lib.vt.edu" },
+{ "name": "DEST_PREFIX", "value": "iawa/Women_of_Design" },
+{ "name": "DEST_URL", "value": "https://d21nnzi4oh5qvs.cloudfront.net" },
+{
+"name": "CSV_PATH",
+"value": "Women_of_Design/Ms2016_012_Womens_Development_Corp_Box2/CSV_to_upload/CSV_spreadsheets"
+},
+{ "name": "CSV_NAME", "value": "Ms2016_012_Box2.csv" }
+]
+}