knimer provides an automation mechanism for running KNIME workflows and can be used as an alternative to KNIME Server.
There are two parts to this project:
- Docker configuration (Dockerfile) to create an image which:
  - Downloads a zipped KNIME workflow from an AWS S3 bucket
  - Runs that workflow in KNIME batch execution mode
  - Handles custom workflow variables
  - Handles custom workflow credentials
- A set of Terraform scripts which can be used as a module to create AWS infrastructure which:
  - Creates an ECS Task Definition (incl. CloudWatch logging) to run the Docker image
  - Optionally uses AWS EventBridge to schedule periodic runs of the Task
  - Optionally uses an AWS Lambda to send Slack notifications showing the progress of the Task
KNIME is a powerful application for automating the manipulation of data. When workflows need to be run regularly, though, it's inconvenient to require a human to open the KNIME Desktop application and execute them by hand.
Even using something like Windows Task Scheduler to run the workflow in batch mode still requires the computer to be on, and is limited by the compute resources available on that machine.
KNIME Server offers scheduling, but comes with a hefty price tag for both licenses and hosting fees. This project focuses on the core needs of scheduling and notification, rather than on team collaboration features.
Firstly, create a workflow as usual in KNIME.

If you need dynamic data passed into the workflow you can use Workflow Variables, by right-clicking on the workflow in the KNIME Explorer and selecting Workflow Variables.

If you need secret data passed into the workflow you can use KNIME Credentials, by adding a Credentials Configuration node.

Zip up the workflow folder (the one containing the workflow.knime file) and upload it to an S3 bucket.
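For illustration, the zip-and-upload step might look like this from the command line (the folder and bucket names are placeholders, and this assumes the image expects the workflow folder at the top level of the archive):

```shell
# Zip the workflow folder itself, so workflow.knime sits one level inside the archive
zip -r my_workflow.zip my_workflow

# Upload to the S3 bucket the Terraform module will point at (name is illustrative)
aws s3 cp my_workflow.zip s3://all-the-workflows/my_workflow.zip
```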
Follow the instructions here to get a Webhook URL (which should be treated as confidential). You should then manually store this URL in an AWS SSM parameter. The name and ARN of this parameter are then passed as two of the variables into the Terraform module.
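One way to store it is with the AWS CLI (the parameter name is illustrative, and SecureString keeps the value encrypted at rest); the same pattern works for the Signing Secret described below:

```shell
# Store the webhook URL as an encrypted SSM parameter
aws ssm put-parameter \
  --name slack-webhook-url \
  --type SecureString \
  --value "https://hooks.slack.com/services/..."
```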
It's also possible to trigger one-off runs of workflows via a Slack Slash command.
Create a new Slack App and, from within Settings -> Basic Information -> App Credentials, find the Signing Secret.

You should then manually store this Signing Secret in an AWS SSM parameter (the same put-parameter approach as above works). The name and ARN of this parameter are then passed as two of the variables into the Terraform module.
Add the following module to your terraform configuration and customise the variables:
module "knimer_slash" {
# Uses the terraform scripts directly from this repo (can also pin version with knimer.git?ref=<BLAH>)
source = "github.com/nick-solly/knimer.git//terraform/slack-trigger"
# Used for naming AWS resources
name_prefix = "prod"
aws_region = "eu-west-2"
slack_signing_secret_name = "secret"
slack_signing_secret_arn = "arn:aws:ssm:eu-west-2:111122223333:parameter/secret"
# Only allow this slash command to be run from certain channel (optional)
slack_channel_restriction = "tl_knimer"
# ID's of the subnets to run the ECS Task on
ecs_subnets = ["SN1", "SN2"]
# A map of the ECS Tasks which can be triggered via the slash command.
# Key is Task Definition ARN, Value is Cluster ARN.
knimer_ecs_tasks = {
"arn:aws:ecs:us-west-2:111122223333:task-definition:asf123-kcf38k" = "arn:aws:ecs:us-west-2:111122223333:tcluster:389djd-x93kd3"
}
}
Follow the instructions here to create the slash command for the new App. One of the outputs of the terraform module is the endpoint URL you'll need to set as the Request URL.
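After applying, you can read the URL from the module outputs; the specific output name is defined by the module, so the name below is an assumption:

```shell
# List all outputs, including the slash-command endpoint URL
terraform output

# Or fetch a single output directly (output name is illustrative)
terraform output -raw slash_command_url
```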
Add the following module into your terraform configuration and customise the variables:
module "knimer" {
# Uses the terraform scripts directly from this repo (can also pin version with knimer.git?ref=<BLAH>)
source = "github.com/nick-solly/knimer.git//terraform/knimer"
aws_region = "eu-west-2"
# Used for naming AWS resources
name_prefix = "my-workflow"
cpu = 2048
memory = 16384
# In the S3 bucket, the file should be named `my_workflow.zip`
knime_workflow_file = "my_workflow"
s3_bucket_name = "all_the_workflows"
workflow_variables = {
variable1 = "ThisIsAValue,String",
variable2 = "1234,int",
}
workflow_secrets = {
database_creds = "username;password",
}
# Contains the Slack Webhook URL (optional)
slack_webhook_url_secret_name = "slack-webhook-url"
slack_webhook_url_secret_arn = "arn:aws:ssm:eu-west-2:111122223333:parameter/slack-webhook-url"
# Where the ECS Task should be run
subnet_ids = ["subnet-0af169a6f98a3hg34", "subnet-042b69da4001512ca"]
# When you want the workflow to be run
schedule_expressions = ["cron(0 4 * * ? *)"]
}
For advice on cpu and memory values see here. For advice on Schedule Expressions see here.
This method does not include scheduling or Slack notifications, as those are actioned via AWS services.

On a Docker host, run:

```shell
docker run \
  -e S3_BUCKET_NAME=my-workflow-bucket \
  -e KNIME_WORKFLOW_FILE=workflow_file \
  -e WORKFLOW_VARIABLES="-workflow.variable=variable_a,foo,String -workflow.variable=variable_b,6,int" \
  -e WORKFLOW_SECRETS="-credential=database_creds;username;password -credential=sharepoint_creds;username;password" \
  -e AWS_ACCESS_KEY_ID=ABCDALKNCLASASASC \
  -e AWS_SECRET_ACCESS_KEY=A2309F23J02 \
  ghcr.io/nick-solly/knimer/knimer:latest
```
Note:

- `KNIME_WORKFLOW_FILE` is without the `.zip` extension
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are just for accessing the S3 bucket where the zipped workflow file is located
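For context, `WORKFLOW_VARIABLES` and `WORKFLOW_SECRETS` are passed straight through to KNIME's batch executor, so the invocation inside the container looks roughly like the sketch below; the installation path and exact flag set are assumptions rather than taken from the image's entrypoint:

```shell
# Rough sketch of the KNIME batch invocation inside the container (paths illustrative)
./knime -nosplash -reset -nosave \
  -application org.knime.product.KNIME_BATCH_APPLICATION \
  -workflowDir="/workflow" \
  -workflow.variable=variable_a,foo,String \
  -workflow.variable=variable_b,6,int \
  -credential="database_creds;username;password"
```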
You can build the image locally using the included Makefile.
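For example (the target name is an assumption; check the Makefile for the actual targets):

```shell
# Build the image locally (target name is illustrative)
make build
```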
- Allow customisable KNIME extensions to be installed
- Add a diagram of the AWS infrastructure this module creates
- Improve handling of KNIME product versioning, as it's currently hardcoded
- Add GitHub Releases and versioning to the Docker images
- Add GitHub action to run terraform linting/checking
- Add a link to the Cloudwatch logs to the Slack message
- Consider how to move the docker image into AWS ECR to reduce ECS Task spin-up time