Skip to content

Latest commit

 

History

History
157 lines (101 loc) · 4.59 KB

README.md

File metadata and controls

157 lines (101 loc) · 4.59 KB

Data Orchestration with Kestra

This repository will:

  1. Help you get started with Kestra
  2. Provide examples of how to use Kestra
  3. Provide examples of how to integrate Kestra with Infrastructure as Code tools (Terraform, GitHub Actions), Modern Data Stack products, and public cloud provider services
  4. Share best practices for managing data workflows across environments so that moving from development to production is as easy as possible without sacrificing security or reliability

Video tutorials


How to install Kestra

Download the Docker Compose file:

curl -o docker-compose.yml https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml

Start Kestra:

docker-compose up

install.png


Hello-World example

Here is a simple example logging hello world message to the terminal:

id: hello  
namespace: prod
tasks:
  - id: hello-world
    type: io.kestra.plugin.core.log.Log
    message: Hello world!

Adding a schedule

Here is how you can add a schedule trigger to run the flow every minute: helloParametrizedScheduled.yml

To add multiple schedules, each running with different input parameter values, use helloParametrizedMultipleSchedules.yml


How to integrate Kestra with other tools in the Modern Data Stack

Airbyte

Here is an example of using Kestra with Airbyte running in other Docker container: airbyteSync.yml

And example running multiple Airbyte syncs in parallel: airbyteSyncParallel.yml

Fivetran

Here is an example of a flow with a single task triggering a Fivetran sync: fivetranSync.yml


Custom Scripts

How to run Bash and Python tasks

Here is an example of a Bash task: csvKit

Custom Docker image per task

If you prefer to run the Python or Bash task in a (potentially custom) Docker container: pythonScriptContainer

Using dockerOptions with the dockerConfig attribute, you can also configure credentials to private Docker registries:

auths: { "my.registry.com" : { auth: "token" } }

Docker image with requirements.txt built at runtime

id: hello_python_docker
namespace: company.team
tasks:
  - id: python_container
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install requests pandas
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: python:3.11-slim
    script: |
      import pandas as pd
      import requests
      
      print(pd.__version__)
      print(requests.__version__)

Keyboard shortcuts for the built-in editor in the UI

On Mac:

  • fn + control + Space = Autocomplete
  • command + K + C = Comment
  • command + K + U = Uncomment
  • opt + up/down = Move line up/down

On Windows:

  • ctrl + space = Autocomplete
  • ctrl + K + C = Comment
  • ctrl + K + U = Uncomment
  • alt + up/down = Move line up/down

Credentials management

Open-source Kestra

In the open-source version, you can leverage Secrets.

Create an .env file and add any environment variables there with the prefix SECRET_, as shown in the .env_example file. The values are the passwords in the base64 encoded format.

Then, you can reference the secrets in your flow using {{ secret('AIRBYTE_PASSWORD') }}. Note that we drop the SECRET_ prefix while referencing the secrets.

For security reasons, environment variables are fixed at the application startup (JVM startup).

Also, make sure that your Kestra container contains this configuration in your docker-compose.yml file:

  kestra:
    image: kestra/kestra:develop-full
    ...
    env_file:
      - .env
    environment:
      KESTRA_CONFIGURATION: |
        kestra:
          ...

Cloud & Eneterprise Edition

Cloud & Enterprise Editions have a dedicated credentials managers with extra encryption, namespace-bound credential inheritance hierarchy and an RBAC-setting behind it.

You can add a Secret in the relevant namespace from the Namespace tab in the UI. You should not add the SECRET_ prefix while setting up the secrets via the Namespace tab.

To reference that secret in your flow, use {{ secret('AWS_ACCESS_KEY_ID') }}.