ci: add CI to deploy stac data pipeline to k8s (#3459)
* feat: added stac data pipeline to be deployed to k8s

* feat: added ci to bump version
JinIgarashi authored May 9, 2024
1 parent c4e0e1e commit a204b3e
Showing 7 changed files with 220 additions and 0 deletions.
67 changes: 67 additions & 0 deletions .github/workflows/bump-stac-pipeline.yml
@@ -0,0 +1,67 @@
name: Bump geo-undpstac-pipeline version
on:
  # This workflow is triggered when a new release tag is created on the geohub-data-pipeline repository.
  # https://github.com/UNDP-Data/geohub-data-pipeline/blob/main/.github/workflows/acr_docker_image.yml
  repository_dispatch:
    types: [bump-stacpipeline-version]
  workflow_dispatch:

jobs:
  bump-version:
    runs-on: ubuntu-latest
    env:
      OWNER: undp-data
      REPO: geo-undpstac-pipeline
    steps:
      - name: checkout
        uses: actions/checkout@v4
        with:
          ref: develop

      - name: get the latest version
        id: pipeline
        uses: pozetroninc/github-action-get-latest-release@master
        with:
          owner: ${{ env.OWNER }}
          repo: ${{ env.REPO }}
          excludes: prerelease, draft
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: bump geo-undpstac-pipeline version
        working-directory: backends/k8s/stac-pipeline/yaml
        env:
          PIPELINE_VERSION: ${{ steps.pipeline.outputs.release }}
          YAML: deployment.yaml
        run: |
          echo "Latest release version: ${{ env.PIPELINE_VERSION}}"
          imagename="undpgeohub.azurecr.io/${{env.OWNER}}/${{ env.REPO }}"
          # match the image reference together with its current tag (everything up to the next space)
          pattern="${imagename}:[^ ]*"
          sed "s|$pattern|$imagename:${{ env.PIPELINE_VERSION}}|g" ${{ env.YAML}} > temp.yaml
          # replace yaml file with new version
          mv temp.yaml ${{ env.YAML}}
          echo "tag version was replaced with ${{ env.PIPELINE_VERSION}}"
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          branch: release/bump-geo-undpstac-pipeline
          title: "[RELEASE] bump version of geo-undpstac-pipeline"
          delete-branch: true
          commit-message: "[RELEASE] bump version of geo-undpstac-pipeline"
          body: |
            ## Description
            This PR bumps the version of geo-undpstac-pipeline so that the new pipeline Docker image is applied to the Kubernetes cluster.
            ---
            - Auto-generated by [create-pull-request][1]
            [1]: https://github.com/peter-evans/create-pull-request
          labels: release
          reviewers: |
            iferencik
            Thuhaa
            JinIgarashi
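
The tag substitution performed by the bump step can be tried locally on a throwaway file. This is a minimal sketch, not part of the commit: the manifest fragment is a sample and `v0.0.2` is a made-up release tag used only for illustration.

```shell
# Sketch of the image-tag substitution on a temporary file.
# Assumption: v0.0.2 is a hypothetical release tag, not a real release.
workdir="$(mktemp -d)"
yaml="$workdir/deployment.yaml"

# Write a small sample fragment that mimics the container section.
printf '%s\n' \
  '        - name: stac' \
  '          image: undpgeohub.azurecr.io/undp-data/geo-undpstac-pipeline:v0.0.1' \
  '          imagePullPolicy: Always' > "$yaml"

imagename="undpgeohub.azurecr.io/undp-data/geo-undpstac-pipeline"
new_version="v0.0.2"
# Match the image reference plus its current tag, up to the next space.
pattern="${imagename}:[^ ]*"

# Same sed-then-mv approach as the workflow step.
sed "s|$pattern|$imagename:$new_version|g" "$yaml" > "$workdir/temp.yaml"
mv "$workdir/temp.yaml" "$yaml"

# Keep the rewritten image line for inspection, then clean up.
result="$(grep 'image:' "$yaml")"
echo "$result"
rm -rf "$workdir"
```

Because `[^ ]*` stops at the first space or at the end of the line, only the image reference and its tag are rewritten. The other lines of the manifest are left untouched.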
33 changes: 33 additions & 0 deletions .github/workflows/ci.yml
@@ -34,6 +34,7 @@ jobs:
      datapipeline: ${{ steps.changes.outputs.datapipeline }}
      cogserver-dev: ${{ steps.changes.outputs.cogserver-dev }}
      cogserver: ${{ steps.changes.outputs.cogserver }}
      stacpipeline: ${{ steps.changes.outputs.stacpipeline }}
    steps:
      - uses: actions/checkout@v4

@@ -81,6 +82,9 @@ jobs:
              - 'backends/k8s/cogserver/yaml/cogserver-deployment.yaml'
              - '.github/workflows/bump-cogserver.yml'
              - '.github/workflows/ci.yml'
            stacpipeline:
              - 'backends/k8s/stac-pipeline/yaml/deployment.yaml'
              - '.github/workflows/ci.yml'
  lint_build:
    name: lint, build and test for GeoHub
@@ -513,3 +517,32 @@ jobs:
        uses: actions-hub/kubectl@master
        with:
          args: apply -f backends/k8s/cogserver/yaml/cogserver-deployment.yaml

  k8s_stacpipeline_deploy:
    name: Deploy geo-undpstac-pipeline to Kubernetes
    needs: changes
    if: ${{ github.ref == 'refs/heads/develop' && needs.changes.outputs.stacpipeline == 'true' }}
    runs-on: ubuntu-latest
    env:
      KUBE_CONFIG: ${{ secrets.KUBE_CONFIG }}
      DEPLOYMENT_NAMESPACE: stac
      AZURE_SERVICE_BUS_QUEUE_NAME: undp-stac-pipeline
    environment:
      name: K8S stac-pipeline
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Delete Secrets
        uses: actions-hub/kubectl@master
        with:
          args: delete secret stac-secrets --ignore-not-found -n ${{ env.DEPLOYMENT_NAMESPACE }}
      - name: Create Secrets
        uses: actions-hub/kubectl@master
        with:
          args: create secret generic stac-secrets --from-literal=AZURE_STORAGE_CONNECTION_STRING=${{ secrets.AZURE_STORAGE_CONNECTION_STRING }} --from-literal=AZURE_SERVICE_BUS_CONNECTION_STRING=${{ secrets.AZURE_SERVICE_BUS_CONNECTION_STRING }} --from-literal=AZURE_SERVICE_BUS_QUEUE_NAME=${{ env.AZURE_SERVICE_BUS_QUEUE_NAME }} -n ${{ env.DEPLOYMENT_NAMESPACE }}
      - name: Deploy ingest to kubernetes
        uses: actions-hub/kubectl@master
        env:
          AZURE_SERVICE_BUS_CONNECTION_STRING: ${{ secrets.AZURE_SERVICE_BUS_CONNECTION_STRING }}
        with:
          args: apply -f backends/k8s/stac-pipeline/yaml/deployment.yaml
42 changes: 42 additions & 0 deletions backends/k8s/stac-pipeline/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# UNDP STAC data pipeline

[geo-undpstac-pipeline](https://github.com/UNDP-Data/geo-undpstac-pipeline) is a command line tool that ingests datasets and converts them into STAC items. The pipeline is deployed to Azure Kubernetes Service as a ScaledJob, which is triggered by Azure Service Bus queue events.

- [Namespace](#namespace)
- [Installation](#installation)
- [Uninstall](#uninstall)

## Namespace

The pipeline lives in its own namespace, **stac**, and runs at most one job at a time (`maxReplicaCount: 1`).

## Installation

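The ScaledJob reads its Azure Storage and Azure Service Bus connection strings from a Kubernetes secret, so the secret must be created alongside `deployment.yaml`. The install script applies `deployment.yaml` and creates the secret with kubectl: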

```shell
cd scripts
cp .env.example .env
# set environmental variables in .env
./install.sh
```

The above command creates the following resources:

- namespace
- scaledjob
- secret (`stac-secrets`)
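
Azure connection strings contain `;` characters, and `install.sh` loads `.env` with the shell's `.` command, so an unquoted value would be cut off at the first `;`. The sketch below shows the quoting that avoids this. It is a minimal illustration with a placeholder connection string, not real credentials, and it writes to a temporary directory rather than to `scripts/.env`.

```shell
# Sketch: quote connection strings in .env so that sourcing keeps them whole.
# Assumption: the value below is a placeholder, not a real credential.
workdir="$(mktemp -d)"
cat > "$workdir/.env" <<'EOF'
AZURE_STORAGE_CONNECTION_STRING='DefaultEndpointsProtocol=https;AccountName=example;AccountKey=placeholder'
AZURE_SERVICE_BUS_QUEUE_NAME=undp-stac-pipeline
EOF

# Source the file with the same `.` command that install.sh uses.
. "$workdir/.env"

# The full string survives, including everything after the first ';'.
echo "$AZURE_STORAGE_CONNECTION_STRING"
rm -rf "$workdir"
```

Without the single quotes, the shell would treat each `;` as a command separator and store only the part before it.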

## Uninstall

To uninstall, use the same YAML files in the opposite order:

```shell
cd scripts
./uninstall.sh
```

## Notes

Processing the night time light data for 6 January 2024 took around 12 minutes on a pod with 20 GB of RAM allocated. `activeDeadlineSeconds` is set to 3600 seconds (1 hour), so a job that is still running after 1 hour is terminated automatically. If a job finishes earlier, its container stops and the job is deleted automatically.

This scaled job is deployed to the `manual` node pool, which can autoscale up to 2 nodes. Currently, all pods for titiler and titiler-dev fit on a single node. When a message is added to the queue, that node does not have enough free resources to launch the scaled job, so Kubernetes scales the pool up to 2 nodes. Once the pipeline job has finished, the second node is removed automatically after some time (probably around 15 to 20 minutes).
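
The scale-up and clean-up described above can be watched with standard kubectl commands. This is a minimal sketch, assuming kubectl is already configured for the cluster. The function names are hypothetical helpers, and the `type=manual` label is taken from the `nodeSelector` in `deployment.yaml`.

```shell
# Hypothetical helper functions; the kubectl subcommands and flags are standard.

# Watch the nodes of the `manual` pool appear and disappear as the pool autoscales.
watch_manual_nodes() {
  kubectl get nodes -l type=manual --watch
}

# Watch the jobs that the ScaledJob creates in the stac namespace.
watch_stac_jobs() {
  kubectl get jobs -n stac --watch
}
```

Run `watch_manual_nodes` in one terminal and `watch_stac_jobs` in another after a message is added to the queue.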
3 changes: 3 additions & 0 deletions backends/k8s/stac-pipeline/scripts/.env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
AZURE_STORAGE_CONNECTION_STRING=
AZURE_SERVICE_BUS_CONNECTION_STRING=
AZURE_SERVICE_BUS_QUEUE_NAME=undp-stac-pipeline
17 changes: 17 additions & 0 deletions backends/k8s/stac-pipeline/scripts/install.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#!/bin/bash

NAMESPACE=stac
SECRET_NAME=stac-secrets

# Source the .env file from the current directory (run this script from scripts/)
. .env
# Apply the manifest first; it creates the namespace that the secret lives in
kubectl apply -f ../yaml/deployment.yaml
# Create the secret from the environment variables.
# Quoting keeps values that contain spaces or glob characters intact.
kubectl create secret generic "$SECRET_NAME" \
  --from-literal=AZURE_STORAGE_CONNECTION_STRING="$AZURE_STORAGE_CONNECTION_STRING" \
  --from-literal=AZURE_SERVICE_BUS_CONNECTION_STRING="$AZURE_SERVICE_BUS_CONNECTION_STRING" \
  --from-literal=AZURE_SERVICE_BUS_QUEUE_NAME="$AZURE_SERVICE_BUS_QUEUE_NAME" \
  -n "$NAMESPACE"


3 changes: 3 additions & 0 deletions backends/k8s/stac-pipeline/scripts/uninstall.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
kubectl delete secret stac-secrets --ignore-not-found -n stac
kubectl delete -f ../yaml/deployment.yaml

55 changes: 55 additions & 0 deletions backends/k8s/stac-pipeline/yaml/deployment.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: stac
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: stac-scaledjob
  namespace: stac
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 3600
    backoffLimit: 5
    template:
      spec:
        nodeSelector:
          type: "manual"
        containers:
          - name: stac
            image: undpgeohub.azurecr.io/undp-data/geo-undpstac-pipeline:v0.0.1
            imagePullPolicy: Always
            command: ["python3"]
            args: ["-m", "undpstac_pipeline.cli", "queue"]
            resources:
              limits:
                memory: "20G"
                cpu: "2000m"
            envFrom:
              - secretRef:
                  name: stac-secrets
                  optional: false
        restartPolicy: Never
  pollingInterval: 30 # Optional. Default: 30 seconds
  successfulJobsHistoryLimit: 0 # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 0 # Optional. Default: 100. How many failed jobs should be kept.
  envSourceContainerName: stac # Optional. Default: .spec.JobTargetRef.template.spec.containers[0]
  minReplicaCount: 0 # Optional. Default: 0
  maxReplicaCount: 1 # Optional. Default: 100
  rollout:
    strategy: gradual # Optional. Default: default. Which Rollout Strategy KEDA will use.
    propagationPolicy: foreground # Optional. Default: background. Kubernetes propagation policy for cleaning up existing jobs during rollout.
  scalingStrategy:
    strategy: default
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: undp-stac-pipeline
        namespace: undpgeohub
        messageCount: "1" # default 5, scale/spin a pod for every message
        activationMessageCount: "0" # default 0, ensure no pods exist if no messages exist in the queue
        connectionFromEnv: AZURE_SERVICE_BUS_CONNECTION_STRING
