The World Cereal Processing system is built on top of Kubernetes and uses well-known and tested open-source components.
- Deployed and tested on Kubernetes v1.22+,
- The cluster is split (logically) into 2 node groups (one dedicated to system components and one to processing) so that each has its own resources; the split is handled using node labels,
- Components are deployed using helm (Kubernetes package manager) and namespaces to isolate them,
- By default, every component is configured to be highly available and fault tolerant (compatible and tested with AWS spot instances for processing nodes)
- Cert-Manager to handle TLS certificate generation and renewal (Let's Encrypt)
- PostgreSQL HA as the shared database backend for components that require one (kong, keycloak, etc.)
- Kong as the gateway from outside of the cluster (routes requests to components)
- Keycloak as the authentication/authorization manager; provides Single Sign-On to all applications
- Logging stack:
  - System/Kubernetes logs are collected using Fluent Bit and application logs are handled in Python
  - All logs are sent to Graylog, which acts as the syslog server / viewer
  - Graylog uses MongoDB to store its configuration and Elasticsearch to store the logs it receives; logs in ES are sharded across all system nodes and replicated to remain available in case of a node failure
  - Graylog's configuration is automated using a Kubernetes job running a small Python script that uses Graylog's REST API (more details about the configuration can be found in the script's header)
- Monitoring stack: Prometheus/Grafana, Thanos
The cluster is composed of 10 specific namespaces.
- argo: used by preprocessing modules.
- cert-manager: handles SSL/TLS certificates.
- sysdb: namespace for the shared postgresql-ha instance.
- keycloak: SSO handling cluster authentication.
- kong: ingress controller used with OIDC to handle user authentication.
- logging: stack that handles and stores the cluster events.
- monitoring: stack for performance monitoring.
- rdm: stack regrouping elements related to the Reference Data Module.
- vdm: stack regrouping elements related to the Visualization and Dissemination Module.
- wctiler: stack that allows checking the tile processing state.
- A Kubernetes cluster running on version 1.22 or later,
- At least 3 nodes dedicated to the System Node Group with the label `system-tag=world-cereal-system` (`kubectl label node NODE_NAME system-tag=world-cereal-system`),
- Access to a registry to store Docker images (public or with credentials); see the illustrative secret command after this list,
- A "controller" system (separate computer, dedicated node, etc.) with `kubectl` and `helm` installed. `git` also needs to be available to clone the repository in order to perform the installation.
If using AWS infrastructure, the AWS CLI must be installed as well in order to handle the autoscaling and, optionally, the registry credentials commands (automatically handled in the init phase). It will also be useful for accessing any AWS S3 buckets.
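For reference, `sys-init.sh` creates the registry secret automatically during `make init`, but a manually created pull secret of the kind it needs would look like the following sketch (secret name, registry URL, credentials, and namespace are placeholders):

```bash
# Illustrative only: sys-init.sh normally creates this during `make init`.
# The secret name, registry URL, credentials and namespace below are placeholders.
kubectl create secret docker-registry registry-credentials \
  --docker-server=registry.example.org \
  --docker-username=REGISTRY_USER \
  --docker-password=REGISTRY_PASSWORD \
  -n rdm
```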
- Configure your GitHub access to the EWoC project and then clone the EWoC System repository:
git clone https://github.com/WorldCereal/ewoc_platform && cd ewoc_platform
- Review all the necessary configuration files in order to customize your installation:
  - the file `export-env.sh` located at the root of the project: it contains the base hostname for the project, all of the chart versions to use, and generates some secrets/passwords with random values,
  - all the `values.yml` and `values.tmpl` files in their respective folders under `charts/`.
- Source the file:
source export-env.sh
- Initialize the cluster with `make init`; this will update all the Helm repositories and run the `sys-init.sh` script, which creates the namespaces, the registry secrets, and the database password secret, and sets up various other required components.
- Then, start the installation with `make deploy`. The script will install each component and wait for it to be deployed before moving on to the next one.
  You can also install components one by one with `make <app>`, where `<app>` is one of: `certmgr`, `pgsql`, `kong`, `keycloak`, `thanos`, `monitoring`, `mongo`, `elasticsearch`, `graylog`, `config`, `fluentbit`, `rdm`, `vdm`, `wctiler`.
  Beware that some components depend on each other.
- The cluster size evolves according to preprocessing needs, so it has two parts, one fixed and one dynamic. All the elements deployed here must be attached to the fixed part, which is labeled with `system-tag: world-cereal-system`. We strongly advise you to check that every component you deploy uses a `nodeSelector` to restrict pod execution to the fixed part of your cluster (see the sketch after this list).
- Some of the Helm charts used here can be found online (Bitnami for instance); however, some use container images that are only present in our private registry.
- Be careful when removing a component not to delete the linked PVC.
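As a minimal sketch of that check (the exact key and nesting depend on each chart's values layout), the corresponding entry in a chart's `values.yml` would look like:

```yaml
# Illustrative values.yml excerpt: restrict pods to the fixed (system) node group
nodeSelector:
  system-tag: world-cereal-system
```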
Cert-Manager is the component that handles the SSL/TLS certificates for cluster applications.
It uses the official Helm chart from https://cert-manager.io/docs/installation/helm/.
It is deployed in its own namespace `cert-manager`.
Please be aware that you need to change the issuer email in `export-env.sh` through the `CERT_MANAGER_MAIL` variable.
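For illustration (the actual assignment already lives in `export-env.sh`; the address below is a placeholder), the variable is a plain shell export:

```bash
# Placeholder address: e-mail used by the Let's Encrypt issuer for expiry notices
export CERT_MANAGER_MAIL="admin@example.org"
```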
To deploy it:
make certmgr
The data of Keycloak, Kong, and Grafana are stored in postgresql-ha.
All the passwords have been generated during the `sys-init.sh` step.
To deploy it:
make pgsql
Kong is the ingress controller of the platform; it uses postgresql-ha to store its configuration data. An OIDC plugin is added to manage user authentication with the help of the Keycloak SSO.
To deploy it:
make kong
Kong also provides a CRD that allows configuring the OIDC plugin through a YAML file. For instance:
apiVersion: configuration.konghq.com/v1
config:
bearer_only: "no"
client_id: rdm
client_secret: TheClientSecret
discovery: https://YourKeycloakURL/auth/realms/YourRealm/.well-known/openid-configuration
introspection_endpoint: https://YourKeycloakURL/auth/realms/YourRealm/protocol/openid-connect/token/introspect
logout_path: /logout
realm: YourRealm
redirect_after_logout_uri: /
redirect_uri_path: null
response_type: code
scope: openid
session_secret: null
ssl_verify: "no"
token_endpoint_auth_method: client_secret_post
kind: KongPlugin
metadata:
name: oidc-plugin
namespace: yourNamespace
plugin: oidc
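To attach the plugin to a route, it can be referenced from an Ingress through Kong's `konghq.com/plugins` annotation. The sketch below uses hypothetical names, host, and backend service; adapt them to the actual application:

```yaml
# Illustrative Ingress using the KongPlugin defined above; names and host are placeholders
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rdm-ingress
  namespace: yourNamespace
  annotations:
    kubernetes.io/ingress.class: kong
    konghq.com/plugins: oidc-plugin   # attach the OIDC plugin to this route
spec:
  rules:
    - host: rdm.YOURDOMAIN
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rdm-service
                port:
                  number: 80
```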
Keycloak is the SSO of the platform that handles user authentication.
It uses postgresql-ha to store its configuration data.
The Keycloak realm `worldcereal` is pre-initialized during the installation.
To deploy it:
make keycloak
Once Keycloak is up and running, get the admin password, then connect to the web interface.
For each client of the `worldcereal` realm, generate a client secret and add it to `export-env.sh` in the `_CS`-suffixed variables.
Finally, source the file by running:
source export-env.sh
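As a sketch (the actual `_CS` variable names are defined in `export-env.sh`; `RDM_CS` below is only an assumed example matching the `rdm` client), the secrets end up as plain shell exports:

```bash
# Assumed variable name for the rdm client; paste the secret generated in the Keycloak UI
export RDM_CS="client-secret-copied-from-keycloak"
```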
The community kube-prometheus stack is used to monitor the platform. It deploys Prometheus, Grafana, and Alertmanager. Grafana relies on postgresql-ha to store data regarding OAuth session users.
To deploy it:
make monitoring
For information:
- alertmanager-kube-prometheus-stack-alertmanager-0 is the pod that allows setting up and checking the Prometheus rules.
- kube-prometheus-stack-grafana is the UI endpoint for the monitoring.
- kube-prometheus-stack-prometheus-node-exporter-* is a DaemonSet (one instance deployed on every node) that exposes node metrics to Prometheus.
- prometheus-kube-prometheus-stack-prometheus-0 is the heart of Prometheus; it fetches the metrics exposed by node-exporter.
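A quick sanity check, assuming the stack was deployed in the `monitoring` namespace listed above, is to make sure all of these pods are Running:

```bash
kubectl get pods -n monitoring
```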
Check the `logging-stack.md` file.
WcTiler allows users to check the tile processing status. WcTiler is a Helm chart that deploys 2 elements; it needs to be plugged into a database exposing the specific table pattern it reads from. The database host and name need to be set in `export-env.sh`.
To install it, run:
cd wctiler
Create the namespace:
kubectl create ns wctiler
then run:
make deploy
The application should be accessible at the URL wctiler.YOURDOMAIN.
Some changes have been made to the WcTiler mapproxy container image because HTTP requests can be blocked if they exceed a buffer size, which prevented the tiles from being displayed.
To fix that, the Docker image has been updated: changing the uwsgi.ini `buffer-size` parameter fixes the problem.
If the issue persists, log into the mapproxy container, adjust the uwsgi.ini parameters, and then reload the server configuration with `uwsgi --reload /tmp/map.pid uwsgi.ini`.
If your change fixes the issue, update the Dockerfile and push your new container image to the Harbor registry.
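As a sketch of the relevant setting (the value shown is illustrative, not necessarily the one baked into the image), the uwsgi.ini change amounts to:

```ini
# uwsgi.ini excerpt: raise the request buffer so large tile requests are not rejected
[uwsgi]
buffer-size = 65535   ; uWSGI's default is 4096 bytes; pick a value large enough for your requests
```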
Check the RDM repository
Check the VDM repository
In case of a platform rebuild, it may be necessary to reuse some of the PVs as they contain valuable data. This particularly concerns the VDM, RDM, and Keycloak volumes.
If the platform is rebuilt and etcd is lost, the PVs are no longer available in the new Kubernetes context. This is how to reattach the volumes and reassociate them with their pods (OpenStack environment).
For simplicity, it is advised to rename your volumes in your OpenStack project in order to easily identify each volume's owner.
First gather the volume ID in your OpenStack context, then manually create a PV using the following example:
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
name: "Name"
spec:
capacity:
storage: "20Gi" #Reuse old size
accessModes:
- "ReadWriteOnce"
claimRef:
namespace: theNS
name: theNameUsedByPVC
cinder:
volumeID: "Your Openstack volume id"
The important part is the `claimRef` section, which forces the binding between the claim and this manually created PV.
When the PVC is created, if the PVC name has not changed, it should bind automatically to the PV and thereby reuse the old volume.
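To confirm the binding, using the placeholder names from the example above:

```bash
# The PV should report STATUS "Bound" and CLAIM theNS/theNameUsedByPVC
kubectl get pv Name
kubectl get pvc theNameUsedByPVC -n theNS
```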