Keep in mind that the only Ansible variables actually used are the ones in the `generated_inventory_vars.yaml` file. You can edit this file directly, retrieve it from a vault, upload it to a vault, etc.; see `Credentials.md`. Your node groups must be correctly configured; see [Cluster configuration](./Cluster%20configuration.md).
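As an example of the vault workflow mentioned above, here is a minimal sketch with Ansible Vault (just one possible approach; adapt it to your own vault tooling):

```sh
# Encrypt the generated variables before storing or uploading them
ansible-vault encrypt generated_inventory_vars.yaml
# Decrypt them again before running the playbooks
ansible-vault decrypt generated_inventory_vars.yaml
```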
- Add nodes to a node group:

  ```sh
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml -t expand,update_hosts -e expand_nodegroup_name=NODEGROUP -e expand_count=N
  ```
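  For example, to add two nodes to a hypothetical `workers` node group:

  ```sh
  # "workers" and the count of 2 are placeholder values
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml \
    -t expand,update_hosts -e expand_nodegroup_name=workers -e expand_count=2
  ```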
- Set up Reference System settings and security on the nodes:

  ```sh
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml -t config --limit=CREATED_NODE1,CREATED_NODE2,...
  ansible-playbook security.yaml -i inventory/mycluster/hosts.yaml -b --limit=CREATED_NODE1,CREATED_NODE2,...
  ```
- Add the nodes to the Kubernetes cluster with Kubespray:

  ```sh
  ansible-playbook collections/kubespray/cluster.yml -b -i inventory/mycluster/hosts.yaml --limit=CREATED_NODE1,CREATED_NODE2,...
  ```
- Set up the providerID spec on the nodes for the autoscaler:

  ```sh
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml -t providerids
  ```
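  You can check that the `providerID` spec is set on every node, for example:

  ```sh
  # Requires kubectl access to the cluster
  kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID'
  ```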
If a created node has the affinities corresponding to the Ceph cluster and a mounted volume, a new OSD will be created on this node; this is how you may scale the storage on the platform.

To downscale the `rook_ceph` cluster, first mark the OSDs corresponding to the node as out of the Ceph cluster, so that the data is drained from those OSDs first.
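A minimal sketch of marking an OSD out, assuming the Rook toolbox is deployed in the `rook-ceph` namespace and that the node to remove hosts `osd.3` (both are assumptions; check your own deployment):

```sh
# Identify which OSDs live on the node to remove
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
# Mark the OSD out so Ceph rebalances its data to the remaining OSDs
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd out osd.3
# Wait for rebalancing to finish before removing the node
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
```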
- Drain and delete the nodes from the Kubernetes cluster. For each node, run:

  ```sh
  ansible-playbook collections/kubespray/remove-node.yml -i inventory/mycluster/hosts.yaml -b -e skip_confirmation=yes -e reset_nodes=false -e node=NODE_TO_DELETE
  ```
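  A minimal loop to remove several nodes in sequence (the node names are placeholders):

  ```sh
  # Replace the placeholder names with the nodes you want to remove
  for node in mycluster-node-3 mycluster-node-4; do
    ansible-playbook collections/kubespray/remove-node.yml -i inventory/mycluster/hosts.yaml \
      -b -e skip_confirmation=yes -e reset_nodes=false -e node=$node
  done
  ```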
- Delete the nodes from the cloud provider:

  ```sh
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml -t shrink,update_hosts -e nodes_to_delete=NODE_TO_DELETE1,NODE_TO_DELETE2,...
  ```
The autoscaling on Reference System relies on the following components:

- the cluster-autoscaler, for scaling decisions
- the rs-infra-autoscaler, which implements the externalgrpc service for cluster management
- SafeScale, for cloud provider management

These components interact as follows: the cluster-autoscaler calls the rs-infra-autoscaler through its externalgrpc interface, and the rs-infra-autoscaler in turn relies on SafeScale to create and delete hosts at the cloud provider.
The main settings for the cluster-autoscaler are set as command-line parameters in the `apps/autoscaling/cluster-autoscaler.yaml` file; you may want to tune some of the parameters described in the cluster-autoscaler documentation.
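For illustration, a few flags that are commonly tuned (a sketch only; these flag names are standard cluster-autoscaler options, but the values and the actual list in `apps/autoscaling/cluster-autoscaler.yaml` may differ):

```sh
# Illustrative cluster-autoscaler flags; values are examples, not recommendations:
#   --scale-down-unneeded-time: how long a node must be unneeded before removal
#   --scale-down-utilization-threshold: below this, a node is a scale-down candidate
#   --scale-down-delay-after-add: cooldown after a scale-up before scale-down resumes
./cluster-autoscaler \
  --cloud-provider=externalgrpc \
  --scale-down-unneeded-time=10m \
  --scale-down-utilization-threshold=0.5 \
  --scale-down-delay-after-add=10m
```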
Write your SafeScale tenants configuration, with only the tenant corresponding to the cluster to scale, in the `apps/autoscaling/safescaled-tenants.yaml` file.
- Rewrite the cluster configuration. Following [Cluster configuration](./Cluster%20configuration.md), write the cluster configuration matching the running cluster. You can add some empty node groups to expand later.
- Create the label corresponding to the node groups:

  ```sh
  safescale label create CLUSTER_NAME-nodegroup --value unset
  ```
- Label the running hosts with the right node group label. For each node:

  ```sh
  safescale host label bind NODE CLUSTER_NAME-nodegroup --value NODEGROUP
  ```
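  For example, binding two hypothetical hosts of cluster `mycluster` to a `workers` node group:

  ```sh
  # Host names, cluster name, and node group are placeholder values
  for node in mycluster-node-1 mycluster-node-2; do
    safescale host label bind $node mycluster-nodegroup --value workers
  done
  ```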
- Update the `hosts.yaml` inventory file:

  ```sh
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml -t update_hosts
  ```
- Set up the providerID spec on the nodes for the autoscaler:

  ```sh
  ansible-playbook cluster.yaml -i inventory/mycluster/hosts.yaml -t providerids
  ```
- Deploy the autoscaling components:

  ```sh
  ansible-playbook apps.yaml \
    -i inventory/mycluster/hosts.yaml \
    -e app=autoscaling
  ```
Since release 0.9.0, the node group (and therefore the scaling) functionality relies on SafeScale labels and no longer on SafeScale tags. There has also been a change in the metadata, so you must remove all tags from the 0.8.0 cluster, using the same SafeScale build that was used to write them.
You may use a script like the following to remove tags from the hosts for each node group:

```sh
for host in $(safescale tag inspect NODEGROUP | jq -r '.result.hosts[].name'); do
  safescale host untag $host NODEGROUP
done
```
And then remove the actual node groups:

```sh
for tag in $(safescale tag list | jq -r '.result[].name'); do
  safescale tag delete $tag
done
```
After that, follow the updated procedure above to deploy the node groups.
To prevent the cluster-autoscaler from deleting a given node, annotate it:

```sh
kubectl annotate node NODE_NAME cluster-autoscaler.kubernetes.io/scale-down-disabled=true
```
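To make the node eligible for scale-down again, remove the annotation (the trailing `-` is standard `kubectl annotate` syntax for deletion):

```sh
# The trailing "-" removes the annotation from the node
kubectl annotate node NODE_NAME cluster-autoscaler.kubernetes.io/scale-down-disabled-
```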
The cluster-autoscaler exposes metrics that are periodically scraped by Prometheus; you can use, for example, Grafana dashboard 3831 to visualize them.
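To quickly check that the metrics endpoint responds, you can port-forward to the pod; this sketch assumes the cluster-autoscaler's default metrics port (8085) and a `cluster-autoscaler` deployment in `kube-system`, so adjust the names to your deployment:

```sh
# Assumes deploy/cluster-autoscaler in kube-system; the default metrics port is 8085
kubectl -n kube-system port-forward deploy/cluster-autoscaler 8085:8085 &
curl -s http://localhost:8085/metrics | head
```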