Installation and deployment

Requirements

  • A running Kubernetes cluster set up with Persistent Volumes. We advise running one Cassandra pod per Kubernetes node, so please plan your environment accordingly.

Note: See AKS notes to properly define parameters for clusters running on AKS.

Note: See Dynamic provisioning for notes on setting up the persistent volumes properly for dynamic provisioning.

Note: See Local setup for notes on setting up a local Kubernetes cluster.
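
As a quick sanity check before deploying, you can verify that your cluster is reachable and that a storage class is available for persistent volume provisioning:

    kubectl get nodes
    kubectl get storageclass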

Deploy the Operator

  1. Deploy the CRDs used by the operator to manage Cassandra clusters:

    kubectl apply -f deploy/crds.yaml
    
  2. Deploy the operator:

    kubectl apply -f deploy/bundle.yaml
    
  3. Verify the operator is running:

    kubectl get pods | grep cassandra-operator
    
    cassandra-operator-5755f6855f-t9hvm   1/1     Running   0          65s
    

Custom Cassandra configurations

It is possible to configure Cassandra by providing a custom configuration. Refer to Custom configurations for the available options.

Deploy a Cassandra cluster

NOTE: To deploy the Cassandra cluster, you can use one of the example YAML files provided in the examples directory. There are two examples included in the repo:

  • example-datacenter.yaml -> a full example with all the fields, showing their usage. Use it as a template for your use case.
  • example-datacenter-minimal.yaml -> a minimal example. To use it, you must also create a ConfigMap called "cassandra-operator-default-config" that holds the default values used by the operator:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cassandra-operator-default-config
    data:
      nodes: "3"
      cassandraImage: gcr.io/cassandra-operator/cassandra-3.11.9:latest
      sidecarImage: gcr.io/cassandra-operator/instaclustr-icarus:latest
      memory: 1Gi
      disk: 1Gi
      diskMedium: Memory

This ConfigMap is already loaded into your Kubernetes environment if you have used deploy/bundle.yaml to deploy the operator's configuration.
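
You can verify that it is present:

    kubectl get configmap cassandra-operator-default-config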

  1. Make sure to set all appropriate values and fields in examples/example-datacenter.yaml.

  2. Deploy the cluster

    kubectl apply -f examples/example-datacenter.yaml
  3. Wait for the pods to become ready:

    NOTE: It could take a few minutes for the pods to converge while persistent volumes are being automatically provisioned and attached to the cluster nodes.

    kubectl get pods | grep cassandra-test
    
    NAME                                          READY   STATUS             RESTARTS   AGE
    cassandra-test-dc-cassandra-west1-a-0   2/2     Running            2          84m
    cassandra-test-dc-cassandra-west1-b-0   2/2     Running            0          83m
    cassandra-test-dc-cassandra-west1-c-0   2/2     Running            0          81m
    
  4. Verify the Cassandra cluster is healthy:

    kubectl exec cassandra-test-dc-cassandra-west1-a-0 -c cassandra -- nodetool status
    
    Datacenter: test-dc-cassandra
    ==========================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  10.244.1.14  87.41 KiB  256          62.9%             dcf940c2-18d2-4a3a-8abf-833acadeca7e  west1-a
    UN  10.244.2.9   87.38 KiB  256          69.6%             fd59fa32-aab0-485e-b04b-7ad4b75e54dd  west1-b
    UN  10.244.0.10  69.91 KiB  256          67.5%             9e4883a1-e822-472f-920f-f2fc36c340c8  west1-c
    
  5. Issue a sample query to the cluster:

    kubectl exec cassandra-test-dc-cassandra-west1-a-0 -c cassandra -- cqlsh -e "SELECT now() FROM system.local;" cassandra-test-dc-cassandra-nodes 
    
     system.now()
    --------------------------------------
     243e2fd0-d64a-11e9-b8a4-2dd801fa1b1c
    
    (1 rows)
    

Resize a Cassandra cluster

  1. Create a Cassandra cluster, if you haven't already:

    $ kubectl apply -f examples/example-datacenter.yaml
  2. In examples/example-datacenter.yaml the initial cluster size is 3. Modify the file and change replicas from 3 to 5.

    spec:
      replicas: 5
      image: "gcr.io/cassandra-operator/cassandra-3.11.9:latest"
  3. Apply the size change to the cluster CR:

    $ kubectl apply -f examples/example-datacenter.yaml

    The Cassandra cluster will scale to 5 members (5 pods):

    $ kubectl get pods
    NAME                                    READY     STATUS    RESTARTS   AGE
    cassandra-test-dc-cassandra-west1-a-0   2/2       Running   1          10m12s
    cassandra-test-dc-cassandra-west1-a-1   2/2       Running   2          3m2s
    cassandra-test-dc-cassandra-west1-b-0   2/2       Running   1          8m38s
    cassandra-test-dc-cassandra-west1-b-1   2/2       Running   1          1m4s
    cassandra-test-dc-cassandra-west1-c-0   2/2       Running   0          5m22s
  4. Similarly, we can decrease the size of the cluster from 5 back to 3 by changing the replicas field again and reapplying the change.

    spec:
      replicas: 3
      image: "gcr.io/cassandra-operator/cassandra-3.11.9:latest"

    Then apply the changes:

    $ kubectl apply -f examples/example-datacenter.yaml

    Note: scaling up or down is a long-running operation that performs actions in the background which are not visible to Kubernetes tools. It might take a long time before any "activity" is seen (such as nodes in the LEAVING state or pods terminating). Please do not rerun this command multiple times; instead, follow the sidecar log (via kubectl logs <pod name> --container sidecar) and the Cassandra status (via kubectl exec <pod name> -c cassandra -- nodetool status).
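
    For example, using one of the pod names from the cluster above:

    kubectl logs cassandra-test-dc-cassandra-west1-a-0 --container sidecar --follow
    kubectl exec cassandra-test-dc-cassandra-west1-a-0 -c cassandra -- nodetool status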

Cleanup

WARNING! The following will delete the Cassandra cluster deployed in the previous steps as well as all of its data.

Delete the Cassandra Cluster

  1. Delete the Cassandra cluster:

    kubectl delete -f examples/example-datacenter.yaml
    
  2. Delete the PVCs created automatically for the pods:

    kubectl delete pvc data-volume-cassandra-cassandra-test-{rack name}-{num}
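
    If you are unsure of the exact names, you can list the PVCs first:

    kubectl get pvc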
    

Delete the Operator

  1. Delete the operator:

    kubectl delete -f deploy
    
  2. Delete the RBAC and PSP resources:

    kubectl delete -f deploy/cassandra
    
  3. Delete the CRDs:

    kubectl delete -f deploy/crds.yaml 
    

Using EmptyDir as volume for a pod

If you do not specify DataVolumeClaimSpec in your spec, the operator will automatically use an EmptyDir volume from Kubernetes. You control that EmptyDir via a field called DummyVolume in your spec.

If DummyVolume is not specified either, the defaults from the ConfigMap described above are used.

DummyVolume is of type EmptyDirVolumeSource, so you can specify medium and sizeLimit there. medium defaults to the empty string "", which means a directory on the node backing the pod is used. However, you can also set medium to Memory. In that case your whole Cassandra node effectively runs in memory, because /var/lib/cassandra, where the data are stored, is actually a memory-backed mount.

Using this volume type means that your Cassandra data will live only until the pod is restarted or stopped, so it can be handy for testing or similar cases where you do not care about data persistence.

Use the Memory medium with care, as the sizeLimit counts against your pod's memory limits.
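
As a rough sketch, assuming the YAML key is dummyVolume (the lower camel-case form of the field named above), a memory-backed data volume could be declared in the datacenter spec like this:

    spec:
      dummyVolume:          # assumed key for the DummyVolume field described above
        medium: Memory      # back /var/lib/cassandra with memory instead of node disk
        sizeLimit: 1Gi      # counts against the pod's memory limits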

Deletion of persistent volume claims

Let's see how a common scaling scenario works. If you scale from 1 node to, say, 2 nodes, another PVC is created for the second pod and bound to its respective PV. Upon scaling down, the latest pod is deleted but the persistent volume is not; it stays behind. If you then scale back up to 2 nodes, the same PVC would be reused, but the data on it would no longer make sense: the second node was decommissioned and is not meant to be part of the cluster anymore. On such a bootstrap, Cassandra would complain that the node was decommissioned and is trying to re-join the cluster, which is illegal to do (under normal circumstances).

To overcome this situation, it is possible to delete PVCs automatically after a pod is deleted. By default this is turned off; you can turn it on with the deletePVCs flag. If this flag is set to true, a pod's PVC will be automatically deleted upon the pod's deletion and its PV will be recycled (or retained, though retaining it rarely makes sense). Similarly, if the whole data center is deleted, all pods are terminated and all their PVCs are deleted too if this option is active. This functionality is implemented via finalizers.
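
A minimal sketch, assuming the flag lives in the datacenter spec under the key deletePVCs as named above:

    spec:
      deletePVCs: true    # PVCs are removed automatically (via finalizers) when their pods are deleted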

Access to cluster from outside

You can put a service of type LoadBalancer in front of a node and route traffic to it.

Exposing a single node would be done like this:

kubectl expose pod cassandra-test-cluster-dc1-west1-a-0 \
  --type="LoadBalancer" \
  --name=node1-service \
  --port=9042 \
  --target-port=9042
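
You can then look up the external IP assigned by the load balancer:

    kubectl get service node1-service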

So you can then connect directly:

[smiklosovic@E091-FED ~]$ cqlsh 52.226.147.210
Connected to cassandra-test at 52.226.147.210:9042.
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

This was tested against Azure.