Installation and deployment
- A running Kubernetes cluster set up with Persistent Volumes. We advise running one Cassandra pod per Kubernetes node, so plan your environment accordingly.
Note: See AKS notes to properly define parameters for clusters running on AKS.
Note: See Dynamic provisioning for notes on setting up the persistent volumes properly for dynamic provisioning.
Note: See Local setup for notes on setting up a local Kubernetes cluster.
- Deploy the CRDs used by the operator to manage Cassandra clusters:
kubectl apply -f deploy/crds.yaml
- Deploy the operator:
kubectl apply -f deploy/bundle.yaml
- Verify the operator is running:
kubectl get pods | grep cassandra-operator
cassandra-operator-5755f6855f-t9hvm 1/1 Running 0 65s
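Instead of polling the pod list by hand, the check above can be sketched as a small shell helper that blocks until the rollout completes. This is only a sketch: the helper name `wait_for_operator` is hypothetical, and the Deployment name `cassandra-operator` is an assumption inferred from the pod name shown above.

```shell
# Block until the operator Deployment reports a successful rollout.
# Assumption: the Deployment is called "cassandra-operator" (inferred from the pod name).
wait_for_operator() {
  deployment="${1:-cassandra-operator}"
  timeout="${2:-120s}"
  kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"
}

# Usage: wait_for_operator            (defaults)
#        wait_for_operator my-op 30s  (custom name and timeout)
```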
It is possible to configure Cassandra by providing custom configuration. Refer to Custom configurations for options to configure your cluster.
NOTE: To deploy the Cassandra cluster, you can use one of the example YAML files provided in the examples directory. The repo includes two examples:
- example-datacenter.yaml -> a full example with all the fields, showing their usage. Use it as a template for your use case.
- example-datacenter-minimal.yaml -> a minimal example. To use it, you must also create a ConfigMap called "cassandra-operator-default-config" that holds the default values used by the operator:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cassandra-operator-default-config
data:
  nodes: "3"
  cassandraImage: gcr.io/cassandra-operator/cassandra-3.11.6:latest
  sidecarImage: gcr.io/cassandra-operator/cassandra-sidecar:latest
  memory: 1Gi
  disk: 1Gi
  diskMedium: Memory
This ConfigMap is already loaded into your Kubernetes environment if you used deploy/bundle.yaml to load the operator's configuration.
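Before applying the minimal example, it can be useful to confirm that the default ConfigMap is really present. The following is a sketch, not part of the operator's tooling: the helper name `ensure_default_config` is hypothetical, and the values it creates simply repeat the ones listed above.

```shell
# Create the default ConfigMap only if it does not exist yet in the current namespace.
ensure_default_config() {
  if kubectl get configmap cassandra-operator-default-config >/dev/null 2>&1; then
    echo "cassandra-operator-default-config already exists"
  else
    kubectl create configmap cassandra-operator-default-config \
      --from-literal=nodes=3 \
      --from-literal=cassandraImage=gcr.io/cassandra-operator/cassandra-3.11.6:latest \
      --from-literal=sidecarImage=gcr.io/cassandra-operator/cassandra-sidecar:latest \
      --from-literal=memory=1Gi \
      --from-literal=disk=1Gi \
      --from-literal=diskMedium=Memory
  fi
}

# Usage: ensure_default_config
```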
- Make sure to set all appropriate values and fields in examples/example-datacenter.yaml.
- Deploy the cluster:
kubectl apply -f examples/example-datacenter.yaml
- Wait for the pods to become ready:
NOTE: It could take a few minutes for the pods to converge while persistent volumes are being automatically provisioned and attached to the cluster nodes.
kubectl get pods | grep cassandra-test
NAME                                    READY   STATUS    RESTARTS   AGE
cassandra-test-dc-cassandra-west1-a-0   2/2     Running   2          84m
cassandra-test-dc-cassandra-west1-b-0   2/2     Running   0          83m
cassandra-test-dc-cassandra-west1-c-0   2/2     Running   0          81m
- Verify the Cassandra cluster is healthy:
kubectl exec cassandra-test-dc-cassandra-west1-a-0 -c cassandra -- nodetool status
Datacenter: test-dc-cassandra
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.244.1.14  87.41 KiB  256     62.9%             dcf940c2-18d2-4a3a-8abf-833acadeca7e  west1-a
UN  10.244.2.9   87.38 KiB  256     69.6%             fd59fa32-aab0-485e-b04b-7ad4b75e54dd  west1-b
UN  10.244.0.10  69.91 KiB  256     67.5%             9e4883a1-e822-472f-920f-f2fc36c340c8  west1-c
- Issue a sample query to the cluster:
kubectl exec cassandra-test-dc-cassandra-west1-a-0 -c cassandra -- cqlsh -e "SELECT now() FROM system.local;" cassandra-test-dc-cassandra-nodes
 system.now()
--------------------------------------
 243e2fd0-d64a-11e9-b8a4-2dd801fa1b1c

(1 rows)
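The "wait for the pods" and "verify the cluster is healthy" steps above can be sketched as two shell helpers. Both function names are hypothetical. The first only waits for pods that already exist when it is called, selecting them by the same `cassandra-test` name match used above; the second counts rows in the `nodetool status` output whose status code is anything other than `UN` (Up/Normal).

```shell
# Wait until every existing pod whose name contains "cassandra-test" is Ready.
wait_for_cassandra_pods() {
  timeout="${1:-600s}"
  pods="$(kubectl get pods -o name | grep cassandra-test)" || {
    echo "no cassandra-test pods found" >&2
    return 1
  }
  # $pods is left unquoted on purpose so that each pod name is a separate argument.
  kubectl wait --for=condition=Ready --timeout="${timeout}" $pods
}

# Print how many nodes "nodetool status" reports in a state other than UN.
count_nodes_not_up_normal() {
  pod="$1"
  kubectl exec "$pod" -c cassandra -- nodetool status \
    | awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { n++ } END { print n + 0 }'
}

# Usage:
#   wait_for_cassandra_pods 900s
#   count_nodes_not_up_normal cassandra-test-dc-cassandra-west1-a-0
```

A result of 0 from the second helper means every listed node is Up and Normal.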
- Create a Cassandra cluster, if you haven't already:
$ kubectl apply -f examples/example-datacenter.yaml
- In examples/example-datacenter.yaml the initial cluster size is 3. Modify the file and change replicas from 3 to 5:
spec:
  replicas: 5
  image: "gcr.io/cassandra-operator/cassandra-3.11.6:latest"
- Apply the size change to the cluster CR:
$ kubectl apply -f examples/example-datacenter.yaml
The Cassandra cluster will scale to 5 members (5 pods):
$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
cassandra-test-dc-cassandra-west1-a-0   2/2     Running   1          10m12s
cassandra-test-dc-cassandra-west1-a-1   2/2     Running   2          3m2s
cassandra-test-dc-cassandra-west1-b-0   2/2     Running   1          8m38s
cassandra-test-dc-cassandra-west1-b-1   2/2     Running   1          1m4s
cassandra-test-dc-cassandra-west1-c-0   2/2     Running   0          5m22s
- Similarly, you can decrease the size of the cluster from 5 back to 3 by changing the replicas field again and reapplying the change:
spec:
  replicas: 3
  image: "gcr.io/cassandra-operator/cassandra:latest"
Then apply the changes:
$ kubectl apply -f examples/example-datacenter.yaml
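The manual edit of the replicas field in the steps above can also be scripted. The helper below is a sketch with a hypothetical name, `set_replicas`; it assumes the file contains a line of the form `replicas: <number>` and rewrites every such line, so check the result before applying it.

```shell
# Rewrite lines of the form "<indent>replicas: <number>" in a YAML file, keeping the indentation.
set_replicas() {
  file="$1"
  count="$2"
  tmp="$(mktemp)"
  sed -E 's/^([[:space:]]*replicas:)[[:space:]]*[0-9]+[[:space:]]*$/\1 '"$count"'/' "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   set_replicas examples/example-datacenter.yaml 5
#   kubectl apply -f examples/example-datacenter.yaml
```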
Note: Scaling up or down is a long operation that performs actions in the background that are not visible to Kubernetes tools. It might take a long time until any activity is seen (such as nodes in the LEAVING state or pods terminating). Do not rerun this command repeatedly; instead, follow the sidecar log (via kubectl logs <pod name> --container sidecar) and the Cassandra status (via kubectl exec <pod name> -c cassandra -- nodetool status).
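The two monitoring commands from the note can be combined into one helper that prints a snapshot of the scaling progress. This is a sketch: the function name `scaling_snapshot` and the choice of the last 20 log lines are arbitrary assumptions.

```shell
# Print the tail of the sidecar log followed by the current ring status for one pod.
scaling_snapshot() {
  pod="$1"
  echo "== sidecar log (last 20 lines) =="
  kubectl logs "$pod" --container sidecar --tail=20
  echo "== nodetool status =="
  kubectl exec "$pod" -c cassandra -- nodetool status
}

# Usage: scaling_snapshot cassandra-test-dc-cassandra-west1-a-0
```

Running it every minute or so is usually enough to see nodes move through the LEAVING state without reapplying the change.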
WARNING! The following will delete the Cassandra cluster deployed in the previous steps as well as all of its data.
- Delete the Cassandra cluster:
kubectl delete -f examples/example-datacenter.yaml
- Delete the PVCs created automatically for the pods:
kubectl delete pvc data-volume-cassandra-cassandra-test-{rack name}-{num}
- Delete the operator:
kubectl delete -f deploy
- Delete the RBAC and PSP resources:
kubectl delete -f deploy/cassandra
- Delete the CRDs:
kubectl delete -f deploy/crds.yaml
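Because the PVC names in the cleanup steps contain a rack name and an ordinal, it can help to list the matching claims before deleting them one by one. The helper below is a sketch with a hypothetical name; its default prefix is taken from the PVC name pattern shown above and may differ in your cluster.

```shell
# List PVCs (as "persistentvolumeclaim/<name>") whose name starts with the given prefix.
list_cassandra_pvcs() {
  prefix="${1:-data-volume-cassandra-cassandra-test-}"
  kubectl get pvc -o name | grep "^persistentvolumeclaim/${prefix}"
}

# Usage:
#   list_cassandra_pvcs                      # review the list first
#   kubectl delete <name from the list>      # then delete each claim
```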
If you do not specify DataVolumeClaimSpec in your spec, the operator automatically uses an EmptyDir volume from Kubernetes. You control that EmptyDir via a field called DummyVolume in your spec. If DummyVolume is not specified either, the defaults are taken from the config map as described above.
DummyVolume is of type EmptyDirVolumeSource, so you can specify medium and sizeLimit there. medium is by default the empty string "", which means a directory on the pod is used. However, you can also set medium to Memory. This means that your whole Cassandra node basically runs in memory, because /var/lib/cassandra, where the data are stored, is actually a memory mount.
With this volume type, your data in Cassandra lives only until the pod is restarted, so it can be handy for cases such as performance testing where you do not care about data persistence.
Use the Memory medium with care, as the sizeLimit counts against the memory in your limits.
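To check which medium a running node actually uses, you can inspect the filesystem behind the data directory. The helper below is a sketch with a hypothetical name; it assumes `df` is available in the cassandra container. With the Memory medium, Kubernetes backs an EmptyDir with tmpfs, so the Filesystem column should show tmpfs.

```shell
# Show the filesystem backing Cassandra's data directory inside the given pod.
show_data_mount() {
  pod="$1"
  kubectl exec "$pod" -c cassandra -- df -h /var/lib/cassandra
}

# Usage: show_data_mount cassandra-test-dc-cassandra-west1-a-0
```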
Let's see how a common scaling scenario works. If you scale from 1 node to, say, 2 nodes, another PVC is created for the second pod and bound to its respective PV. Upon scaling down, the last pod is deleted but its persistent volume is not; it stays behind. If you then scale back to 2 nodes, the new pod would reuse the same PVC, but the data there would no longer make sense: the second node was decommissioned and is not meant to be part of the cluster anymore. On such a bootstrap, Cassandra would complain that the node was decommissioned and is trying to rejoin the cluster, which is illegal (under normal circumstances).
To overcome this situation, PVCs can be deleted automatically after a pod is deleted. By default this is turned off; you can turn it on with the deletePVCs flag. If this flag is set to true, a pod's PVC is automatically deleted when the pod is deleted, and the PV is recycled (or retained, depending on its reclaim policy, although retaining it makes little sense here).
Similarly, if the whole data center is deleted, all pods are terminated and all PVCs are deleted too if this option is active. This functionality is implemented via finalizers.
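What happens to a PV once its PVC is gone depends on the volume's reclaim policy, so it can be worth checking it before enabling this option. The helper below is a sketch with a hypothetical name; it only uses standard PersistentVolume fields.

```shell
# List each PV with the claim bound to it, its reclaim policy, and its current phase.
list_pv_reclaim_policies() {
  kubectl get pv -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,STATUS:.status.phase
}

# Usage: list_pv_reclaim_policies
```

After a scale-down, PVs that still appear here with a Released status were retained rather than removed.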
To expose a node, you can, for example, put a LoadBalancer service in front of it and route traffic to that node.
Exposing a pod looks like this:
kubectl expose pod cassandra-test-cluster-dc1-west1-a-0 \
--type="LoadBalancer" \
--name=node1-service \
--port=9042 \
--target-port=9042
You can then connect to the service's external IP directly:
$ cqlsh 52.226.147.210
Connected to cassandra-test at 52.226.147.210:9042.
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
This was tested against Azure.
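The external address used with cqlsh above can be read from the service once the cloud provider has assigned it. The helper below is a sketch with a hypothetical name; it assumes the provider reports an IP address (as in the Azure test above) rather than a hostname, and it prints nothing while the address is still pending.

```shell
# Print the external IP assigned to a LoadBalancer service.
service_external_ip() {
  service="${1:-node1-service}"
  kubectl get service "$service" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage: cqlsh "$(service_external_ip node1-service)"
```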