Add Controller HA reference material. Fixes #929
plorenz committed Jan 31, 2025
1 parent e3c3110 commit bdab3f0
Showing 9 changed files with 299 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/30-configuration/_category_.yml
@@ -1,5 +1,5 @@
label: Configuration
position: 40
position: 15
link:
type: doc
id: reference/configuration/conventions
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/_category_.yml
@@ -1,2 +1,2 @@
label: Reference
position: 40
position: 10
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/config-types/index.md
@@ -1,6 +1,6 @@
---
title: Builtin Config Types
sidebar_position: 10
sidebar_position: 20
---

## Overview
Binary file not shown.
5 changes: 5 additions & 0 deletions docusaurus/docs/reference/ha/_category_.yml
@@ -0,0 +1,5 @@
label: Controller HA
position: 22
link:
type: doc
id: reference/ha/overview
164 changes: 164 additions & 0 deletions docusaurus/docs/reference/ha/bootstrapping.md
@@ -0,0 +1,164 @@
---
sidebar_label: Bootstrapping
sidebar_position: 10
---

# Bootstrapping A Cluster

To bring up a controller cluster, one starts with a single node.

## Controller Configuration

### Certificates

Each controller requires appropriate certificates. The certificates for clustered controllers
have more requirements than those for a standalone server. See the [Certificates Reference](./certificates.md)
for more information.

### Config File

The controller requires a `raft` section.

```yaml
raft:
  dataDir: /path/to/data/dir
```
The `dataDir` will be used to store the following:

* `ctrl-ha.db` - the ziti model bbolt database
* `raft.db` - the raft bbolt database
* `snapshots/` - a directory to store raft snapshots

Controllers use the control channel listener to communicate with each other. Unlike
routers, they need to know how to reach each other, so an advertise address must
be configured.

```yaml
ctrl:
  listener: tls:0.0.0.0:6262
  options:
    advertiseAddress: tls:192.168.1.100:6262
```
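
The difference between the listener address and the advertise address can be illustrated with a small sketch. It assumes the `<protocol>:<host>:<port>` shape seen in the examples above (IPv4 addresses or hostnames only); `parse_address` and `is_dialable_host` are hypothetical helpers for illustration, not part of OpenZiti.

```python
from typing import NamedTuple


class TransportAddress(NamedTuple):
    protocol: str
    host: str
    port: int


def parse_address(address: str) -> TransportAddress:
    """Split an assumed '<protocol>:<host>:<port>' string into its parts."""
    protocol, first_sep, rest = address.partition(":")
    host, last_sep, port = rest.rpartition(":")
    if not (protocol and first_sep and host and last_sep and port.isdigit()):
        raise ValueError(f"expected <protocol>:<host>:<port>, got {address!r}")
    return TransportAddress(protocol, host, int(port))


def is_dialable_host(host: str) -> bool:
    """A wildcard host works for binding a listener, but peers cannot dial it."""
    return host not in ("0.0.0.0", "::")


listener = parse_address("tls:0.0.0.0:6262")
advertise = parse_address("tls:192.168.1.100:6262")
print(is_dialable_host(listener.host), is_dialable_host(advertise.host))
```

The listener may bind to a wildcard address, while the advertise address is the one other controllers dial, so it needs a concrete host and port.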

Finally, sessions that work across controllers rely on JWTs. To enable them,
configure an OIDC endpoint by binding the `edge-oidc` API.

```yaml
web:
  - name: all-apis-localhost
    bindPoints:
      - interface: 127.0.0.1:1280
        address: 127.0.0.1:1280
    options:
      minTLSVersion: TLS1.2
      maxTLSVersion: TLS1.3
    apis:
      - binding: health-checks
      - binding: fabric
      - binding: edge-management
      - binding: edge-client
      - binding: edge-oidc
```
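
The three settings described in this section can be checked with a short script before starting a controller. The following is a minimal sketch, assuming the YAML has already been loaded into a Python dict; `missing_ha_settings` is a hypothetical helper written for illustration and is not an OpenZiti tool.

```python
from typing import List


def missing_ha_settings(config: dict) -> List[str]:
    """Return a list of HA-related settings that this section describes but the config lacks."""
    problems = []
    if not config.get("raft", {}).get("dataDir"):
        problems.append("raft.dataDir is not set")
    if not config.get("ctrl", {}).get("options", {}).get("advertiseAddress"):
        problems.append("ctrl.options.advertiseAddress is not set")
    # Collect every API binding across all web listeners.
    bindings = {
        api.get("binding")
        for listener in config.get("web", [])
        for api in listener.get("apis", [])
    }
    if "edge-oidc" not in bindings:
        problems.append("no web listener binds the edge-oidc API")
    return problems


# A dict mirroring the YAML fragments shown above.
example_config = {
    "raft": {"dataDir": "/path/to/data/dir"},
    "ctrl": {
        "listener": "tls:0.0.0.0:6262",
        "options": {"advertiseAddress": "tls:192.168.1.100:6262"},
    },
    "web": [
        {
            "name": "all-apis-localhost",
            "apis": [{"binding": "edge-client"}, {"binding": "edge-oidc"}],
        }
    ],
}

print(missing_ha_settings(example_config))
```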

## Initializing the Controller

Once properly configured, the controller can be started.

```shell
ziti controller run ctrl1.yml
```

Once the controller is up and running, it will see that it is not yet initialized, and will pause
startup, waiting for initialization. While waiting it will periodically emit a message:

```
[ 3.323] WARNING ziti/controller/server.(*Controller).checkEdgeInitialized: the
Ziti Edge has not been initialized, no default admin exists. Add this node to a
cluster using 'ziti agent cluster add tls:localhost:6262' against an existing
cluster member, or if this is the bootstrap node, run 'ziti agent controller init'
to configure the default admin and bootstrap the cluster
```

As this is the first node in the cluster, we can't add any nodes to it yet. Instead, run:

```
ziti agent controller init <admin username> <admin password> <admin identity name>
```

This initializes an admin user that can be used to manage the network.

## Managing the Cluster

There are four commands which can be used to manage the cluster.

```shell
# Adding Members
ziti agent cluster add <other controller raft address>
# Listing Members
ziti agent cluster list
# Removing Members
ziti agent cluster remove <controller id>
# Transfer Leadership
ziti agent cluster transfer-leadership [new leader id]
```

These are also available via the REST API, and can be invoked through the CLI.

```
$ ziti ops cluster --help
Controller cluster operations

Usage:
ziti ops cluster [flags]
ziti ops cluster [command]

Available Commands:
add-member add cluster member
list-members list cluster members and their status
remove-member remove cluster member
transfer-leadership transfer cluster leadership to another member

Flags:
-h, --help help for cluster

Use "ziti ops cluster [command] --help" for more information about a command.
```

## Growing the Cluster

Once a single node is up and running, additional nodes can be added to it. They should be
configured the same as the initial node, though they will have different addresses.

The first node, as configured above, is running at `192.168.1.100:6262`.
If the second node is running at `192.168.1.101:6262`, then it can be added to the
cluster in one of two ways.

### From An Existing Node

From a node already in the cluster, in this case our initial node, we can add the
new node as follows:

```
user@node1$ ziti agent cluster add tls:192.168.1.101:6262
```

### From A New Node

We can also ask the new node, which is not yet part of the cluster, to reach
out to an existing cluster node and request to be joined.

```
user@node2$ ziti agent cluster add tls:192.168.1.100:6262
```

## Shrinking the Cluster

From any node in the cluster, nodes can be removed as follows:

```
user@node1$ ziti agent cluster remove <controller id>
```
86 changes: 86 additions & 0 deletions docusaurus/docs/reference/ha/certificates.md
@@ -0,0 +1,86 @@
---
sidebar_label: Certificates
sidebar_position: 20
---

# Controller Certificates

For controllers to communicate and trust one another, they need certificates that have
been generated with the correct attributes and relationships.

## Requirements

1. The certificates must have a shared root of trust
2. The controller client and server certificates must contain a
[SPIFFE ID](https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/#spiffe-id)

## Steps to Certificate Creation

There are many ways to set up certificates, so this section covers only a recommended configuration.

The primary thing to ensure is that controllers have a shared root of trust.
A standard way of generating certs would be as follows:

1. Create a self-signed root CA
1. Create an intermediate signing cert for each controller
1. Create a server cert using the signing cert for each controller
1. Create a client cert using the signing cert for each controller
1. Make sure that the CA bundle for each server includes both the root CA and the intermediate CA
for that server

Note that controller server certs must contain a SPIFFE id of the form

```
spiffe://<trust domain>/controller/<controller id>
```

So if your trust domain is `example.com` and your controller id is `ctrl1`, then your SPIFFE id
would be:

```
spiffe://example.com/controller/ctrl1
```
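
The ID format above can be built and checked programmatically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the validation covers only the `spiffe://<trust domain>/controller/<controller id>` shape described here, not the full SPIFFE specification.

```python
from typing import Tuple
from urllib.parse import urlsplit


def controller_spiffe_id(trust_domain: str, controller_id: str) -> str:
    """Build a controller SPIFFE ID in the form shown above."""
    return f"spiffe://{trust_domain}/controller/{controller_id}"


def parse_controller_spiffe_id(uri: str) -> Tuple[str, str]:
    """Return (trust domain, controller id), or raise ValueError if the shape is wrong."""
    parts = urlsplit(uri)
    # A path of "/controller/ctrl1" splits into ["", "controller", "ctrl1"].
    segments = parts.path.split("/")
    if (
        parts.scheme != "spiffe"
        or not parts.netloc
        or len(segments) != 3
        or segments[1] != "controller"
        or not segments[2]
    ):
        raise ValueError(f"not a controller SPIFFE ID: {uri!r}")
    return parts.netloc, segments[2]


spiffe_id = controller_spiffe_id("example.com", "ctrl1")
print(spiffe_id, parse_controller_spiffe_id(spiffe_id))
```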

**SPIFFE ID Notes:**

* This ID must be set as the only URI in the `X509v3 Subject Alternative Name` field in the
  certificate.
* These IDs are used to allow the controllers to identify each other during the mTLS negotiation.
* The OpenZiti CLI supports creating SPIFFE IDs in your certs:
    * Use the `--trust-domain` flag when creating CAs
    * Use the `--spiffe-id` flag when creating server or client certificates

## Example

Using the Ziti PKI tool, certificates could be created as follows:

```bash
# Create the trust root, a self-signed CA
ziti pki create ca --trust-domain ha.test --pki-root ./pki --ca-file ca --ca-name 'HA Example Trust Root'

# Create the controller 1 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl1 --intermediate-name 'Controller One Signing Cert'

# Create the controller 1 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl1 --dns localhost --ip 127.0.0.1 --server-name ctrl1 --spiffe-id 'controller/ctrl1'

# Create the controller 1 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl1 --client-name ctrl1 --spiffe-id 'controller/ctrl1'

# Create the controller 2 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl2 --intermediate-name 'Controller Two Signing Cert'

# Create the controller 2 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl2 --dns localhost --ip 127.0.0.1 --server-name ctrl2 --spiffe-id 'controller/ctrl2'

# Create the controller 2 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl2 --client-name ctrl2 --spiffe-id 'controller/ctrl2'

# Create the controller 3 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl3 --intermediate-name 'Controller Three Signing Cert'

# Create the controller 3 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl3 --dns localhost --ip 127.0.0.1 --server-name ctrl3 --spiffe-id 'controller/ctrl3'

# Create the controller 3 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl3 --client-name ctrl3 --spiffe-id 'controller/ctrl3'
```
40 changes: 40 additions & 0 deletions docusaurus/docs/reference/ha/overview.md
@@ -0,0 +1,40 @@
# Controller HA

## Overview

Ziti controllers can be run in a cluster for high availability and performance scaling.

### For SDK Clients/Tunnelers

A controller cluster offers the following advantages:

1. Horizontal scaling of SDK client services such as
    1. Service lookups
    1. Session creation
1. Horizontal scaling of circuit creation

This means that for everything that SDK clients and tunnelers depend on, controllers
can be scaled up and placed strategically to meet user demand.

The following limitations currently apply:

1. Circuits are owned by a controller. If the controller goes down, the circuit
will remain up, but can't be re-routed for performance or if a router goes down.
2. For a controller to route circuits on a router, that router must be connected
to that controller. This means that routers should generally be connected to
all controllers.

### For Network Operations

The HA controller cluster uses a distributed journal, [Raft](https://raft.github.io/), to
keep the data model synchronized across controllers. This has the following ramifications:

1. Read operations will work on any controller that is up. If the controller is
disconnected from the cluster, the reads may return data that is out of date.
2. Update operations require that the cluster has a leader and that a quorum of nodes
is available. A quorum for a cluster of size N is (N/2)+1, rounding N/2 down. This means that a 3 node
cluster can operate with 2 nodes, a 5 node cluster can operate with 3 nodes, and
so on.
3. Updates can be initiated on any controller; they will be forwarded to the leader to
be applied.
4. The cluster may have non-voting members.
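
The quorum rule above can be worked through in a few lines. This is a minimal sketch, assuming that N counts only voting members, as is usual for Raft; the function names are illustrative.

```python
def quorum(voters: int) -> int:
    """Smallest number of voting nodes that must be available: (N/2)+1 with integer division."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voting member")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting nodes can be lost while updates still succeed."""
    return voters - quorum(voters)


for n in (1, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Under this rule a 4 node cluster tolerates the same single failure as a 3 node cluster, which is why odd cluster sizes are the common choice.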
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/tunnelers/_category_.yml
@@ -1,5 +1,5 @@
label: Tunnelers
position: 10
position: 25
link:
type: doc
id: reference/tunnelers/index
