Add Controller HA reference material. Fixes #929
plorenz committed Feb 3, 2025
1 parent a1558d6 commit 9f7fd11
Showing 15 changed files with 742 additions and 7 deletions.
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/30-configuration/_category_.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
label: Configuration
position: 40
position: 15
link:
type: doc
id: reference/configuration/conventions
6 changes: 4 additions & 2 deletions docusaurus/docs/reference/30-configuration/controller.md
@@ -477,7 +477,7 @@ profile:

The raft section enables running multiple controllers in a cluster.

- `bootstrapMembers` - (optional) Only used when bootstrapping the cluster. List of initial cluster
- `initialMembers` - (optional) Only used when bootstrapping the cluster. List of initial cluster
members. Should only be set on one of the controllers in the cluster.
- `commandHandler` - (optional)
- `maxQueueSize` - (optional, 1000) max size of the queue for processing incoming raft log
@@ -510,10 +510,12 @@ The raft section enables running multiple controllers in a cluster.
be used to bring other nodes up to date that are only slightly behind, without having to send the
full snapshot. This is a cluster wide value and should be consistent across nodes in the cluster.
Otherwise the value from the most recently started controller will win.
- `warnWhenLeaderlessFor` - (optional, 1m) - Emits a warning log message if a controller is part of
a cluster with no leader for a duration which exceeds this threshold.

```text
raft:
bootstrapMembers:
initialMembers:
- tls:127.0.0.1:6262
- tls:127.0.0.1:6363
- tls:127.0.0.1:6464
6 changes: 5 additions & 1 deletion docusaurus/docs/reference/30-configuration/router.md
@@ -122,6 +122,8 @@ The `ctrl` section configures how the router will connect to the controller.
See [heartbeats](./conventions.md#heartbeats).
- `options` - a set of options which includes those below and those defined
in [channel options](conventions.md#channel)
- `endpointsFile` - (optional, 'config file dir'/endpoints) - File location to save the current
known set of controller endpoints, when an endpoints update has been received from a controller.

Example:

@@ -164,6 +166,9 @@ Each dialer currently supports a number of [shared options](conventions.md#xgres
The `edge` section contains configuration that pertains to edge functionality. This section must be
present to enable edge functionality (e.g. listening for edge SDK connections, tunnel binding modes).

- `db` - (optional, `<path-to-config-file>.proto.gzip`) - Configures where the router data model will be snapshotted to
- `dbSaveIntervalSeconds` - (optional, 30s) - Configures how often the router data model will be snapshotted

Example:

```text
Expand Down Expand Up @@ -210,7 +215,6 @@ routers at least one valid SAN must be provided.
- `uri` - (optional) - an array of URI SAN entries
- `email` - (optional) - an array of email SAN entries


### `forwarder`

The `forwarder` section controls options that affect how a router forwards payloads across links to
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/_category_.yml
@@ -1,2 +1,2 @@
label: Reference
position: 40
position: 10
2 changes: 1 addition & 1 deletion docusaurus/docs/reference/config-types/index.md
@@ -1,6 +1,6 @@
---
title: Builtin Config Types
sidebar_position: 10
sidebar_position: 20
---

## Overview
5 changes: 5 additions & 0 deletions docusaurus/docs/reference/ha/_category_.yml
@@ -0,0 +1,5 @@
label: Controller HA
position: 22
link:
type: doc
id: reference/ha/overview
164 changes: 164 additions & 0 deletions docusaurus/docs/reference/ha/bootstrapping.md
@@ -0,0 +1,164 @@
---
sidebar_label: Bootstrapping
sidebar_position: 10
---

# Bootstrapping A Cluster

To bring up a controller cluster, one starts with a single node.

## Controller Configuration

### Certificates

Each controller requires appropriate certificates. The certificates for clustered controllers
have more requirements than those for a standalone server. See the [Certificates Reference](./certificates.md)
for more information.

### Config File

The controller requires a `raft` section.

```yaml
raft:
  dataDir: /path/to/data/dir
```
The `dataDir` will be used to store the following:

* `ctrl-ha.db` - the OpenZiti data model bbolt database
* `raft.db` - the raft bbolt database
* `snapshots/` - a directory to store raft snapshots
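
As a rough illustration of the layout above, the following shell sketch reports which of the listed entries are present in a data directory. The function name `check_raft_data_dir` is hypothetical and not part of the `ziti` CLI. A missing entry is not necessarily an error; it may simply mean the controller has not created it yet.

```bash
# Hypothetical helper (not part of the ziti CLI): reports which of the
# entries documented for the raft dataDir exist under the given directory.
# Returns 0 if all three are present, 1 otherwise.
check_raft_data_dir() {
  dir="$1"
  missing=0
  # The two bbolt database files named in the list above.
  for f in ctrl-ha.db raft.db; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # The raft snapshot directory.
  if [ -d "$dir/snapshots" ]; then
    echo "found:   snapshots/"
  else
    echo "missing: snapshots/"
    missing=1
  fi
  return "$missing"
}

# Usage: check_raft_data_dir /path/to/data/dir
```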

Controllers use the control channel listener to communicate with each other. Unlike
routers, they need to know how to reach each other, so an advertise address must
be configured.

```yaml
ctrl:
  listener: tls:0.0.0.0:6262
  options:
    advertiseAddress: tls:192.168.1.100:6262
```
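
Because other controllers dial the advertise address, it has to be something they can actually reach, unlike the wildcard listener address. The sketch below is a simple sanity check under two assumptions: that addresses take the form `tls:<host>:<port>` as in the example above, and that a host of `0.0.0.0` is never a useful advertise address. The function name is hypothetical, and bracketed IPv6 hosts are deliberately not handled.

```bash
# Hypothetical check (not part of the ziti CLI). Succeeds only for
# addresses shaped like tls:<host>:<port> whose host is not the wildcard.
is_valid_advertise_address() {
  addr="$1"
  # Assumed shape: "tls:" prefix, then host, then ":" and a port.
  case "$addr" in
    tls:*:*) ;;
    *) return 1 ;;
  esac
  rest="${addr#tls:}"   # host:port
  host="${rest%:*}"     # everything before the last colon
  port="${rest##*:}"    # everything after the last colon
  # Reject empty hosts, hosts containing colons (e.g. IPv6), and 0.0.0.0.
  case "$host" in
    ""|*:*|0.0.0.0) return 1 ;;
  esac
  # The port must be non-empty and purely numeric.
  case "$port" in
    ""|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# Usage: is_valid_advertise_address tls:192.168.1.100:6262 && echo ok
```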

Finally, for sessions to work across controllers, JWTs are used. To enable these,
an OIDC endpoint should be configured.

```yaml
web:
  - name: all-apis-localhost
    bindPoints:
      - interface: 127.0.0.1:1280
        address: 127.0.0.1:1280
    options:
      minTLSVersion: TLS1.2
      maxTLSVersion: TLS1.3
    apis:
      - binding: health-checks
      - binding: fabric
      - binding: edge-management
      - binding: edge-client
      - binding: edge-oidc
```

## Initializing the Controller

Once properly configured, the controller can be started.

```shell
ziti controller run ctrl1.yml
```

Once the controller is up and running, it will see that it is not yet initialized, and will pause
startup, waiting for initialization. While waiting it will periodically emit a message:

```
[ 3.323] WARNING ziti/controller/server.(*Controller).checkEdgeInitialized: the
Ziti Edge has not been initialized, no default admin exists. Add this node to a
cluster using 'ziti agent cluster add tls:localhost:6262' against an existing
cluster member, or if this is the bootstrap node, run 'ziti agent controller init'
to configure the default admin and bootstrap the cluster
```

As this is the first node in the cluster, we can't add any nodes to it yet. Instead, run:

```
ziti agent controller init <admin username> <admin password> <admin identity name>
```

This initializes an admin user that can be used to manage the network.

## Managing the Cluster

There are four commands which can be used to manage the cluster.

```bash
# Adding Members
ziti agent cluster add <other controller raft address>
# Listing Members
ziti agent cluster list
# Removing Members
ziti agent cluster remove <controller id>
# Transfer Leadership
ziti agent cluster transfer-leadership [new leader id]
```

These are also available via the REST API, and can be invoked through the CLI.

```bash
$ ziti ops cluster --help
Controller cluster operations

Usage:
  ziti ops cluster [flags]
  ziti ops cluster [command]

Available Commands:
  add-member          add cluster member
  list-members        list cluster members and their status
  remove-member       remove cluster member
  transfer-leadership transfer cluster leadership to another member

Flags:
  -h, --help   help for cluster

Use "ziti ops cluster [command] --help" for more information about a command.
```

## Growing the Cluster

Once a single node is up and running, additional nodes can be added to it. They should be
configured the same as the initial node, though they will have different addresses.

The first node, as configured above, is running at `192.168.1.100:6262`.

If the second node is running at `192.168.1.101:6262`, then it can be added to the
cluster in one of two ways.

### From An Existing Node

From a node already in the cluster, in this case our initial node, we can add the
new node as follows:

```bash
user@node1$ ziti agent cluster add tls:192.168.1.101:6262
```

### From A New Node

We can also ask the new node, which is not yet part of the cluster, to reach
out to an existing cluster node and request to be joined.

```
user@node2$ ziti agent cluster add tls:192.168.1.100:6262
```

## Shrinking the Cluster

From any node in the cluster, nodes can be removed as follows:

```
user@node1$ ziti agent cluster remove tls:192.168.1.101:6262
```
86 changes: 86 additions & 0 deletions docusaurus/docs/reference/ha/certificates.md
@@ -0,0 +1,86 @@
---
sidebar_label: Certificates
sidebar_position: 20
---

# Controller Certificates

For controllers to communicate and trust one another, they need certificates that have
been generated with the correct attributes and relationships.

## Requirements

1. The certificates must have a shared root of trust
2. The controller client and server certificates must contain a
[SPIFFE ID](https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/#spiffe-id)

## Steps to Certificate Creation
There are many ways to set up certificates, so this will just cover a recommended configuration.

The primary thing to ensure is that controllers have a shared root of trust.
A standard way of generating certs would be as follows:

1. Create a self-signed root CA
1. Create an intermediate signing cert for each controller
1. Create a server cert using the signing cert for each controller
1. Create a client cert using the signing cert for each controller
1. Make sure that the CA bundle for each server includes both the root CA and the intermediate CA
for that server

Note that controller server certs must contain a SPIFFE id of the form

```
spiffe://<trust domain>/controller/<controller id>
```

So if your trust domain is `example.com` and your controller id is `ctrl1`, then your SPIFFE id
would be:

```
spiffe://example.com/controller/ctrl1
```
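
The pattern above can be expressed as a small shell helper. This is only a sketch: the function name `controller_spiffe_id` is hypothetical, and it performs no validation beyond requiring both arguments.

```bash
# Hypothetical helper (not part of the ziti CLI): prints the controller
# SPIFFE ID for a trust domain and controller id, following the form
# spiffe://<trust domain>/controller/<controller id>.
controller_spiffe_id() {
  trust_domain="$1"
  controller_id="$2"
  if [ -z "$trust_domain" ] || [ -z "$controller_id" ]; then
    echo "usage: controller_spiffe_id <trust domain> <controller id>" >&2
    return 1
  fi
  printf 'spiffe://%s/controller/%s\n' "$trust_domain" "$controller_id"
}

# Prints the ID from the example above.
controller_spiffe_id example.com ctrl1
```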

**SPIFFE ID Notes:**

* This ID must be set as the only URI in the `X509v3 Subject Alternative Name` field in the
certificate.
* These IDs are used to allow the controllers to identify each other during the mTLS negotiation.
* The OpenZiti CLI supports creating SPIFFE IDs in your certs
    * Use the `--trust-domain` flag when creating CAs
    * Use the `--spiffe-id` flag when creating server or client certificates
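
One way to check the "only URI" requirement on a generated certificate is to inspect its Subject Alternative Name text. The sketch below assumes the SAN is printed by `openssl x509 -noout -ext subjectAltName` (available in OpenSSL 1.1.1 and later) as a comma-separated list with `URI:` prefixes. The function name is hypothetical, and the certificate path in the usage comment is a placeholder, since the file layout produced by the PKI tool is not covered here.

```bash
# Hypothetical helper: reads subjectAltName text on stdin and succeeds only
# if it contains exactly one URI entry and that entry is a SPIFFE ID.
has_single_spiffe_uri() {
  # Split the comma-separated SAN list and keep only the URI values.
  uris="$(tr ',' '\n' | sed -n 's/^[[:space:]]*URI://p')"
  # Count non-empty lines; grep -c exits non-zero on zero matches.
  count="$(printf '%s\n' "$uris" | grep -c . || true)"
  [ "$count" -eq 1 ] || return 1
  case "$uris" in
    spiffe://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (certificate path is a placeholder):
#   openssl x509 -in <server cert file> -noout -ext subjectAltName | has_single_spiffe_uri
```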

## Example

Using the OpenZiti PKI tool, certificates for a three node cluster could be created as follows:

```bash
# Create the trust root, a self-signed CA
ziti pki create ca --trust-domain ha.test --pki-root ./pki --ca-file ca --ca-name 'HA Example Trust Root'

# Create the controller 1 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl1 --intermediate-name 'Controller One Signing Cert'

# Create the controller 1 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl1 --dns localhost --ip 192.168.3.100 --server-name ctrl1 --spiffe-id 'controller/ctrl1'

# Create the controller 1 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl1 --client-name ctrl1 --spiffe-id 'controller/ctrl1'

# Create the controller 2 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl2 --intermediate-name 'Controller Two Signing Cert'

# Create the controller 2 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl2 --dns localhost --ip 192.168.3.101 --server-name ctrl2 --spiffe-id 'controller/ctrl2'

# Create the controller 2 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl2 --client-name ctrl2 --spiffe-id 'controller/ctrl2'

# Create the controller 3 intermediate/signing cert
ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl3 --intermediate-name 'Controller Three Signing Cert'

# Create the controller 3 server cert
ziti pki create server --pki-root ./pki --ca-name ctrl3 --dns localhost --ip 192.168.3.102 --server-name ctrl3 --spiffe-id 'controller/ctrl3'

# Create the controller 3 client cert
ziti pki create client --pki-root ./pki --ca-name ctrl3 --client-name ctrl3 --spiffe-id 'controller/ctrl3'
```
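
Since the three per-controller steps differ only in the controller id and IP address, they can be generated from a small function. The sketch below only prints the commands so they can be reviewed before being run. It reuses the flags shown above, assumes the root CA has already been created, and simplifies the intermediate names to `<id> Signing Cert`. The function name is hypothetical.

```bash
# Hypothetical helper: prints (does not run) the three per-controller
# `ziti pki create` commands from the example above.
# Arguments: <controller id> <ip address>
print_controller_pki_commands() {
  id="$1"
  ip="$2"
  root="./pki"
  # Intermediate/signing cert, signed by the root CA named 'ca'.
  echo "ziti pki create intermediate --pki-root $root --ca-name ca --intermediate-file $id --intermediate-name '$id Signing Cert'"
  # Server cert, signed by the controller's intermediate.
  echo "ziti pki create server --pki-root $root --ca-name $id --dns localhost --ip $ip --server-name $id --spiffe-id 'controller/$id'"
  # Client cert, signed by the controller's intermediate.
  echo "ziti pki create client --pki-root $root --ca-name $id --client-name $id --spiffe-id 'controller/$id'"
}

print_controller_pki_commands ctrl1 192.168.3.100
print_controller_pki_commands ctrl2 192.168.3.101
print_controller_pki_commands ctrl3 192.168.3.102
```

Printing rather than executing keeps the sketch safe to run anywhere; once the output looks right, it can be pasted into a shell.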