generated from cybozu-go/neco-template
Signed-off-by: Tomoki Sugiura <[email protected]>
1 parent 83c9571, commit f49f71e
Showing 1 changed file with 168 additions and 0 deletions.
@@ -0,0 +1,168 @@
# Design notes

## Goals

In environments where nodes and pods only have private IP addresses, communication with external networks requires
source network address translation (SNAT).
If the underlying network provides SNAT, there is no problem. If it does not, SNAT has to be implemented on
Kubernetes itself.

Pona provides a feature to implement SNAT routers on Kubernetes for this kind of environment.
This feature is on-demand: admins can allow only a subset of pods to use SNAT routers.

Coil has a similar feature, but it is tightly coupled to Coil's system and shares the same life cycle.
Pona is designed to be a standalone feature that can be used independently of Coil.

## Architecture

```mermaid
%%{init:{'theme': 'default'}}%%
flowchart LR
    style Architecture fill: #FFFFFF
    subgraph Architecture
        subgraph k8s[Kubernetes Resources]
            egress[Egress]:::KubernetesResource
            pod[Pod]:::KubernetesResource
            svc[Service ClusterIP]:::KubernetesResource
        end
        classDef KubernetesResource fill: #FFFF00
        subgraph ClientNode[Node]
            kubelet
            natClient[NAT Client]
            cniPlugin[Pona CNI Plugin]
            egressd[Egressd]
        end
        subgraph ServerNode[Node]
            natServer[NAT Pod]
        end
        controller[Egress Controller]
        external[External Host]
        controller -- " a-1 watch Egress resource " --> egress
        controller -- " a-2 apply ClusterIP Service resource " --> svc
        controller -- " a-2 apply NAT Pod resource " --> natServer
        linkStyle 0,1,2 stroke: indianred;
        kubelet -- " b-1 create pod " --> natClient
        kubelet -- " b-2 send CNI Add " --> cniPlugin
        cniPlugin -- " b-3 send RPC call " --> egressd
        egressd -- " b-4 get Egress resource " --> egress
        egressd -- " b-4 get NAT Client Pod resource " --> pod
        egressd -- " b-5 configure NAT device and routing " --> natClient
        linkStyle 3,4,5,6,7,8 stroke: rebeccapurple;
        natClient --> svc
        svc --> natServer
        natServer --> external
    end
```

### How Pona works

Users can create a NAT Pod with an Egress custom resource.
When an Egress resource is created, the Egress Controller creates a NAT Pod and a ClusterIP Service (`a-1`, `a-2`).

When a user wants to send requests through NAT, the user creates a NAT client Pod with an annotation.
Egressd, which is deployed as a DaemonSet, watches Pod resources.
When a Pod with the annotation is created, kubelet invokes the Pona CNI Plugin, which calls Egressd over RPC. Egressd then reads the Egress and Pod resources and configures the Pod so that its traffic is routed to the NAT Pod via the ClusterIP Service (`b-1`, `b-2`, `b-3`, `b-4`, `b-5`).

A request from a NAT client Pod is routed to the NAT Pod via the ClusterIP Service; the NAT Pod performs SNAT and sends the request to the external host.

Pona uses FoU (Foo-over-UDP) to route packets between the NAT client Pod and the NAT Pod.
See [Coil's design docs](https://github.com/cybozu-go/coil/blob/main/docs/design.md#foo-over-udp-tunnel) for the reasons why FoU is adopted.

### Components

#### Egress Controller
- Egress Controller watches Egress resources and creates NAT Pods and ClusterIP Services.

#### NAT Pod
- NAT Pod is a Pod that performs SNAT for NAT client Pods.
- NAT Pod configures MASQUERADE in iptables and an FoU device at start-up.
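
To make the NAT Pod's start-up work more concrete, here is a rough Python sketch that only builds the equivalent `ip` and `iptables` command lines. It is an illustration, not Pona's implementation: Pona may configure the kernel through netlink instead, and the function name, the outgoing interface `eth0`, and the FoU port `5555` are all assumptions made for this example. Only the IPv4 case is shown.

```python
# Hypothetical sketch: the commands a NAT Pod could run at start-up.
# FOU_PORT, the interface name, and the function name are assumptions.
FOU_PORT = 5555


def nat_pod_setup_commands(out_iface: str = "eth0", fou_port: int = FOU_PORT) -> list[list[str]]:
    """Return argv lists that open a FoU receive port and enable MASQUERADE."""
    return [
        # Open a FoU receive port for IPv4-in-IPv4 payloads (IP protocol 4).
        ["ip", "fou", "add", "port", str(fou_port), "ipproto", "4"],
        # Rewrite the source address of packets leaving through out_iface.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", out_iface, "-j", "MASQUERADE"],
    ]


if __name__ == "__main__":
    for argv in nat_pod_setup_commands():
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` in a container that has the `NET_ADMIN` capability; the sketch deliberately stops short of executing anything.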

#### Pona CNI Plugin
- Pona CNI Plugin is a CLI tool that implements the CNI specification.
- Pona CNI Plugin delegates CNI calls to Egressd on the same node via an RPC call.
- Pona CNI Plugin is designed to be used in CNI chains and does not have IPAM functionality.

#### Egressd
- Egressd is deployed as a DaemonSet pod and runs on each node.
- Egressd configures the network device and routing in the network namespace of a NAT client Pod in response to RPC calls from the Pona CNI Plugin.
- Egressd also watches Pod and Egress resources and reconfigures NAT client Pods when those resources change.

### API

#### Egress Custom Resource

Egress resources have the following fields, the same as Coil's Egress resources.

| Field                   | Type                      | Required | Description                                                         |
|-------------------------|---------------------------|----------|---------------------------------------------------------------------|
| `destinations`          | `[]string`                | true     | IP subnets to which packets are SNATed and sent.                    |
| `replicas`              | `int`                     | false    | Copied to Deployment's `spec.replicas`. Default is 1.               |
| `strategy`              | [DeploymentStrategy][]    | false    | Copied to Deployment's `spec.strategy`.                             |
| `template`              | [PodTemplateSpec][]       | false    | Copied to Deployment's `spec.template`.                             |
| `sessionAffinity`       | `ClientIP` or `None`      | false    | Copied to Service's `spec.sessionAffinity`. Default is `ClientIP`.  |
| `sessionAffinityConfig` | [SessionAffinityConfig][] | false    | Copied to Service's `spec.sessionAffinityConfig`.                   |
| `podDisruptionBudget`   | `EgressPDBSpec`           | false    | `minAvailable` and `maxUnavailable` are copied to PDB's spec.       |
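
The role of `destinations` can be illustrated with a small Python sketch: a packet is a candidate for SNAT through an Egress only when its destination address falls inside one of the listed subnets. The function name `needs_snat` is hypothetical, and the real decision in Pona is presumably made by routing rules in the client Pod's network namespace rather than by application code.

```python
import ipaddress


def needs_snat(dst_ip: str, destinations: list[str]) -> bool:
    """Return True if dst_ip belongs to any subnet in an Egress `destinations` list."""
    addr = ipaddress.ip_address(dst_ip)
    for subnet in destinations:
        net = ipaddress.ip_network(subnet)
        # Compare only within the same address family (IPv4 vs IPv6).
        if addr.version == net.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    subnets = ["172.20.0.0/16", "fd04::/64"]
    for ip in ("172.20.1.5", "10.0.0.1", "fd04::1"):
        print(ip, needs_snat(ip, subnets))
```

Mixing IPv4 and IPv6 subnets in one list works because each address is only tested against subnets of its own family.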

Here is an example of an Egress resource.

```yaml
apiVersion: pona.cybozu.com/v1
kind: Egress
metadata:
  namespace: egress
  name: egress
spec:
  destinations:
    - 172.20.0.0/16
    - fd04::/64
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 0
  template:
    metadata:
      annotations:
        ann1: foo
      labels:
        label1: bar
    spec:
      containers:
        - name: egress
          resources:
            limits:
              memory: 400Mi
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 43200
  podDisruptionBudget:
    maxUnavailable: 1
```
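
As a rough illustration of how the Egress fields map onto the generated resources, the following Python sketch copies them into partial Deployment, Service, and PDB specs. The function name and the shape of the returned dictionary are assumptions for this example, and the sketch omits everything the field table does not mention (names, labels, selectors, and so on). The `sessionAffinity` default used here is `ClientIP`, which, together with `None`, is one of the two values a Kubernetes Service accepts.

```python
def render_workload_specs(egress_spec: dict) -> dict:
    """Copy Egress spec fields into partial Deployment, Service, and PDB specs."""
    # Deployment: replicas defaults to 1; strategy and template are optional.
    deployment = {"replicas": egress_spec.get("replicas", 1)}
    for field in ("strategy", "template"):
        if field in egress_spec:
            deployment[field] = egress_spec[field]

    # Service: sessionAffinity defaults to ClientIP.
    service = {"sessionAffinity": egress_spec.get("sessionAffinity", "ClientIP")}
    if "sessionAffinityConfig" in egress_spec:
        service["sessionAffinityConfig"] = egress_spec["sessionAffinityConfig"]

    # PDB: only minAvailable and maxUnavailable are copied.
    pdb_in = egress_spec.get("podDisruptionBudget", {})
    pdb = {k: v for k, v in pdb_in.items() if k in ("minAvailable", "maxUnavailable")}

    return {"deployment": deployment, "service": service, "pdb": pdb}


if __name__ == "__main__":
    print(render_workload_specs({"destinations": ["172.20.0.0/16"], "replicas": 3}))
```

Passing a spec that contains only `destinations` shows the defaults on their own: one replica, `ClientIP` affinity, and an empty PDB fragment.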

#### Annotations
To use a NAT Pod, users need to add an annotation to the client Pod.
The annotation key is `egress.pona.cybozu.com/NAMESPACE`, where `NAMESPACE` is the namespace of the Egress resource, and its value is the name of the Egress resource to use.

Here is an example of a Pod with an Egress annotation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nat-client
  namespace: default
  annotations:
    egress.pona.cybozu.com/internet: egress
spec:
  # ...
```
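
The annotation convention can be sketched in a few lines of Python: the part of the key after the `egress.pona.cybozu.com/` prefix is read as the Egress namespace and the value as the Egress name. The function name `egress_refs` is hypothetical, and this is only a sketch of the convention described above, not the parsing code Egressd actually uses.

```python
EGRESS_ANNOTATION_PREFIX = "egress.pona.cybozu.com/"


def egress_refs(annotations: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (namespace, name) pairs for every Egress annotation on a Pod."""
    refs = []
    for key, value in annotations.items():
        if not key.startswith(EGRESS_ANNOTATION_PREFIX):
            continue  # unrelated annotation
        namespace = key[len(EGRESS_ANNOTATION_PREFIX):]
        name = value.strip()
        # Skip malformed entries with an empty namespace or name.
        if namespace and name:
            refs.append((namespace, name))
    return sorted(refs)


if __name__ == "__main__":
    print(egress_refs({"egress.pona.cybozu.com/internet": "egress", "team": "neco"}))
```

For the Pod above, this yields a single reference to the Egress named `egress` in the `internet` namespace.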