
Commit

Merge pull request #1 from jtreutel/develop
First Working Release
jtreutel authored Oct 4, 2024
2 parents 9b0dde2 + c4b7f0f commit 7047497
Showing 31 changed files with 10,580 additions and 9 deletions.
151 changes: 151 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,151 @@
name: Push to feature branch

on:
  push:
    branches:
      - 'master'
      - 'develop'
      - 'feature/**'


env:
  #TF_LOG: INFO #debug only
  TF_INPUT: false

jobs:
  terraform-core:
    environment: demogcp
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./terraform/core
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Authenticate to GCP
        uses: 'google-github-actions/auth@v2'
        with:
          credentials_json: '${{ secrets.SERVICE_ACCOUNT_KEY }}'
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: "1.7.5"

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        # Run even if a previous step failed
        if: success() || failure()
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan

      - name: Terraform Apply
        id: apply
        run: terraform apply -auto-approve
        if: ${{ github.ref == 'refs/heads/master' }}

  terraform-services:
    environment: demogcp
    runs-on: ubuntu-latest
    needs: terraform-core
    defaults:
      run:
        shell: bash
        working-directory: ./terraform/services
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Authenticate to GCP
        uses: 'google-github-actions/auth@v2'
        with:
          credentials_json: '${{ secrets.SERVICE_ACCOUNT_KEY }}'

      # Needed for authenticating with GKE cluster
      - name: 'Set up Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v2'

      - name: 'Install gcloud GKE auth plugin'
        run: gcloud components install gke-gcloud-auth-plugin

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: "1.7.5"

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        # Run even if a previous step failed
        if: success() || failure()
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan


      - name: Terraform Apply
        id: apply
        run: terraform apply -auto-approve
        if: ${{ github.ref == 'refs/heads/master' }}


  terraform-application:
    environment: demogcp
    runs-on: ubuntu-latest
    needs: terraform-services
    defaults:
      run:
        shell: bash
        working-directory: ./terraform/application
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Authenticate to GCP
        uses: 'google-github-actions/auth@v2'
        with:
          credentials_json: '${{ secrets.SERVICE_ACCOUNT_KEY }}'

      # Needed for authenticating with GKE cluster
      - name: 'Set up Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v2'

      - name: 'Install gcloud GKE auth plugin'
        run: gcloud components install gke-gcloud-auth-plugin

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: "1.7.5"

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        # Run even if a previous step failed
        if: success() || failure()
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan


      - name: Terraform Apply
        id: apply
        run: terraform apply -auto-approve
        if: ${{ github.ref == 'refs/heads/master' }}
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
#ignore testing files
**/*.test
#don't check in terraform dependencies
**/.terraform
119 changes: 110 additions & 9 deletions README.md
@@ -1,13 +1,114 @@
# k8s-demo-prod-infra
Kubernetes cluster infra

Terraform plans for deploying and configuring a GKE cluster, installing shared services, and deploying the GCP infrastructure that supports the production application ("demoapp").


## Features
- **Separation of Concerns**
  - Limits the "blast radius" by grouping infrastructure so that any given change touches as few infrastructure components as possible
- **Reusability**
  - The Terraform plans are parameterized so that they can easily be used to deploy additional environments (e.g. nonprod)
- **Secure and Transparent**
  - Deployments are performed via a dedicated service account, access to which is tightly controlled via GCP IAM
  - Deployments are only performed with this service account via GitHub Actions (GHA), making all infrastructure changes visible and auditable through git history
- **Observability and Alerting**
  - Basic logging has been configured at the cluster infrastructure and Kubernetes levels (for both the control plane and workloads)
  - A proof-of-concept alert has been configured via Terraform to fire when cluster node CPU utilization crosses a threshold

## Structure & Usage

### Repo Structure

```
.
└── terraform
    ├── core          GKE cluster and related GCP resources
    ├── services      Shared Kubernetes services
    └── application   Production application GCP resources
```

Below is a description of what each plan does:

- **core**
  - Sets up basic Kubernetes infrastructure in GCP
  - Deploys the following:
    - GCP VPC and subnets
    - GKE cluster
    - GKE node pool (defined separately for easier management)
    - GCP service account for cluster access (see [TODO](#todo) below)
- **services**
  - Sets up shared services on the Kubernetes cluster
  - Deploys the following:
    - ArgoCD
    - Ingress NGINX controller
    - kube-prometheus stack, including Prometheus and Grafana (currently unused; included as an example of where this kind of service would go)
    - Namespaces for the above applications
    - DNS records pointing to the NGINX ingresses for the above applications
- **application**
  - Sets up infrastructure to support the application
  - Deploys the following:
    - Namespace for the "demoapp" application
    - DNS record pointing to the NGINX ingress for the "demoapp" application
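As a rough illustration of how the `core` plan might define the cluster and a separately managed node pool, here is a minimal HCL sketch. The resource names, location, machine type, node count, and the narrowed `oauth_scopes` list are all assumptions for illustration, not this repo's actual configuration; VPC and subnet wiring is omitted.

```hcl
# Hypothetical sketch of the core plan's cluster / node pool split.
# All names, sizes, and the location are illustrative assumptions.
resource "google_service_account" "nodes" {
  account_id   = "gke-nodes"
  display_name = "GKE node service account"
}

resource "google_container_cluster" "primary" {
  name     = "demo-prd"
  location = "us-central1" # regional cluster

  # Remove the default pool so nodes are managed only by the node pool resource below.
  remove_default_node_pool = true
  initial_node_count       = 1

  # Send system, control plane, and workload logs to Cloud Logging.
  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "APISERVER",
      "CONTROLLER_MANAGER",
      "SCHEDULER",
      "WORKLOADS",
    ]
  }
}

resource "google_container_node_pool" "primary" {
  name       = "primary"
  cluster    = google_container_cluster.primary.id
  location   = google_container_cluster.primary.location
  node_count = 1 # per zone, since the cluster is regional

  node_config {
    machine_type    = "e2-standard-2"
    service_account = google_service_account.nodes.email

    # Assumed least-privilege scope set (logging, monitoring, image pulls)
    # in place of a scope that grants access to all APIs.
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
    ]
  }
}
```

With `remove_default_node_pool = true`, node settings live only in the `google_container_node_pool` resource, so pool changes do not require editing the cluster resource. The node service account would also need IAM roles such as `roles/logging.logWriter` and `roles/monitoring.metricWriter` for logging and monitoring to work with narrowed scopes.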


### Usage

The three plans described above are intended to be run in a specific order:

1. `core`
2. `services`
3. `application`

The GHA pipeline enforces this order: each job declares a `needs:` dependency on the job for the preceding plan (see Fig. 1 below).

The `core` plan's state file is read through a `terraform_remote_state` data source in both `services` and `application` to retrieve the cluster information needed to make changes to the cluster (see Fig. 2 below).


Figure 1:
![Figure 1](./docs/kdpi-gha-flow.png)

Figure 2:
![Figure 2](./docs/kdpi-tf-flow.png)
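For illustration, the `terraform_remote_state` lookup described above could look roughly like the following in `services` or `application`. The state `prefix` and the output names `cluster_endpoint` and `cluster_ca_certificate` are hypothetical. The `exec` block hands authentication to `gke-gcloud-auth-plugin`, which is consistent with the workflow installing that plugin for those two jobs.

```hcl
# Hypothetical sketch: read the core plan's state from the shared bucket.
data "terraform_remote_state" "core" {
  backend = "gcs"

  config = {
    bucket = "k8s-test-tfstate-c74f3a"
    prefix = "core" # assumed state prefix for the core plan
  }
}

# Configure the Kubernetes provider from core's (assumed) outputs.
provider "kubernetes" {
  host                   = "https://${data.terraform_remote_state.core.outputs.cluster_endpoint}"
  cluster_ca_certificate = base64decode(data.terraform_remote_state.core.outputs.cluster_ca_certificate)

  # Fetch short-lived cluster credentials via gke-gcloud-auth-plugin at runtime.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "gke-gcloud-auth-plugin"
  }
}
```

This only works if `core` actually exports matching `output` values; the data source exposes nothing else from the remote state.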



### Monitoring and Alerting

GCP Logging has been enabled for the GKE cluster infrastructure and the Kubernetes cluster, covering both the control plane and workloads. Logs can be viewed in the GCP console (see Fig. 3 below). A simple proof-of-concept alert has been configured to send a notification when cluster node CPU utilization exceeds 80% (see Fig. 4 below).

Figure 3:
[![Figure 3](./docs/gke_logging_t.png)](./docs/gke_logging.png)


Figure 4:
[![Figure 4](./docs/gke_alerting_t.png)](./docs/gke_alerting.png)
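A node CPU alert like the one described above might be expressed in Terraform roughly as follows. This is a sketch under assumptions: the policy names, the 5-minute duration, the 60-second alignment period, and the `notification_channel_id` variable (standing in for the pre-existing notification channel) are illustrative, not taken from this repo.

```hcl
# Hypothetical sketch of the node CPU alert; names and timings are assumptions.
variable "notification_channel_id" {
  description = "ID of the pre-existing notification channel"
  type        = string
}

resource "google_monitoring_alert_policy" "node_cpu" {
  display_name = "GKE node CPU utilization > 80%"
  combiner     = "OR"

  conditions {
    display_name = "Node allocatable CPU utilization above 80%"

    condition_threshold {
      filter          = "resource.type = \"k8s_node\" AND metric.type = \"kubernetes.io/node/cpu/allocatable_utilization\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8    # the metric is a 0.0-1.0 fraction, so 0.8 = 80%
      duration        = "300s" # must stay above the threshold for 5 minutes

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [var.notification_channel_id]
}
```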



## Additional Context

### Assumptions
- Naming convention: Prod, QA, and Dev are represented by the environment labels `prd`, `qal`, and `dev`, respectively
- Existing infrastructure (not managed by these plans):
  - State bucket `k8s-test-tfstate-c74f3a`
  - Service account `gha-access` for programmatic access from GHA
  - Nginx Ingress TLS cert secrets, created manually via the certbot CLI
  - GCP alert notification channel
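The pre-existing state bucket would typically be referenced from each plan through a `gcs` backend block, sketched below. The `prefix` value is an assumption; each plan needs a distinct prefix so the three state files do not collide.

```hcl
# Sketch of a per-plan backend block pointing at the pre-existing bucket.
terraform {
  backend "gcs" {
    bucket = "k8s-test-tfstate-c74f3a"
    prefix = "core" # assumed; e.g. "services" or "application" in the other plans
  }
}
```

Backend blocks cannot reference resources or variables, so the bucket has to exist before `terraform init` runs, which fits it being created outside the plans.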

### TODO
- General
  - Reconfigure networking so that internal services (e.g. Grafana) are only available on the private network
- CI/CD
  - Configure a GCP OIDC provider (Workload Identity Federation) so that GHA does not have to store a GCP service account key
  - Configure the GHA pipeline to treat this repo as a monorepo using a cascading Terraform apply:
    - `core` modified: `core -> services -> application`
    - `services` modified: `services -> application`
    - `application` modified: `application`
  - Configure the GHA pipeline to allow Terraform to apply the `application` plan after changes are made to the application code repo
- Core
  - Narrow `google_container_node_pool.node_config.oauth_scopes` in accordance with PoLP in prod; it currently grants the GCP SA access to all APIs
  - Parameterize the GKE cluster config for horizontal/vertical cluster scaling
- Services
  - Automate creation and renewal of TLS certs with certbot (using a DNS-01 challenge on GCP Cloud DNS)
  - Add additional IaC for ArgoCD configuration (project CRDs, etc.)
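For the OIDC TODO above, a minimal Workload Identity Federation setup might look like the sketch below. The `project_id` and `github_repository` variables, the pool and provider IDs, and the repository name are assumptions; the service account email follows the default format for the existing `gha-access` account.

```hcl
# Hypothetical sketch: let GitHub Actions impersonate the existing gha-access
# service account via OIDC instead of a stored JSON key.
variable "project_id" {
  type = string
}

variable "github_repository" {
  type    = string
  default = "jtreutel/k8s-demo-prod-infra" # assumed owner/repo
}

resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-actions"
  display_name              = "GitHub Actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-oidc"

  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Only accept tokens issued for this repository.
  attribute_condition = "assertion.repository == \"${var.github_repository}\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

# Allow workflows from the repository to impersonate gha-access.
resource "google_service_account_iam_member" "gha_wif" {
  service_account_id = "projects/${var.project_id}/serviceAccounts/gha-access@${var.project_id}.iam.gserviceaccount.com"
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/${var.github_repository}"
}

# Full provider resource name for the auth action's workload_identity_provider input.
output "workload_identity_provider" {
  value = google_iam_workload_identity_pool_provider.github.name
}
```

On the workflow side, the `google-github-actions/auth` step would then take `workload_identity_provider` and `service_account` inputs instead of `credentials_json`, and each job would need `permissions: id-token: write` so it can request an OIDC token.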
Binary file added docs/gke_alerting.png
Binary file added docs/gke_alerting_t.png
Binary file added docs/gke_logging.png
Binary file added docs/gke_logging_t.png

0 comments on commit 7047497
