feat(ci): configure in cluster dagger engine #311

Merged 3 commits on Jul 2, 2024
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -1,7 +1,7 @@
---
name: Issue Report
about: Create a report to help us improve
title: '[ISSUE] Brief Description of Issue'
title: 'Brief Description of Issue'
labels: bug
assignees: ''

6 changes: 2 additions & 4 deletions .github/ISSUE_TEMPLATE/enhancement.md
@@ -1,7 +1,7 @@
---
name: Enhancement Suggestion
about: Suggest an idea for this project
title: '[ENHANCEMENT] Brief Description of Enhancement'
title: 'Brief Description of Enhancement'
labels: enhancement
assignees: ''

@@ -15,13 +15,11 @@ assignees: ''

### Motivation
*Explain why this enhancement would be useful to the project or users.*
*Describe the potential benefits of the enhancement, including possible impacts on performance, usability, and efficiency.*

### Detailed Explanation
*Provide a detailed explanation of the proposed enhancement. Include any preliminary ideas you have about the implementation, and how it integrates with existing functionalities.*

### Benefits
*Describe the potential benefits of the enhancement, including possible impacts on performance, usability, and efficiency.*

### Possible Drawbacks
*Consider any possible drawbacks or issues that might arise with the implementation of this enhancement.*

16 changes: 8 additions & 8 deletions .github/workflows/ci.yaml
@@ -14,21 +14,21 @@ jobs:
        with:
          fetch-depth: 0

      - name: Install Task
        uses: arduino/setup-task@v1
        with:
          version: 3.x
          repo-token: ${{ secrets.GITHUB_TOKEN }}

      - name: Write required vault files
        run: |
          mkdir -p terraform/vault/cluster/.tls
          echo 'keep' > terraform/vault/cluster/.tls/vault.pem
          echo 'keep' > terraform/vault/cluster/.tls/vault-key.pem
          echo 'keep' > terraform/vault/cluster/.tls/ca-chain.pem

      - name: pre-commit checks
        run: task pre-commit
      - name: Validate Terraform Opentofu configuration
        uses: dagger/dagger-for-github@v5
        with:
          version: "latest"
          verb: call
          module: github.com/Smana/daggerverse/pre-commit-tf@pre-commit-tf/v0.0.1
          args: run --dir "." --tf-binary="tofu"
          cloud-token: ${{ secrets.DAGGER_CLOUD_TOKEN }}

  kubernetes-validation:
    name: Kubernetes validation ☸
55 changes: 38 additions & 17 deletions README.md
@@ -16,12 +16,11 @@ This repository provides a comprehensive guide and set of tools for building, ma
- [🔗 VPN connection using Tailscale](#-vpn-connection-using-tailscale)
- [🔑 Private PKI with Vault](#-private-pki-with-vault)
- [🧪 CI](#-ci)
- [🚧 Transition to Dagger](#-transition-to-dagger)
- [Overview](#overview)
- [Goal](#goal)
- [Overview](#overview)
- [🏠 Using Self-Hosted Runners](#-using-self-hosted-runners)
- [Overview](#overview-1)
- [Enabling Self-Hosted Runners](#enabling-self-hosted-runners)
- [Dagger example with Self-Hosted Runners](#dagger-example-with-self-hosted-runners)

## 🌟 Overview

@@ -125,21 +124,12 @@ The Vault creation is made in 2 steps:

## 🧪 CI

### 🚧 Transition to Dagger
### Overview

#### Overview
Our CI currently supports two ways of declaring tasks. We are in the process of transitioning to using [Dagger](https://dagger.io/) exclusively. Here's a breakdown of the current methods:

1. **[Task](https://taskfile.dev/installation/)**:
- Utilized for Terraform code quality, conformance, and security.
- Integrates with [pre-commit-terraform](https://github.com/antonbabenko/pre-commit-terraform) to ensure best practices and security standards are met.

2. **[Dagger](https://dagger.io/)**:
- Used for Kustomize and Kubernetes conformance.
- Employs `kubeconform` for Kubernetes configuration validation.
We leverage **[Dagger](https://dagger.io/)** for all our CI tasks. Here's what is currently run:

#### Goal
We aim to standardize our CI tasks using Dagger across all processes. This transition is currently a work in progress.
* Validation of Kubernetes and Kustomize manifests using `kubeconform`
* Validation of Terraform/OpenTofu configurations using [pre-commit-terraform](https://github.com/antonbabenko/pre-commit-terraform)

### 🏠 Using Self-Hosted Runners

@@ -152,4 +142,35 @@ This feature can be enabled within the `tooling` kustomization. By leveraging se
- **Access to Private Endpoints**: Directly interact with internal resources that are not publicly accessible.
- **Increased Security**: Run CI tasks within our secure internal environment.

For detailed information on setting up and using GitHub Self-Hosted Runners, please refer to this [documentation](https://docs.github.com/en/actions/hosting-your-own-runners).
For detailed information on setting up and using GitHub Self-Hosted Runners, please refer to this [documentation](https://docs.github.com/en/actions/hosting-your-own-runners).

#### Dagger example with Self-Hosted Runners

```yaml
name: Cache testing

on:
  pull_request:
  push:
    branches: ["main"]

jobs:

  test-cache:
    name: Testing in-cluster cache
    runs-on: dagger-gha-runner-scale-set
    container:
      image: smana/dagger-cli:v0.11.9
    env:
      _EXPERIMENTAL_DAGGER_RUNNER_HOST: "tcp://dagger-engine:8080"
      cloud-token: ${{ secrets.DAGGER_CLOUD_TOKEN }}

    steps:
      - name: Simulate a build with heavy packages
        uses: dagger/dagger-for-github@v5
        with:
          version: "latest"
          verb: call
          module: github.com/shykes/daggerverse.git/wolfi@dfb1f91fa463b779021d65011f0060f7decda0ba
          args: container --packages "python3,py3-pip,go,rust,clang"
```
16 changes: 8 additions & 8 deletions clusters/mycluster-0/observability.yaml
@@ -24,11 +24,11 @@ spec:
      kind: HelmRelease
      name: kube-prometheus-stack
      namespace: observability
    - apiVersion: helm.toolkit.fluxcd.io/v2
      kind: HelmRelease
      name: loki
      namespace: observability
    - apiVersion: helm.toolkit.fluxcd.io/v2
      kind: HelmRelease
      name: vector-agent
      namespace: observability
    # - apiVersion: helm.toolkit.fluxcd.io/v2
    #   kind: HelmRelease
    #   name: loki
    #   namespace: observability
    # - apiVersion: helm.toolkit.fluxcd.io/v2
    #   kind: HelmRelease
    #   name: vector-agent
    #   namespace: observability
5 changes: 3 additions & 2 deletions observability/mycluster-0/kustomization.yaml
@@ -3,5 +3,6 @@ kind: Kustomization

resources:
  - ../base/kube-prometheus-stack
  - ../base/loki
  - ../base/vector-agent
  # Enabling the logging stack only when needed
  # - ../base/loki
  # - ../base/vector-agent
2 changes: 1 addition & 1 deletion security/base/cert-manager/vault-clusterissuer.yaml
@@ -11,7 +11,7 @@ spec:
    auth:
      appRole:
        path: approle
        roleId: c9800133-dada-d5dd-3968-8196f1edc921 # !! This value changes each time I recreate the whole platform
        roleId: 028010a8-49d4-c1af-71ce-6a0dff557f22 # !! This value changes each time I recreate the whole platform
        secretRef:
          name: cert-manager-vault-approle
          key: secretId
25 changes: 0 additions & 25 deletions taskfile.yaml

This file was deleted.

12 changes: 12 additions & 0 deletions terraform/eks/README.md
@@ -27,6 +27,18 @@ tags = {
  GithubRepo = "demo-cloud-native-ref"
  GithubOrg = "Smana"
}


karpenter_limits = {
  "default" = {
    cpu = "20"
    memory = "64Gi"
  }
  "io" = {
    cpu = "20"
    memory = "64Gi"
  }
}
```

3. Apply with `tofu apply -var-file variables.tfvars`
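
The `karpenter_limits` value set in `variables.tfvars` needs a matching variable declaration in the module. That declaration is not visible in this diff, so the following is only a minimal sketch of what it might look like, assuming one entry per NodePool (`default` and `io`) with string-typed `cpu` and `memory` limits. The description text and the exact type constraint are assumptions.

```hcl
# Hypothetical declaration: the module's real variables.tf is not shown in this diff.
variable "karpenter_limits" {
  description = "CPU and memory limits applied to each Karpenter NodePool"

  # One object per NodePool, matching the keys used in variables.tfvars.
  type = object({
    default = object({
      cpu    = string
      memory = string
    })
    io = object({
      cpu    = string
      memory = string
    })
  })
}
```

With a declaration of this shape, expressions such as `var.karpenter_limits.default.cpu` resolve to the values supplied in the tfvars file.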
9 changes: 9 additions & 0 deletions terraform/eks/data.tf
@@ -76,3 +76,12 @@ data "http" "gateway_api_crds" {
  count = length(local.gateway_api_crds_urls)
  url = local.gateway_api_crds_urls[count.index]
}

# Kubernetes manifests
data "kubectl_filename_list" "karpenter_default" {
  pattern = "${path.module}/kubernetes-manifests/karpenter/default-*.yaml"
}

data "kubectl_filename_list" "karpenter_io" {
  pattern = "${path.module}/kubernetes-manifests/karpenter/io-*.yaml"
}
84 changes: 20 additions & 64 deletions terraform/eks/karpenter.tf
@@ -19,70 +19,26 @@ resource "aws_eks_pod_identity_association" "karpenter" {
  role_arn = module.karpenter.iam_role_arn
}

resource "kubectl_manifest" "karpenter_nodepool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:
            name: default
          requirements:
            - key: "kubernetes.io/arch"
              operator: In
              values: ["amd64"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot"]
            - key: "karpenter.k8s.aws/instance-category"
              operator: In
              values: ["c", "m", "r"]
            - key: "karpenter.k8s.aws/instance-cpu"
              operator: In
              values: ["4", "8", "16", "32"]
            - key: "karpenter.k8s.aws/instance-hypervisor"
              operator: In
              values: ["nitro"]
            - key: "karpenter.k8s.aws/instance-generation"
              operator: Gt
              values: ["2"]
            # - key: "karpenter.k8s.aws/instance-local-nvme"
            #   operator: Gt
            #   values: ["150"]
      limits:
        cpu: 200
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: 30s
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}

resource "kubectl_manifest" "karpenter_ec2_nodeclass" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: "AL2"
      # instanceStorePolicy: "RAID0"
      role: ${module.karpenter.node_iam_role_name}
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${var.env}
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}
  YAML
resource "kubectl_manifest" "karpenter" {
  for_each = {
    for file_name in flatten([
      data.kubectl_filename_list.karpenter_default.matches,
      data.kubectl_filename_list.karpenter_io.matches
    ]) : file_name => file_name
  }

  yaml_body = templatefile(
    each.key,
    {
      cluster_name = module.eks.cluster_name,
      env = var.env,
      karpenter_node_iam_role_name = module.karpenter.node_iam_role_name
      default_nodepool_cpu_limits = var.karpenter_limits.default.cpu
      default_nodepool_memory_limits = var.karpenter_limits.default.memory
      io_nodepool_cpu_limits = var.karpenter_limits.io.cpu
      io_nodepool_memory_limits = var.karpenter_limits.io.memory
    }
  )

  depends_on = [
    helm_release.karpenter
15 changes: 15 additions & 0 deletions terraform/eks/kubernetes-manifests/karpenter/default-ec2nc.yaml
@@ -0,0 +1,15 @@
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: "AL2"
  role: ${karpenter_node_iam_role_name}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${env}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  tags:
    karpenter.sh/discovery: ${cluster_name}
29 changes: 29 additions & 0 deletions terraform/eks/kubernetes-manifests/karpenter/default-nodepool.yaml
@@ -0,0 +1,29 @@
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # Do not select big instance types in order to avoid blast radius
        - key: karpenter.k8s.aws/instance-cpu
          operator: Lt
          values: ["26"]
        - key: karpenter.k8s.aws/instance-memory
          operator: Lt
          values: ["50001"]
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
  limits:
    cpu: ${default_nodepool_cpu_limits}
    memory: ${default_nodepool_memory_limits}
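
The `${...}` placeholders in this manifest are filled in by Terraform's `templatefile` function. As a rough illustration only (this output is hypothetical and not part of the PR), a single template could be rendered on its own to inspect the result before it is applied. The sketch assumes it lives in the same `terraform/eks` module and that `var.karpenter_limits` is declared there.

```hcl
# Hypothetical debugging aid, not part of this PR: renders one template so the
# substituted limits can be inspected with `tofu output`.
output "karpenter_default_nodepool_rendered" {
  value = templatefile(
    "${path.module}/kubernetes-manifests/karpenter/default-nodepool.yaml",
    {
      # Only the variables referenced by this template are required;
      # templatefile fails if a referenced variable is missing.
      default_nodepool_cpu_limits    = var.karpenter_limits.default.cpu
      default_nodepool_memory_limits = var.karpenter_limits.default.memory
    }
  )
}
```

The PR itself passes the same variable map to every matched file, which keeps the `for_each` simple at the cost of supplying values that some templates do not use.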
19 changes: 19 additions & 0 deletions terraform/eks/kubernetes-manifests/karpenter/io-ec2nc.yaml
@@ -0,0 +1,19 @@
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: io
spec:
  amiFamily: "AL2"
  instanceStorePolicy: "RAID0"
  role: ${karpenter_node_iam_role_name}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${env}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  userData: |
    #!/bin/bash
    /usr/bin/setup-local-disks raid0
  tags:
    karpenter.sh/discovery: ${cluster_name}