Merge pull request #311 from Smana/feat_incluster_dagger_cache
feat(ci): configure in cluster dagger engine
Smana authored Jul 2, 2024
2 parents 77c8712 + 680a989 commit 2b7ec80
Showing 30 changed files with 449 additions and 142 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -1,7 +1,7 @@
---
name: Issue Report
about: Create a report to help us improve
title: '[ISSUE] Brief Description of Issue'
title: 'Brief Description of Issue'
labels: bug
assignees: ''

6 changes: 2 additions & 4 deletions .github/ISSUE_TEMPLATE/enhancement.md
@@ -1,7 +1,7 @@
---
name: Enhancement Suggestion
about: Suggest an idea for this project
title: '[ENHANCEMENT] Brief Description of Enhancement'
title: 'Brief Description of Enhancement'
labels: enhancement
assignees: ''

@@ -15,13 +15,11 @@ assignees: ''

### Motivation
*Explain why this enhancement would be useful to the project or users.*
*Describe the potential benefits of the enhancement, including possible impacts on performance, usability, and efficiency.*

### Detailed Explanation
*Provide a detailed explanation of the proposed enhancement. Include any preliminary ideas you have about the implementation, and how it integrates with existing functionalities.*

### Benefits
*Describe the potential benefits of the enhancement, including possible impacts on performance, usability, and efficiency.*

### Possible Drawbacks
*Consider any possible drawbacks or issues that might arise with the implementation of this enhancement.*

16 changes: 8 additions & 8 deletions .github/workflows/ci.yaml
@@ -14,21 +14,21 @@ jobs:
with:
fetch-depth: 0

- name: Install Task
uses: arduino/setup-task@v1
with:
version: 3.x
repo-token: ${{ secrets.GITHUB_TOKEN }}

- name: Write required vault files
run: |
mkdir -p terraform/vault/cluster/.tls
echo 'keep' > terraform/vault/cluster/.tls/vault.pem
echo 'keep' > terraform/vault/cluster/.tls/vault-key.pem
echo 'keep' > terraform/vault/cluster/.tls/ca-chain.pem
- name: pre-commit checks
run: task pre-commit
- name: Validate Terraform/OpenTofu configuration
uses: dagger/dagger-for-github@v5
with:
version: "latest"
verb: call
module: github.com/Smana/daggerverse/pre-commit-tf@pre-commit-tf/v0.0.1
args: run --dir "." --tf-binary="tofu"
cloud-token: ${{ secrets.DAGGER_CLOUD_TOKEN }}

kubernetes-validation:
name: Kubernetes validation ☸
55 changes: 38 additions & 17 deletions README.md
@@ -16,12 +16,11 @@ This repository provides a comprehensive guide and set of tools for building, ma
- [🔗 VPN connection using Tailscale](#-vpn-connection-using-tailscale)
- [🔑 Private PKI with Vault](#-private-pki-with-vault)
- [🧪 CI](#-ci)
- [🚧 Transition to Dagger](#-transition-to-dagger)
- [Overview](#overview)
- [Goal](#goal)
- [Overview](#overview)
- [🏠 Using Self-Hosted Runners](#-using-self-hosted-runners)
- [Overview](#overview-1)
- [Enabling Self-Hosted Runners](#enabling-self-hosted-runners)
- [Dagger example with Self-Hosted Runners](#dagger-example-with-self-hosted-runners)

## 🌟 Overview

@@ -125,21 +124,12 @@

## 🧪 CI

### 🚧 Transition to Dagger
### Overview

#### Overview
Our CI currently supports two ways of declaring tasks. We are in the process of transitioning to using [Dagger](https://dagger.io/) exclusively. Here's a breakdown of the current methods:

1. **[Task](https://taskfile.dev/installation/)**:
- Utilized for Terraform code quality, conformance, and security.
- Integrates with [pre-commit-terraform](https://github.com/antonbabenko/pre-commit-terraform) to ensure best practices and security standards are met.

2. **[Dagger](https://dagger.io/)**:
- Used for Kustomize and Kubernetes conformance.
- Employs `kubeconform` for Kubernetes configuration validation.
We leverage **[Dagger](https://dagger.io/)** for all our CI tasks. Here's what is currently run:

#### Goal
We aim to standardize our CI tasks using Dagger across all processes. This transition is currently a work in progress.
* Validation of Kubernetes and Kustomize manifests using `kubeconform`
* Validation of Terraform/OpenTofu configurations using [pre-commit-terraform](https://github.com/antonbabenko/pre-commit-terraform) hooks
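The Terraform/OpenTofu check can also be reproduced from a workstation with the Dagger CLI — a sketch, assuming the CLI is installed locally; the module path and arguments are the ones used in the `ci.yaml` change above:

```shell
# Run the pre-commit-tf Dagger module against the repository root,
# using OpenTofu ("tofu") as the Terraform binary — same module and
# arguments as the GitHub Actions step.
dagger call -m github.com/Smana/daggerverse/pre-commit-tf@pre-commit-tf/v0.0.1 \
  run --dir "." --tf-binary="tofu"
```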

### 🏠 Using Self-Hosted Runners

@@ -152,4 +142,35 @@ This feature can be enabled within the `tooling` kustomization. By leveraging se
- **Access to Private Endpoints**: Directly interact with internal resources that are not publicly accessible.
- **Increased Security**: Run CI tasks within our secure internal environment.

For detailed information on setting up and using GitHub Self-Hosted Runners, please refer to this [documentation](https://docs.github.com/en/actions/hosting-your-own-runners).

#### Dagger example with Self-Hosted Runners

```yaml
name: Cache testing

on:
pull_request:
push:
branches: ["main"]

jobs:

test-cache:
name: Testing in-cluster cache
runs-on: dagger-gha-runner-scale-set
container:
image: smana/dagger-cli:v0.11.9
env:
_EXPERIMENTAL_DAGGER_RUNNER_HOST: "tcp://dagger-engine:8080"
DAGGER_CLOUD_TOKEN: ${{ secrets.DAGGER_CLOUD_TOKEN }}

steps:
- name: Simulate a build with heavy packages
uses: dagger/dagger-for-github@v5
with:
version: "latest"
verb: call
module: github.com/shykes/daggerverse.git/wolfi@dfb1f91fa463b779021d65011f0060f7decda0ba
args: container --packages "python3,py3-pip,go,rust,clang"
```
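When debugging from a machine that can reach the cluster network (e.g. over the Tailscale VPN described earlier), a local Dagger CLI can target the same in-cluster engine by exporting the variable the workflow sets — a sketch; the service name and port are the ones assumed in the example above:

```shell
# Point a local Dagger CLI at the in-cluster engine instead of
# spawning a local one (address taken from the workflow's env block).
export _EXPERIMENTAL_DAGGER_RUNNER_HOST="tcp://dagger-engine:8080"
echo "$_EXPERIMENTAL_DAGGER_RUNNER_HOST"
# → tcp://dagger-engine:8080
```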
16 changes: 8 additions & 8 deletions clusters/mycluster-0/observability.yaml
@@ -24,11 +24,11 @@ spec:
kind: HelmRelease
name: kube-prometheus-stack
namespace: observability
- apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
name: loki
namespace: observability
- apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
name: vector-agent
namespace: observability
# - apiVersion: helm.toolkit.fluxcd.io/v2
# kind: HelmRelease
# name: loki
# namespace: observability
# - apiVersion: helm.toolkit.fluxcd.io/v2
# kind: HelmRelease
# name: vector-agent
# namespace: observability
5 changes: 3 additions & 2 deletions observability/mycluster-0/kustomization.yaml
@@ -3,5 +3,6 @@ kind: Kustomization

resources:
- ../base/kube-prometheus-stack
- ../base/loki
- ../base/vector-agent
# Enable the logging stack only when needed
# - ../base/loki
# - ../base/vector-agent
2 changes: 1 addition & 1 deletion security/base/cert-manager/vault-clusterissuer.yaml
@@ -11,7 +11,7 @@ spec:
auth:
appRole:
path: approle
roleId: c9800133-dada-d5dd-3968-8196f1edc921 # !! This value changes each time I recreate the whole platform
roleId: 028010a8-49d4-c1af-71ce-6a0dff557f22 # !! This value changes each time I recreate the whole platform
secretRef:
name: cert-manager-vault-approle
key: secretId
25 changes: 0 additions & 25 deletions taskfile.yaml

This file was deleted.

12 changes: 12 additions & 0 deletions terraform/eks/README.md
@@ -27,6 +27,18 @@ tags = {
GithubRepo = "demo-cloud-native-ref"
GithubOrg = "Smana"
}
karpenter_limits = {
"default" = {
cpu = "20"
memory = "64Gi"
}
"io" = {
cpu = "20"
memory = "64Gi"
}
}
```

3. Apply with `tofu apply -var-file variables.tfvars`
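The `karpenter_limits` map added to the tfvars above presumably pairs with a variable declaration along these lines — a sketch; the actual definition in `variables.tf` may differ in type or defaults:

```hcl
# One entry per Karpenter node pool ("default" and "io"), each capping
# the total CPU and memory the pool may provision.
variable "karpenter_limits" {
  description = "CPU and memory limits per Karpenter node pool"
  type = map(object({
    cpu    = string
    memory = string
  }))
  default = {
    "default" = { cpu = "20", memory = "64Gi" }
    "io"      = { cpu = "20", memory = "64Gi" }
  }
}
```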
9 changes: 9 additions & 0 deletions terraform/eks/data.tf
@@ -76,3 +76,12 @@ data "http" "gateway_api_crds" {
count = length(local.gateway_api_crds_urls)
url = local.gateway_api_crds_urls[count.index]
}

# Kubernetes manifests
data "kubectl_filename_list" "karpenter_default" {
pattern = "${path.module}/kubernetes-manifests/karpenter/default-*.yaml"
}

data "kubectl_filename_list" "karpenter_io" {
pattern = "${path.module}/kubernetes-manifests/karpenter/io-*.yaml"
}
84 changes: 20 additions & 64 deletions terraform/eks/karpenter.tf
@@ -19,70 +19,26 @@ resource "aws_eks_pod_identity_association" "karpenter" {
role_arn = module.karpenter.iam_role_arn
}

resource "kubectl_manifest" "karpenter_nodepool" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
nodeClassRef:
name: default
requirements:
- key: "kubernetes.io/arch"
operator: In
values: ["amd64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: "karpenter.k8s.aws/instance-category"
operator: In
values: ["c", "m", "r"]
- key: "karpenter.k8s.aws/instance-cpu"
operator: In
values: ["4", "8", "16", "32"]
- key: "karpenter.k8s.aws/instance-hypervisor"
operator: In
values: ["nitro"]
- key: "karpenter.k8s.aws/instance-generation"
operator: Gt
values: ["2"]
# - key: "karpenter.k8s.aws/instance-local-nvme"
# operator: Gt
# values: ["150"]
limits:
cpu: 200
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
YAML

depends_on = [
helm_release.karpenter
]
}

resource "kubectl_manifest" "karpenter_ec2_nodeclass" {
yaml_body = <<-YAML
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: default
spec:
amiFamily: "AL2"
# instanceStorePolicy: "RAID0"
role: ${module.karpenter.node_iam_role_name}
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: ${var.env}
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: ${module.eks.cluster_name}
tags:
karpenter.sh/discovery: ${module.eks.cluster_name}
YAML
resource "kubectl_manifest" "karpenter" {
for_each = {
for file_name in flatten([
data.kubectl_filename_list.karpenter_default.matches,
data.kubectl_filename_list.karpenter_io.matches
]) : file_name => file_name
}

yaml_body = templatefile(
each.key,
{
cluster_name = module.eks.cluster_name,
env = var.env,
karpenter_node_iam_role_name = module.karpenter.node_iam_role_name
default_nodepool_cpu_limits = var.karpenter_limits.default.cpu
default_nodepool_memory_limits = var.karpenter_limits.default.memory
io_nodepool_cpu_limits = var.karpenter_limits.io.cpu
io_nodepool_memory_limits = var.karpenter_limits.io.memory
}
)

depends_on = [
helm_release.karpenter
15 changes: 15 additions & 0 deletions terraform/eks/kubernetes-manifests/karpenter/default-ec2nc.yaml
@@ -0,0 +1,15 @@
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: default
spec:
amiFamily: "AL2"
role: ${karpenter_node_iam_role_name}
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: ${env}
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: ${cluster_name}
tags:
karpenter.sh/discovery: ${cluster_name}
29 changes: 29 additions & 0 deletions terraform/eks/kubernetes-manifests/karpenter/default-nodepool.yaml
@@ -0,0 +1,29 @@
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
nodeClassRef:
name: default
requirements:
- key: "kubernetes.io/arch"
operator: In
values: ["amd64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
# Do not select large instance types, to limit the blast radius
- key: karpenter.k8s.aws/instance-cpu
operator: Lt
values: ["26"]
- key: karpenter.k8s.aws/instance-memory
operator: Lt
values: ["50001"]
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
limits:
cpu: ${default_nodepool_cpu_limits}
memory: ${default_nodepool_memory_limits}
19 changes: 19 additions & 0 deletions terraform/eks/kubernetes-manifests/karpenter/io-ec2nc.yaml
@@ -0,0 +1,19 @@
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: io
spec:
amiFamily: "AL2"
instanceStorePolicy: "RAID0"
role: ${karpenter_node_iam_role_name}
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: ${env}
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: ${cluster_name}
userData: |
#!/bin/bash
/usr/bin/setup-local-disks raid0
tags:
karpenter.sh/discovery: ${cluster_name}