This directory contains the root terraform module describing the ghaf CI setup in Azure.
For an architectural description, see README-azure.md, originally from PR#35.
The setup uses Nix to build disk images, uploads them to Azure, and then boots virtual machines off of them.
Images are considered "appliance images", meaning the Nix code that describes their configuration also describes the exact purpose of the machine (there is no two-staged deployment process: the machine does the thing it's supposed to do right after bootup), which removes the need for e.g. ssh access as much as possible.
Machines are considered ephemeral: every change in the appliance image / NixOS configuration causes a new image to be built and a new VM to be booted from that new image.
This document assumes you have the nix package manager installed on your local host.
Clone this repository:
$ git clone https://github.com/tiiuae/ghaf-infra.git
$ cd ghaf-infra
Bootstrap nix-shell with the required dependencies:
# Start a nix-shell with required dependencies:
$ nix-shell
# Authenticate with az login:
$ az login
# Terraform commands are executed under the terraform directory:
$ cd terraform/
All commands in this document are executed from nix-shell inside the terraform
directory.
terraform
├── azarm
├── persistent
│ ├── binary-cache-sigkey
│ ├── binary-cache-storage
│ ├── builder-ssh-key
│ └── workspace-specific
├── state-storage
│ └── tfstate-storage.tf
├── modules
│ ├── azurerm-linux-vm
│ └── azurerm-nix-vm-image
├── binary-cache.tf
├── builder.tf
├── jenkins-controller.tf
└── main.tf
- The terraform directory contains the root terraform deployment files with the VM configurations binary-cache.tf, builder.tf, and jenkins-controller.tf matching the components described in README-azure.md in its components section.
- The terraform/azarm directory contains the terraform configuration for the Azure aarch64 builder, which is used from the ghaf github-actions build.yml workflow to build aarch64 targets for authorized PRs pre-merge. azarm is disconnected from the root terraform module: it's a separate configuration with its own state.
- The terraform/persistent directory contains the terraform configuration for the parts of the infrastructure that are considered persistent: resources defined under terraform/persistent will not be removed even if the ghaf-infra instance is otherwise removed. Examples of such persistent ghaf-infra resources are the binary cache storage and the binary cache signing key. There may be many 'persistent' infrastructure instances: currently the dev and prod deployments have their own instances of the persistent resources. Section Multiple Environments with Terraform Workspaces discusses this topic in more detail.
- The terraform/state-storage directory contains the terraform configuration for the ghaf-infra remote backend state storage using an Azure storage blob. See section Initializing Azure State and Persistent Data for more details.
- The terraform/modules directory contains terraform modules used from the ghaf-infra VM configurations to build, upload, and spin up Azure nix images (see the sketch below for how a VM configuration might wire these modules together).
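As a rough illustration of how these modules are meant to be used, a VM configuration like binary-cache.tf typically builds a nix image with azurerm-nix-vm-image and then boots a VM from that image with azurerm-linux-vm. The sketch below only shows the shape of such wiring; the module input and output names are hypothetical and do not necessarily match the real module interfaces under terraform/modules:

# Hypothetical sketch only -- input/output names are illustrative.
module "binary_cache_image" {
  source = "./modules/azurerm-nix-vm-image"
  # ... inputs selecting which nixosConfiguration to build and where
  # (storage account/container) to upload the resulting .vhd ...
}

module "binary_cache_vm" {
  source = "./modules/azurerm-linux-vm"
  # ... inputs such as VM size, subnet, and the image built above, e.g.:
  # source_image_id = module.binary_cache_image.image_id
}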
This project stores the terraform state in remote storage, in an Azure storage blob, as configured in tfstate-storage.tf. The benefits of using such a remote storage setup are well outlined in storing state in Azure storage and terraform backend configuration.
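For reference, a Terraform azurerm backend pointing at such a storage blob is declared roughly as in the sketch below; the resource group, storage account, and container names are placeholders, not the values used by this project:

# Placeholder names only -- the real values come from the state-storage setup.
terraform {
  backend "azurerm" {
    resource_group_name  = "ghaf-infra-state"   # example resource group
    storage_account_name = "ghafinfrastate"     # example storage account
    container_name       = "tfstate"            # example blob container
    key                  = "ghaf-infra.tfstate" # name of the state blob
  }
}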
To initialize the backend storage, use the terraform-init.sh script:
# Inside the terraform directory
$ ./terraform-init.sh
[+] Initializing state storage
[+] Initializing persistent data
...
[+] Running terraform init
terraform-init.sh will not do anything if the initialization has already been done. In other words, it's safe to run the script many times; it will not destroy or re-initialize anything if the init was already executed.
In addition to the shared terraform state, some of the infrastructure resources are also shared between the ghaf-infra instances. terraform-init.sh initializes the persistent configuration defined under terraform/persistent. There may be many 'persistent' infrastructure instances: currently the dev and prod deployments have their own instances of the persistent resources. Section Multiple Environments with Terraform Workspaces discusses this topic in more detail.
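In practice, the root deployment references such persistent resources with Terraform data sources instead of creating them itself, roughly as in the following sketch (the names are illustrative, not the real ones):

# Illustrative sketch: the persistent binary cache storage account is looked
# up with a data source, so destroying a ghaf-infra instance leaves it intact.
data "azurerm_storage_account" "binary_cache" {
  name                = "examplebinarycachedev"     # placeholder name
  resource_group_name = "ghaf-infra-persistent-eun" # placeholder name
}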
To support infrastructure development in isolated environments, this project uses terraform workspaces. The main reasons for using terraform workspaces include:
- Different workspaces allow deploying different instances of ghaf-infra. Each instance has completely separate state data, making it possible to deploy dev, prod, or even private development instances of ghaf-infra. This makes it possible to first develop and test infrastructure changes in a private development environment before proposing changes to the shared (e.g. dev or prod) environments. The configuration codebase is the same for all environments, with the differentiating options defined in main.tf.
- Parts of the ghaf-infra infrastructure are persistent and shared between different environments. As an example, private dev environments share the binary cache storage. This arrangement makes it possible to treat, for instance, dev and private ghaf-infra instances as dispensable: ghaf-infra instances can be temporary and short-lived, as it's easy to spin up new environments without losing any valuable data. The persistent data is configured outside the root ghaf-infra terraform deployment, in the terraform/persistent directory. There may be many 'persistent' infrastructure instances: currently the dev and prod deployments have their own instances of the persistent resources. This means that dev and prod instances of ghaf-infra do not share any persistent data; as an example, dev and prod deployments of ghaf-infra have separate binary cache storage. The binding to persistent resources from ghaf-infra is done in main.tf based on the terraform workspace name and resource location (see the sketch after this list). Persistent data initialization is done automatically by the terraform-init.sh script.
- Currently, the following resources are defined as 'persistent', meaning dev and prod instances do not share them:
  - Binary cache storage: binary-cache-storage.tf
  - Binary cache signing key: binary-cache-sigkey.tf
  - Builder ssh key: builder-ssh-key.tf
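As a hedged illustration of that workspace-based differentiation (not the literal contents of main.tf), the currently selected workspace is available in HCL as terraform.workspace and can be used to derive per-environment settings and resource names:

# Sketch only: shows the mechanism, not the actual main.tf contents.
locals {
  # terraform.workspace is the name of the selected workspace,
  # e.g. "dev", "prod", or a private workspace such as "henrirosten".
  workspace = terraform.workspace

  # Shared environments keep their canonical name, anything else is
  # treated as a private development instance.
  envtype = contains(["dev", "prod"], terraform.workspace) ? terraform.workspace : "priv"
}

resource "azurerm_resource_group" "ghaf_infra" {
  # Embedding the workspace name keeps parallel instances from colliding.
  name     = "ghaf-infra-${local.workspace}"
  location = "northeurope"
}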
To help facilitate the usage of terraform workspaces in setting up distinct copies of ghaf-infra, you can use terraform workspaces from the command line or consider using the helper script provided at terraform-playground.sh. Below, for the sake of example, we use terraform-playground.sh to set up a private deployment instance of ghaf-infra:
# Activate private development environment
$ ./terraform-playground.sh activate
# ...
[+] Done, use terraform [validate|plan|apply] to work with your dev infra
This sets up a terraform workspace for your private development environment:
# List the current terraform workspaces
$ terraform workspace list
Terraform workspaces:
default
dev
* henrirosten # <-- indicates active workspace
prod
The following describes the intended workflow, with commands executed from the nix-shell.
Once you are ready to deploy your terraform or nix configuration changes, the following sequence of commands typically takes place:
# Inside the terraform directory
# Format the terraform code files:
$ terraform fmt -recursive
# Validate the terraform changes:
$ terraform validate
# Make sure you deploy to the correct ghaf-infra instance.
# Use terraform workspace select <workspace_name> to switch workspaces
$ terraform workspace list
default
dev
* henrirosten # <== This example deploys to private dev environment
prod
# Show what actions terraform would take on apply:
$ terraform plan
# Apply your configuration changes:
$ terraform apply
Once terraform apply completes, the private development infrastructure is deployed.
You can now play around in your isolated copy of the infrastructure, testing and updating your changes and making sure they work as expected before merging them.
Once the configuration changes have been tested, the private development environment can be destroyed:
# Destroy the private terraform workspace using the helper script
$ ./terraform-playground.sh destroy
# Alternatively, you can use terraform command directly
$ terraform workspace select <workspace_name>
$ terraform apply -destroy
The above command(s) remove all the resources that were created for the given environment.
By default, ghaf-infra is deployed to the Azure location northeurope (North Europe).
However, ghaf-infra resources can be deployed to other Azure locations too, with the following caveats:
- Ghaf-infra has been tested in a limited set of locations. terraform-init.sh exits with an error if you try to initialize ghaf-infra in a non-supported (non-tested) location. When deploying to a new, previously unsupported location, you need to modify terraform-init.sh.
- For a full list of available Azure location names, run az account list-locations -o table in the ghaf-infra devshell.
- Not all Azure VM sizes or other resources are available in all Azure locations. You can check the availability of specific resources through the Azure region product page, e.g.: https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?regions=europe-north&products=virtual-machines. Alternatively, you can list the VM sizes per location with the az vm list-sizes command from the ghaf-infra devshell, for instance: az vm list-sizes --location 'northeurope' -o table.
- Your Azure subscription quota limits impact the ability to deploy ghaf-infra; you might need to increase the vCPU quotas for your subscription via the Azure web portal. See more information at https://learn.microsoft.com/en-us/azure/quotas/quotas-overview. You can check your quota usage from the Azure web portal or with az vm list-usage, for instance: az vm list-usage --location "northeurope" -o table.
The following shows an example of deploying ghaf-infra to the Azure location swedencentral (Sweden Central):
# Initialize terraform state and persistent data, using SWE Central as an example location:
$ ./terraform-init.sh -l swedencentral
# Switch to (and optionally create) a workspace 'devswec'
$ terraform workspace new devswec || terraform workspace select devswec
# Optionally, run Terraform plan:
# (Variable 'envtype' overrides the default environment type)
$ terraform plan -var="envtype=dev"
# Deploy with Terraform apply:
$ terraform apply -var="envtype=dev" -auto-approve
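The envtype variable overridden above is an ordinary Terraform input variable; a hedged sketch of what such a declaration can look like is shown below (the default and the allowed values are assumptions, the real definition in this repository may differ):

# Sketch of an input variable like the one overridden with -var="envtype=dev".
variable "envtype" {
  type        = string
  description = "Environment type of this ghaf-infra instance"
  default     = "priv" # assumed default
  validation {
    condition     = contains(["priv", "dev", "prod"], var.envtype)
    error_message = "envtype must be one of: priv, dev, prod."
  }
}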
Below are some common Terraform errors with tips on how to resolve each.
$ terraform apply
...
azurerm_virtual_machine_extension.deploy_ubuntu_builder: Creating...
╷
│ Error: A resource with the ID "/subscriptions/<SUBID>/resourceGroups/rg-name-here/providers/Microsoft.Compute/virtualMachines/azarm/extensions/azarm-vmext" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_virtual_machine_extension" for more information.
Example fix:
$ terraform import azurerm_virtual_machine_extension.deploy_ubuntu_builder /subscriptions/<SUBID>/resourceGroups/rg-name-here/providers/Microsoft.Compute/virtualMachines/azarm/extensions/azarm-vmext
# Ref: https://stackoverflow.com/questions/61418168/terraform-resource-with-the-id-already-exists
$ terraform apply
...
│ Error: creating/updating Image (Subscription: "<SUBID>"
│ Resource Group Name: "ghaf-infra-dev"
│ Image Name: "<NAME>"): performing CreateOrUpdate: unexpected status 400 with error: InvalidParameter: The source blob https://<INSTANCE>.blob.core.windows.net/ghaf-infra-vm-images/<IMAGE>.vhd is not accessible.
│
│ with module.builder_image.azurerm_image.default,
│ on modules/azurerm-nix-vm-image/main.tf line 22, in resource "azurerm_image" "default":
│ 22: resource "azurerm_image" "default" {
Try running terraform apply again if you get an error similar to the one shown above.
It's unclear why this error occasionally occurs; this issue should be analyzed in more detail.
$ terraform apply
...
│ Error: Disk (Subscription: "<SUBID>"
│ Resource Group Name: "ghaf-infra-persistent-eun"
│ Disk Name: "binary-cache-vm-caddy-state-dev") was not found
│
│ with data.azurerm_managed_disk.binary_cache_caddy_state,
│ on main.tf line 207, in data "azurerm_managed_disk" "binary_cache_caddy_state":
│ 207: data "azurerm_managed_disk" "binary_cache_caddy_state" {
The above error (or similar) is likely caused by missing initialization of some persistent resources.
Fix the persistent initialization by running terraform-init.sh, then run terraform apply again.
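For context, the failing reference in main.tf is a data source lookup roughly like the sketch below (names taken from the error output). A data source only reads an existing resource, so it fails until the persistent disk has actually been created, which is why running terraform-init.sh resolves the error:

# Sketch based on the error above: this lookup fails until the persistent
# managed disk exists, because data sources never create resources.
data "azurerm_managed_disk" "binary_cache_caddy_state" {
  name                = "binary-cache-vm-caddy-state-dev"
  resource_group_name = "ghaf-infra-persistent-eun"
}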