Website: https://www.cast.ai
- Terraform 0.13+
A module to create an Azure role and a service principal that can be used to connect to CAST AI.
Requires the castai/castai, hashicorp/azurerm, hashicorp/azuread, and hashicorp/helm providers to be configured.
The required parameters can be provided manually or read from your AKS cluster resource and the Azure RM subscription data source.
module "castai-aks-cluster" {
  source = "castai/aks/castai"

  aks_cluster_name    = var.aks_cluster_name
  aks_cluster_region  = var.aks_cluster_region
  node_resource_group = azurerm_kubernetes_cluster.example.node_resource_group
  resource_group      = azurerm_kubernetes_cluster.example.resource_group_name

  delete_nodes_on_disconnect = true

  subscription_id = data.azurerm_subscription.current.subscription_id
  tenant_id       = data.azurerm_subscription.current.tenant_id

  default_node_configuration = module.castai-aks-cluster.castai_node_configurations["default"]

  node_configurations = {
    default = {
      disk_cpu_ratio = 25
      subnets        = [azurerm_subnet.internal.id]
      tags = {
        "node-config" : "default"
      }
    }
  }

  node_templates = {
    spot_tmpl = {
      configuration_id = module.castai-aks-cluster.castai_node_configurations["default"]

      should_taint = true

      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
        custom-label-key-2 = "custom-label-value-2"
      }

      custom_taints = [
        {
          key   = "custom-taint-key-1"
          value = "custom-taint-value-1"
        },
        {
          key   = "custom-taint-key-2"
          value = "custom-taint-value-2"
        }
      ]

      constraints = {
        fallback_restore_rate_seconds = 1800
        spot                          = true
        use_spot_fallbacks            = true
        min_cpu                       = 4
        max_cpu                       = 100
        instance_families = {
          exclude = ["standard_DPLSv5"]
        }
        compute_optimized_state = "disabled"
        storage_optimized_state = "disabled"
      }
    }
  }

  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true

      headroom = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }

      headroom_spot = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
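The module expects the four providers named earlier to be configured by the caller, and the example reads several arguments from existing Azure objects. The following is a minimal sketch of that surrounding configuration, not an official setup: the variable name `castai_api_token` is hypothetical, the address `azurerm_kubernetes_cluster.example` is assumed to be the AKS cluster resource referenced in the example, the version constraints are the ones listed in this README, and the nested `kubernetes { ... }` block assumes Helm provider 2.x syntax.

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    castai  = { source = "castai/castai", version = "~> 7.4.0" }
    azurerm = { source = "hashicorp/azurerm", version = ">= 3.7.0" }
    azuread = { source = "hashicorp/azuread", version = ">= 2.22.0" }
    helm    = { source = "hashicorp/helm", version = ">= 2.0.0" }
  }
}

# Hypothetical variable holding a CAST AI API access key.
variable "castai_api_token" {
  type = string
}

provider "castai" {
  api_token = var.castai_api_token
}

# Note: azurerm 4.x additionally expects subscription_id to be set here.
provider "azurerm" {
  features {}
}

# Source of subscription_id and tenant_id passed to the module.
data "azurerm_subscription" "current" {}

provider "azuread" {
  tenant_id = data.azurerm_subscription.current.tenant_id
}

# Helm installs the CAST AI components into the AKS cluster, so it needs
# the cluster credentials (assumed to come from the "example" cluster resource).
provider "helm" {
  kubernetes {
    host                   = azurerm_kubernetes_cluster.example.kube_config[0].host
    client_certificate     = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_certificate)
    client_key             = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_key)
    cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].cluster_ca_certificate)
  }
}
```

Reading `subscription_id` and `tenant_id` from the data source keeps them in step with whatever subscription the `azurerm` provider is authenticated against, instead of duplicating the values in variables.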
Version 3.x.x changed:
- Removed `custom_label` attribute in `castai_node_template` resource. Use `custom_labels` instead.

Old configuration:
module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_label = {
        key   = "custom-label-key-1"
        value = "custom-label-value-1"
      }
    }
  }
}

New configuration:
module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
      }
    }
  }
}
Version 4.x.x changed:
- Removed `compute_optimized` and `storage_optimized` attributes in `castai_node_template` resource, `constraints` object. Use `compute_optimized_state` and `storage_optimized_state` instead.

Old configuration:
module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized = false
        storage_optimized = true
      }
    }
  }
}

New configuration:
module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized_state = "disabled"
        storage_optimized_state = "enabled"
      }
    }
  }
}
Version 5.2.x changed:
- Deprecated `autoscaler_policies_json` attribute. Use `autoscaler_settings` instead.

Old configuration:
module "castai-aks-cluster" {
  autoscaler_policies_json = <<-EOT
    {
      "enabled": true,
      "unschedulablePods": {
        "enabled": true
      },
      "nodeDownscaler": {
        "enabled": true,
        "emptyNodes": {
          "enabled": true
        },
        "evictor": {
          "aggressiveMode": false,
          "cycleInterval": "5m10s",
          "dryRun": false,
          "enabled": true,
          "nodeGracePeriodMinutes": 10,
          "scopedMode": false
        }
      },
      "nodeTemplatesPartialMatchingEnabled": false,
      "clusterLimits": {
        "cpu": {
          "maxCores": 20,
          "minCores": 1
        },
        "enabled": true
      }
    }
  EOT
}

New configuration:
module "castai-aks-cluster" {
  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
Usage examples are located in the Terraform provider repository.

Requirements:

Name | Version |
---|---|
terraform | >= 0.13 |
azuread | >= 2.22.0 |
azurerm | >= 3.7.0 |
castai | ~> 7.4.0 |
helm | >= 2.0.0 |

Providers:

Name | Version |
---|---|
azuread | >= 2.22.0 |
azurerm | >= 3.7.0 |
castai | ~> 7.4.0 |
helm | >= 2.0.0 |
null | n/a |
No modules.

Inputs:

Name | Description | Type | Default | Required |
---|---|---|---|---|
additional_resource_groups | n/a | list(string) | [] | no |
agent_values | List of YAML formatted string values for agent helm chart | list(string) | [] | no |
agent_version | Version of castai-agent helm chart. If not provided, latest version will be used. | string | null | no |
aks_cluster_name | Name of the cluster to be connected to CAST AI. | string | n/a | yes |
aks_cluster_region | Region of the AKS cluster | string | n/a | yes |
api_grpc_addr | CAST AI gRPC API address | string | "api-grpc.cast.ai:443" | no |
api_url | URL of alternative CAST AI API to be used during development or testing | string | "https://api.cast.ai" | no |
autoscaler_policies_json | Optional JSON object to override CAST AI cluster autoscaler policies. Deprecated, use autoscaler_settings instead. | string | null | no |
autoscaler_policy_overrides | Optional autoscaler policy definitions to override current autoscaler settings | any | null | no |
castai_api_token | Optional CAST AI API token created in console.cast.ai API Access keys section. Used only when wait_for_cluster_ready is set to true | string | "" | no |
castai_components_labels | Optional additional Kubernetes labels for CAST AI pods | map(any) | {} | no |
castai_components_sets | Optional additional 'set' configurations for helm resources. | map(string) | {} | no |
cluster_controller_values | List of YAML formatted string values for cluster-controller helm chart | list(string) | [] | no |
cluster_controller_version | Version of castai-cluster-controller helm chart. If not provided, latest version will be used. | string | null | no |
default_node_configuration | ID of the default node configuration | string | n/a | yes |
delete_nodes_on_disconnect | Optionally delete CAST AI created nodes when the cluster is destroyed | bool | false | no |
evictor_ext_values | List of YAML formatted string values for evictor-ext helm chart | list(string) | [] | no |
evictor_ext_version | Version of castai-evictor-ext chart. If not provided, latest version will be used. | string | null | no |
evictor_values | List of YAML formatted string values for evictor helm chart | list(string) | [] | no |
evictor_version | Version of castai-evictor chart. If not provided, latest version will be used. | string | null | no |
grpc_url | gRPC endpoint used by pod-pinner | string | "grpc.cast.ai:443" | no |
install_security_agent | Optional flag for installation of security agent (https://docs.cast.ai/product-overview/console/security-insights/) | bool | false | no |
kvisor_values | List of YAML formatted string values for kvisor helm chart | list(string) | [] | no |
kvisor_version | Version of kvisor chart. If not provided, latest version will be used. | string | null | no |
kvisor_controller_extra_args | Map of extra arguments for the kvisor controller | map(string) | { kube-linter-enabled = true image-scan-enabled = true kube-bench-enabled = true kube-bench-cloud-provider = eks } | no |
node_configurations | Map of AKS node configurations to create | any | {} | no |
node_resource_group | n/a | string | n/a | yes |
node_templates | Map of node templates to create | any | {} | no |
pod_pinner_version | Version of pod-pinner helm chart. If not provided, latest version will be used. | string | null | no |
resource_group | n/a | string | n/a | yes |
self_managed | Whether upgrades of CAST AI components are managed by the customer; by default upgrades are managed by the CAST AI central system. | bool | false | no |
spot_handler_values | List of YAML formatted string values for spot-handler helm chart | list(string) | [] | no |
spot_handler_version | Version of castai-spot-handler helm chart. If not provided, latest version will be used. | string | null | no |
subscription_id | Azure subscription ID | string | n/a | yes |
tenant_id | n/a | string | n/a | yes |
wait_for_cluster_ready | Wait for the cluster to be ready before finishing the module execution. This option requires castai_api_token to be set. | bool | false | no |
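
The table notes that `wait_for_cluster_ready` requires `castai_api_token`. As a minimal sketch of setting the two together, the block below passes only the required inputs plus these two optional ones; the variable name `castai_api_token` is hypothetical, and the resource and data source addresses are assumed to match the ones used in the main example.

```hcl
module "castai-aks-cluster" {
  source = "castai/aks/castai"

  # Required inputs, as in the main example.
  aks_cluster_name    = var.aks_cluster_name
  aks_cluster_region  = var.aks_cluster_region
  node_resource_group = azurerm_kubernetes_cluster.example.node_resource_group
  resource_group      = azurerm_kubernetes_cluster.example.resource_group_name
  subscription_id     = data.azurerm_subscription.current.subscription_id
  tenant_id           = data.azurerm_subscription.current.tenant_id

  default_node_configuration = module.castai-aks-cluster.castai_node_configurations["default"]

  # Block until CAST AI reports the cluster as ready; per the inputs table,
  # the token is only used when this flag is true.
  wait_for_cluster_ready = true
  castai_api_token       = var.castai_api_token # hypothetical variable name
}
```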

Outputs:

Name | Description |
---|---|
castai_node_configurations | Map of node configuration IDs by name |
castai_node_templates | Map of node templates by name |
cluster_id | CAST AI cluster ID, which can be used for accessing cluster data using the API |
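
The outputs can be consumed like any other module outputs. A short sketch, assuming the module is instantiated as `castai-aks-cluster` and defines a node configuration named `default`, as in the main example; the output names on the left are hypothetical:

```hcl
# Expose the CAST AI cluster ID, e.g. for use in API calls.
output "castai_cluster_id" {
  description = "ID of the cluster as registered in CAST AI"
  value       = module.castai-aks-cluster.cluster_id
}

# Look up a single node configuration ID by its name in the output map.
output "default_node_configuration_id" {
  value = module.castai-aks-cluster.castai_node_configurations["default"]
}
```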