- Google Cloud Platform (GCP) account and project
- `gcloud` CLI
- `terraform` CLI
- Running this module requires elevated permissions (Kubernetes Engine Admin) in your GCP account, specifically permissions to create VPC networks, GKE clusters, and Compute nodes. It will not work on accounts using the "free plan", because GPU nodes cannot be used until a billing account is attached and activated.
- You will need to enable both the Kubernetes Engine API and the Compute Engine API. Click the GKE tab in the GCP console for your project and enable the GKE API, which also enables the Compute Engine API at the same time.
- Ensure you have GPU quota in your desired region/zone. You can [request](GPU Quota) if it is not enabled in a new account. You will need quota for both `GPUS_ALL_REGIONS` and the specific GPU in the desired region.
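
The API and quota prerequisites above can also be handled from the command line. The following is a minimal sketch, assuming the `gcloud` CLI is installed and already authenticated. The function names `enable_required_apis` and `show_gpu_quota` and the project ID argument are hypothetical and are not part of this module.

```shell
# Hypothetical helpers: the function names and arguments are illustrative only.

# Enable the Kubernetes Engine and Compute Engine APIs for the given project.
# Usage: enable_required_apis <PROJECT_ID>
enable_required_apis() {
  gcloud services enable container.googleapis.com compute.googleapis.com \
    --project="$1"
}

# Print the project-wide quota entry for GPUS_ALL_REGIONS.
# Assumes the default YAML output, where the "limit" and "usage" lines
# sit directly before and after the "metric" line.
# Usage: show_gpu_quota <PROJECT_ID>
show_gpu_quota() {
  gcloud compute project-info describe --project="$1" \
    | grep -B1 -A1 "GPUS_ALL_REGIONS"
}
```

Quota for a specific GPU type is tracked per region, so it should appear in the output of `gcloud compute regions describe <REGION>` rather than in the project-wide listing.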
- Copy `terraform.tfvars.example` to `terraform.tfvars` and update the values as desired:

      cp terraform.tfvars.example terraform.tfvars

- Log in to GCP with the `gcloud` CLI:

      gcloud auth application-default login

- Initialize Terraform, which downloads the required providers:

      terraform init

- Plan the Terraform deployment:

      terraform plan

- If you're happy with the plan, apply the changes:

      terraform apply

- After the cluster has been created, you can connect to it with `kubectl` by running the following two commands:

      gcloud components install gke-gcloud-auth-plugin
      gcloud container clusters get-credentials <CLUSTER_NAME> --region=<REGION>

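
Once credentials are in place, it can be useful to confirm that the GPU nodes registered with the cluster. The snippet below is a sketch rather than part of this module: the function name `list_gpu_nodes` is hypothetical, and it assumes the GPU nodes advertise the `nvidia.com/gpu` resource, which is the usual case for NVIDIA GPUs on GKE.

```shell
# Hypothetical helper: the function name is illustrative only.
# Assumes kubectl already points at the new cluster (see get-credentials above)
# and that GPUs are exposed as the nvidia.com/gpu resource (an assumption).
list_gpu_nodes() {
  # Show each node name next to its allocatable GPU count;
  # nodes without GPUs show <none> in the GPU column.
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Calling `list_gpu_nodes` after the cluster is up should list every node. If the GPU column is empty for the GPU node pool, the drivers or device plugin may not be ready yet.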
Note: Steps 4-6 can be done automatically by running the `setup.sh` script in this directory:

    chmod +x setup.sh
    ./setup.sh

You can delete the cluster with the following command:

    terraform destroy