Request to update branch. #7

Merged 21 commits on Apr 14, 2023
8 changes: 7 additions & 1 deletion README.md

<h2>Overview</h2>

This project deploys a Jupyterhub with scalable compute nodes (for distributed computing) on 3 cloud platforms using AWS EKS, Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE). There are several learning objectives for this project:

1. Learn how to create replicable, reusable templates for deploying cloud services using Infrastructure-as-Code (IaC), namely Terraform
2. Understand how to use Kubernetes for container orchestration and scaling of microservices
3. Understand how to deploy scalable Jupyterhubs (e.g. Jupyterhub as a service, Jupyterhub for the classroom)
4. Understand best practices for Jupyterhub deployments, including SSO/OAuth setup, cost optimization, security, networking, and other node scaling mechanisms


<h2>Contents</h2>

39 changes: 39 additions & 0 deletions bind-jhub-fqdn2IP.sh
#!/bin/bash
############################################
# This script binds an FQDN to the
# Jupyterhub static IP address.
# This is useful on Azure; other cloud
# providers have a similar process.
#
# Original script from:
# https://docs.microsoft.com/en-us/azure/aks/ingress-tls
# Synopsis:
#   ./bind-jhub-fqdn2IP.sh <IP> <NAME>
# CC 2023-04-11 Jacob Fosso Tande
#########################################
# Configure an FQDN for the ingress controller IP address

# Public IP address of your ingress controller
IP="$1"

# Name to associate with the public IP address
DNSNAME="$2"

# Get the resource ID of the public IP
PUBLICIPID=$(az network public-ip list --query "[?ipAddress!=null]|[?contains(ipAddress, '$IP')].[id]" --output tsv)

# Update the public IP address with the DNS name
az network public-ip update --ids "$PUBLICIPID" --dns-name "$DNSNAME"

# Display the FQDN
FQDN=$(az network public-ip show --ids "$PUBLICIPID" --query "[dnsSettings.fqdn]" --output tsv)

echo ""
echo "Got FQDN: $FQDN"
echo ""
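A usage sketch for the script above; the IP address and DNS label below are placeholders for illustration, not values from this project, and the Azure CLI must be installed and logged in:

```shell
# Bind the DNS label "myjupyterhub" to the ingress controller's public IP.
# 203.0.113.10 is a documentation-range placeholder address.
./bind-jhub-fqdn2IP.sh 203.0.113.10 myjupyterhub
```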
28 changes: 20 additions & 8 deletions caseyd/README.md
Working directory for Casey Dinsmore


# Common Commands

## Helm

### Install / Reconfigure

```
helm upgrade --cleanup-on-fail \
  --install jhub jupyterhub/jupyterhub \
  --namespace jhub \
  --create-namespace \
  --version=2.0.0 \
  --values config.yaml
```
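The `jupyterhub/jupyterhub` chart reference above assumes the Helm repository has already been added; if it has not, a likely prerequisite (using the chart repository linked elsewhere in this repo) is:

```shell
# Register the JupyterHub Helm chart repository and refresh the local index
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
```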
## Kubectl

### Get Proxy Address

```
kubectl -n jhub get service proxy-public
```

### Show all pod states
```
kubectl get pods -A
```

### View details about a pod including deployment errors

```
kubectl -n jhub describe pod <pod.name>
```
### Get the logs for a pod

```
kubectl -n jhub logs <pod.name>
```

### Get Persistent Volumes/Claims
```
kubectl -n jhub get pv
```
```
kubectl -n jhub get pvc
```
25 changes: 25 additions & 0 deletions caseyd/aws/eksctl/README.md
# AWS EKS cluster config with eksctl

## Resources

https://www.arhea.net/posts/2020-06-18-jupyterhub-amazon-eks


## Issues

* With 4 availability zones in us-west-2, eksctl will randomly pick three, so
  sometimes the deployment will fail.

  Adding the `availabilityZones:` stanza to cluster.yaml resolves the issue, as outlined here:

  https://github.com/weaveworks/eksctl/blob/main/examples/05-advanced-nodegroups.yaml

      availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2d"]

* Hub pods end up stuck in the Pending state:

      running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition

  The resolution here does not seem to work:

  https://discourse.jupyter.org/t/hub-pod-stuck-on-pending-timed-out-binding-volumes/17176
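The cluster.yaml in this directory is applied with eksctl; a typical workflow (assuming eksctl is installed and AWS credentials are configured) might be:

```shell
# Create the EKS cluster from the config file
eksctl create cluster -f cluster.yaml

# Tear the cluster down again when finished
eksctl delete cluster -f cluster.yaml
```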
113 changes: 113 additions & 0 deletions caseyd/aws/eksctl/cluster.yaml
# file: cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: jupyterhub
  region: us-west-2

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: cluster-autoscaler
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: cluster-autoscaler
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "autoscaling:DescribeAutoScalingGroups"
              - "autoscaling:DescribeAutoScalingInstances"
              - "autoscaling:DescribeLaunchConfigurations"
              - "autoscaling:DescribeTags"
              - "autoscaling:SetDesiredCapacity"
              - "autoscaling:TerminateInstanceInAutoScalingGroup"
              - "ec2:DescribeLaunchTemplateVersions"
            Resource: '*'
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: aws-ebs-csi-driver
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "ec2:AttachVolume"
              - "ec2:CreateSnapshot"
              - "ec2:CreateTags"
              - "ec2:CreateVolume"
              - "ec2:DeleteSnapshot"
              - "ec2:DeleteTags"
              - "ec2:DeleteVolume"
              - "ec2:DescribeInstances"
              - "ec2:DescribeSnapshots"
              - "ec2:DescribeTags"
              - "ec2:DescribeVolumes"
              - "ec2:DetachVolume"
            Resource: '*'

managedNodeGroups:
  - name: ng-us-west-2a
    instanceType: t3.medium
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-west-2a
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-west-2b
    instanceType: t3.medium
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-west-2b
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-west-2d
    instanceType: t3.medium
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-west-2d
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"

availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2d"]

# Adding the EBS CSI driver to try to resolve volume permissions.
# Does not seem to work (2023/04/11):
# https://discourse.jupyter.org/t/hub-pod-stuck-on-pending-timed-out-binding-volumes/17176
addons:
  - name: aws-ebs-csi-driver
    attachPolicy:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action:
            - "ec2:AttachVolume"
            - "ec2:CreateSnapshot"
            - "ec2:CreateTags"
            - "ec2:CreateVolume"
            - "ec2:DeleteSnapshot"
            - "ec2:DeleteTags"
            - "ec2:DeleteVolume"
            - "ec2:DescribeInstances"
            - "ec2:DescribeSnapshots"
            - "ec2:DescribeTags"
            - "ec2:DescribeVolumes"
            - "ec2:DetachVolume"
          Resource: '*'
12 changes: 12 additions & 0 deletions caseyd/aws/jup-default.yaml
# This file can update the JupyterHub Helm chart's default configuration values.
#
# For reference see the configuration reference and default values, but make
# sure to refer to the Helm chart version of interest to you!
#
# Introduction to YAML: https://www.youtube.com/watch?v=cdLNKUoMc6c
# Chart config reference: https://zero-to-jupyterhub.readthedocs.io/en/stable/resources/reference.html
# Chart default values: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/HEAD/jupyterhub/values.yaml
# Available chart versions: https://jupyterhub.github.io/helm-chart/
#


62 changes: 62 additions & 0 deletions caseyd/aws/tf/README.md

## Terraform Commands

### Initialize Terraform and install providers
```
terraform init
```

### Validate Terraform file syntax
```
terraform validate
```

### Preview Changes
```
terraform plan
```

### Apply Terraform files
```
terraform apply
```

When the provisioning is complete, details will be provided about the cluster.

```
cluster_endpoint = "https://E44319CC44678D8EE100B7C42A46AE5D.gr7.us-west-2.eks.amazonaws.com"
cluster_name = "education-eks-pAGhwfz9"
cluster_security_group_id = "sg-01f527e90fdbf2f6d"
region = "us-west-2"
```

### Show the current terraform state
```
terraform show
```

This will also show the cluster output information.


## Configure kube for the new cluster

```
aws eks update-kubeconfig --name <clustername>
```

Update kubectl from Terraform output (from the EKS terraform directory)
```
aws eks update-kubeconfig --name $(terraform output -raw cluster_name)
```
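After updating the kubeconfig, it is worth confirming that kubectl can reach the new cluster (a quick sanity check, assuming kubectl is installed):

```shell
# List the worker nodes and show the control-plane endpoint
kubectl get nodes
kubectl cluster-info
```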



## Deleting a terraform deployment
```
terraform destroy
```


# References
* [Terraform EKS Example](https://developer.hashicorp.com/terraform/tutorials/kubernetes/eks)
* [Terraform Helm Example](https://developer.hashicorp.com/terraform/tutorials/kubernetes/helm-provider)
27 changes: 27 additions & 0 deletions caseyd/aws/tf/provision-eks-cluster/.gitignore
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*
*.tfplan

# Crash log files
crash.log

# Exclude all .tfvars files, which are likely to contain sensitive data such as
# passwords, private keys, and other secrets. These should not be part of version
# control, as they are potentially sensitive and subject to change depending on
# the environment.
*.tfvars

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Ignore CLI configuration files
.terraformrc
terraform.rc