Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧟 Revive the Dead #112

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from
Draft

🧟 Revive the Dead #112

wants to merge 27 commits into from

Conversation

nickumia-reisys
Copy link
Contributor

@nickumia-reisys nickumia-reisys commented Oct 5, 2023

Related to

Checklist:

  • Fix double terraform apply requirement
  • Review this PR as a team and make sure everyone is aware of all changes
  • Reach out to @nickumia if there are any open questions about design choices
  • Are the changes breaking?
    • If yes, check with active members/maintainers/users in #cg-contributors
    • If no, great job! 🥳
  • Are new tests required (and added) for changed resources?
  • If fargate is working, do we really want to delete managed node groups?? If we ever need to switch back again, we'll have to revive work in the same way that this PR did to revive fargate 😖
    • Oooo I think the best way to "consolidate" this is to offer TWO PLANS!! We never really had a use-case for that before this 😲

Background:
In an attempt to leverage k8s for airflow development, the datagov-brokerpak-eks repo needs to be on a supported version of k8s. At the time of writing, there are 130 resources managed by this terraform code. Most of it is documented in our cleanup documentation. These 130 resources are managed through 57 terraform resources and 4 terraform modules (eks, aws_load_balancer_controller, vpc and iam_assumable_role_karpenter). Within these resources, Helm charts are used to create many more intangible resources inside of k8s. Referencing all of the documentation and understanding how to make updates to this is important and requires time and effort.

Many resources were only included because we knew it was dependency, either through direct documentation or trial and error. I'm trying to cover everything before 10/12; however, I suspect that there will be additional work to do. See notes below on the status of current efforts.

Notes:

  • If there is an EKS cluster provisioned already AND you change tags on the module.eks resource, you MUST delete the current cluster and reprovision from scratch, since the tags interfere with talking to the kubernetes control plane.
  • How to locate the ALB Controller Helm Release:
    image

Misc Notes:

References:

- We only have the GSA ISE AMI for K8S v1.21... since AWS doesn't allow us to create anything older than v1.23, we need to request new AMIs from GSA ISE, related issues:
  - GSA/data.gov#3812
  - GSA/data.gov#3808
  - GSA/data.gov#3761
- Fargate is a lot easier to use for @Jin-Sun-tts's proof of concept :)
we're going to try to use just fargate and then maybe do a combo support of fargate + managed node groups with variable inputs
might have breaking changes to consider, primarily aws provider
- Description: The ID of the EKS cluster. Note: currently a value is returned only for local EKS clusters created on Outposts
- Description: The name of the EKS cluster
This was maybe defaulting to true in an older version of eks module.. I think this configuration is safe because we are specifying a subset of public cidrs to allow access
- k8s v1.28 utilized newer version of the networking api, which forced a newer version of the alb controller (and by extension, the helm chart that deploys that controller)
- ANSWER: kubernetes-sigs/aws-load-balancer-controller#2495 (comment)
- This is a temp fix on a PR until the full initial revamp is complete
- ERROR: │ Error: unable to build kubernetes objects from release manifest: resource mapping not found for name: 'ingress-nginx-controller' namespace: 'kube-system' from '': no matches for kind 'HorizontalPodAutoscaler' in version 'autoscaling/v2beta2'
│ ensure CRDs are installed first
- Solution: took a chance and just updated the chart version
- Revive Fargate configuration for karpenter + EFS from old commits
- Update calico ... v3.26.2

adding missing files
 The data resource should not need to depend on anything.. maybe the issue is changes within the eks module... the resources aren't properly linked anymore.. although... the better fix might be using the module.eks variable and see if there have been improvements that make the cluster referable directly from there??
@nickumia-reisys nickumia-reisys linked an issue Oct 11, 2023 that may be closed by this pull request
69 tasks
This seems like the better answer to the last commit concern
The data is base64encoded already, so just decode it and it is the PEM data itself
This looks to be working.  It is an upgrade because the autoscaler-provisioner is deprecated.  It might not be working because of fargate configuration that still needs some fine-tuning (in terms of spinning up new nodes).  However, the helm chart deploys.  I haven't hit a blocker that is definitive that we can't use fargate.  This requires more investigation.
References:
- https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup
- https://github.com/kubernetes/autoscaler/blob/master/charts/cluster-autoscaler/Chart.yaml
- https://artifacthub.io/packages/helm/cluster-autoscaler/cluster-autoscaler
- Calico and Starboard don't work, so just comment them out so that terraform can apply mostly cleanly.  There is still an issue about running terrraform apply.  I think terraform appy needs to be run multiple times because coredns cannot provision effectively because fargate isn't ready??  It should be some depends_on to be added someplace, but mentioning since it's important for brokerpak operations
- Lots of design choices to be made.  The hardest thing is iteration speed and having the time, energy and resources to do the work.  However, I'll actively make comments on things to try if people get stuck working on this.
Moment of truth with this push haha..
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Learn How to Operate in the Brokerpak/Broker/SSB World
1 participant