Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Bind Module Reference #79

Merged
merged 51 commits into from
Mar 1, 2022

Conversation

nickumia-reisys
Copy link
Contributor

@nickumia-reisys nickumia-reisys commented Feb 18, 2022

Getting error 'Module module.brokerpak-eks-terraform-bind contains provider configuration' in ssb broker

image

relates to GSA/data.gov#3696

Getting error 'Module module.brokerpak-eks-terraform-bind contains provider configuration' in ssb broker
Module on module dependency does not allow
 the child module to specify its own provider blocks; all provider blocks must be in the root module
nickumia-reisys and others added 24 commits February 18, 2022 18:30
There's overlap between different credentials in our deployment, so use AWS_PROFILE to have finer control over which credentials are used when.. Reference: https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html#specifying-credentials
Since the module is being run directly from the same context as the ssb, we have to set different credentials for the space in which eks will live (which is different from the ssb backend)
This is a shift in how we are using EKS terraform from brokerpak-only to brokerpak AND terraform module
To fix 'To work with module.brokerpak-eks-terraform-provision.aws_kms_key.cluster its original provider configuration at
│ module.brokerpak-eks-terraform-provision.provider[registry.terraform.io/hashicorp/aws].dnssec-key-provider is required, but it has been removed. This occurs when a
│ provider configuration is removed while objects created by that provider still exist in the state. Re-add the provider configuration to destroy
│ module.brokerpak-eks-terraform-provision.aws_kms_key.cluster, after which you can remove the provider configuration again.'
Fix for.. Error: error configuring Terraform AWS Provider: error validating provider credentials: error calling sts:GetCallerIdentity: InvalidClientTokenId: The security token included in the request is invalid.
│       status code: 403, request id: 38dd9ba8-75e8-4b9d-93c5-dac8c5a2d534
│
│   with module.brokerpak-eks-terraform-provision.provider[registry.terraform.io/hashicorp/aws].dnssec-key-provider,
│   on .terraform/modules/brokerpak-eks-terraform-provision/terraform/provision/providers.tf line 2, in provider aws:
│    2: provider aws {
Child modules must specify required_providers and use their defined local name (if different from the providers preferred name); Reference: https://www.terraform.io/language/providers/requirements#handling-local-name-conflicts
Now that we're using the up-to-date version of the EKS module, the module is better at managing Fargate profiles than we are. So now we let it handle that.
We were previously restricted to using a version of the kubernetes provider that didn't support annotations, so we were forced to use a null_resource to set up logging. Now that we're no longer restricted, we can do this with explicit namespace and configmap resources.
Ensure this policy is applied to the roles created by the EKS module for all node_groups and fargate_profiles. This way all pod logs go to Cloudwatch.

(The policy name were also changed, since it's not only for use with Fargate pods.)
The kubernetes provider needs AWS creds set in the environment so it can use aws-iam-authenticator to connect.
By making provision and bind into submodules of a single root module that encompasses them both (in ../terraform), we can hoist their provider configurations upward into the root module. Then the root module can handle accepting the provider configuration from callers (in this case datagov-ssb's static provision+bind deployment) and pass them to the submodules. We can have the bind module depend on the provision module, which skirts Terraform restrictions that would normally prevent the bind module from declaring its own kubernetes provider.

The submodules remain usable as root-modules in other contexts. For example, the brokerpak can use provision once, then use bind multiple times for each binding.
We still can't yet uncomment/declare the bind module because of the depends_on relationship with provision.

The next step is to try to hoist the kubernetes provider out of the bind module. Ideally we'll be able to specify a kubeconfig directly using the output of the provision module.
…e account

The token is used in the kubernetes provider config for bind, removing the need for any AWS credentials or binaries.
@mogul
Copy link
Collaborator

mogul commented Feb 24, 2022

Where we're at: As of the refactoring commits tonight in this branch and in the corresponding datagov-ssb repo branch, the modules no longer have any providers in them, the root module over them both creates appropriate providers for each module and injects them. Terraform validate approves! However, when I run terraform apply it's getting stuck because the deployment of csi-ebs driver and ingress-nginx helm-release are timing out.

This is almost certainly because those pods have daemonsets and cannot run on Fargate, and the MNG group is not available to schedule. We'll need to review this series of commits tomorrow to see what I b0rked up:
image

@mogul
Copy link
Collaborator

mogul commented Feb 28, 2022

Where we're at:

  • Commenting out module.provision.module.eks' fargate_profiles configuration got everything working again.
    • We'll leave that commented out for now, though in future we may or may not want it available, depending on whether we get everything we need from auto-scaling our managed-node groups.
  • Destroys are not clean.
    • For some reason during destroy the helm provider makes use of the module.provision.kubernetes_service_account.admin service account we create in admin_account.tf, despite the helm provider explicitly not using that account when it's configured in provision-providers.tf.
      • That service account loses the cluster-admin role binding almost immediately, so Helm fails because the account doesn't have access to secrets.
      • We could make the four helm_release resources depend on that binding to get around this, but that's not solving the real problem: Why isn't the helm provider using the configured AWS creds and generated tokens!?
    • If you work around that, destroy of the helm_release ingress_nginx is sometimes getting stuck in state uninstalling. 🙄 I discovered that the helm provider has a debug=true argument that will hopefully help us figure this out once and for all.
    • And if you get around that, of course, the NLB and corresponding target groups never get deleted by alb_controller, which means the certificate doesn't get destroyed, which means the VPC can't be deleted.

@mogul mogul force-pushed the fix-for-ssb-terraform-only branch from f5d8caf to f92497c Compare March 1, 2022 00:28
@mogul
Copy link
Collaborator

mogul commented Mar 1, 2022

  • Destroys are not clean.

    • For some reason during destroy the helm provider makes use of the module.provision.kubernetes_service_account.admin service account we create in admin_account.tf, despite the helm provider explicitly not using that account when it's configured in provision-providers.tf.
      • That service account loses the cluster-admin role binding almost immediately, so Helm fails because the account doesn't have access to secrets.
      • We could make the four helm_release resources depend on that binding to get around this, but that's not solving the real problem: Why isn't the helm provider using the configured AWS creds and generated tokens!?

We figured this problem out.

@mogul mogul force-pushed the fix-for-ssb-terraform-only branch from 8d69c82 to ae504eb Compare March 1, 2022 01:38
@mogul mogul force-pushed the fix-for-ssb-terraform-only branch from ae504eb to 3fe1719 Compare March 1, 2022 07:02
@mogul
Copy link
Collaborator

mogul commented Mar 1, 2022

Lots of teensy bugs fixed along the way to this point:

$ make demo-run
Testing aws-eks-service:raw:instance-bmogilefsky3:binding
To work directly with the instance:
export KUBECONFIG=/tmp/tmp.pAidiG73VS
export DOMAIN_NAME=instance-bmogilefsky3.ssb-dev.data.gov
Running tests...
Deploying the test fixture...
deployment.apps/deployment-2048 created
service/service-2048 created
ingress.networking.k8s.io/ingress-2048 created
Waiting 3 minutes for the workload to start and the DNS entry to be created...
Testing that the ingress is resolvable via SSL, and that it's properly pointing at the 2048 app...curl: (52) Empty reply from server
make: *** [Makefile:109: demo-run] Error 1

That's the next thing to look into.

@mogul
Copy link
Collaborator

mogul commented Mar 1, 2022

Oh hey, IT PASSED! Maybe there was just some cruft left in place from my last local invocation of test with an instance of the same name that prevented it from passing locally.

@mogul mogul marked this pull request as ready for review March 1, 2022 15:13
@mogul mogul requested a review from a team March 1, 2022 15:13
Copy link
Contributor Author

@nickumia-reisys nickumia-reisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awesome feature add! I won't hark on it right now, but we should probably update the README or add a file in the docs dir to record the dual functionality of brokerpak or terraform module.

AMAZING WORK THOUGH!! 🍹

P.S. Alas I can't merge this PR because technically it's mine 😕

.gitignore Show resolved Hide resolved
@nickumia-reisys nickumia-reisys merged commit ed8c1d7 into main Mar 1, 2022
@nickumia-reisys nickumia-reisys deleted the fix-for-ssb-terraform-only branch March 1, 2022 16:03
robert-bryson pushed a commit to GSA/datagov-ssb that referenced this pull request Mar 3, 2022
* new: provision standalone eks cluster using terraform only

Create custom 'brokerpak-eks-terraform-provision' module using eks-brokerpak as source; specify custom parameters to provision with

* update: terraform and modules

* new: add bind resource for eks-terraform cluster

Create user-provided service based on binding to pass to solrcloud broker

* fix/refactor: bind depends on provision

* update/fix: child modules can't specify their own provider blocks

Reference: GSA-TTS/datagov-brokerpak-eks#79

* cleanup: remove unreferenced variable

* new: add eks_terraform params for staging/prod

* fix: ensure bind actually waits for cluster_functional

Explicit output from module to module

* new: install aws/kubectl/helm tools for terraform apply

* fix: don't mess up ubuntu's path

* test: attempt to fix github action path

Trying to determine where home is and where the real path actually is

* test: ensure that the aws-iam-authenticator was installed

Also tests whether the command is available later in the same job

* test: the command actually wasn't added to the path yet..

Reference: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#adding-a-system-path

* fix: high hopesgit add .

aws-iam-authenticator needs AWS Credentials in the env

* new: successfully configure providers for eks as a submodule

* revert: no need to set AWS creds

The S3 Backend is set up properly alongside terraform AWS credentials, so this broke that

* update: supply the module's required_provider using the correct alias

* fix: provide AWS creds for the module to use with kubernetes provider

The module needs AWS creds so the kubernetes provider can use aws-iam-authenticator when it connects.

* workaround: temporarily comment out the resources that aren't working

We have yet to resolve the issue with the kubernetes_provider in the bind module not working, due to it having a dependency on another module.

* refactor: move declaration of IaaS-only resources to the managed-boundary

Technically this is where they belong, in terms of our SSP and diagram terminology.

Note also that we are no longer invoking the submodules separately, but rather directly invoking the parent module at `datagov-brokerpak-eks/terraform`

* lint: fix terraform fmt

* fix: uncomment CF-side resources

* fix: name "credentials" is implicit

* update: eks-brokerpak no longer needs a branch

Dual support for brokerpak mode or terraform module mode is enabled :)

* fix: terraform string literals don't like newlines

Reference: hashicorp/terraform-provider-vault#307

* fix: use data sources for space/org lookup

* fix: set variables based on service name

update VCAP_SERVICE selection to use the correct service based on service name, not service type

* new: install tools as TERRAFORM_PRE_RUN

Since dflook action is a separate container, the tools need to be pulled in properly.  The terraform working directory becomes part of the path for that step, so installing the tools there works :)

* lint: ran terraform fmt

* new: add tools to known path

After lots of tries, this seems to be the most reliable method

* temp: work off of branch until it gets merged

Too many changes to keep track off.. the branch is fully functional for terraform only stuff, but brokerpak stuff is a bit broken :/

Co-authored-by: Bret Mogilefsky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants