Terraform Standalone Use Case: Hardened #4768

Open
matttrach opened this issue Sep 15, 2023 · 18 comments
Labels
kind/epic Work tracking for a larger effort that requires breaking down into multiple smaller issues

@matttrach
Contributor

This tracks progress on satisfying a hardened RKE2 use case.

We will need to harden the OS.

We will need to follow the hardening guide for RKE2: https://docs.rke2.io/security/hardening_guide

@matttrach matttrach self-assigned this Sep 15, 2023
@matttrach
Contributor Author

The approach on this one will be to enable immutable infrastructure:

  • a hardening module for the OS
  • an RKE2 hardening module (possibly internal to the rke2-install module)
  • an imaging module for AWS
  1. Provision the objects needed to create and configure the server
  2. Provision server on AWS
  3. Harden server
  4. Install RKE2
  5. Harden RKE2
  6. Clean the install (remove anything which might be specific to the server)
  7. Generate an AMI from the server
  • The user can now move their new custom AMI to a secure region and deploy it in an air-gapped VPC
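
Steps 6 and 7 above could look roughly like the following. This is a minimal sketch, not the module's implementation: the function names clean_for_image and create_ami are hypothetical, it assumes a systemd-based OS with cloud-init and an already configured AWS CLI, and the list of host-specific files is an assumption that would need checking against the hardened OS.

```bash
#!/usr/bin/env bash
# Sketch of steps 6-7: strip host-specific state from the prototype server,
# then turn it into an AMI. Function names are hypothetical; assumes a
# systemd-based OS with cloud-init and a configured AWS CLI.
set -euo pipefail

# Step 6: remove data that is unique to this particular server.
# The optional argument is a root prefix (default /) so the cleanup can be
# dry-run against a scratch directory.
clean_for_image() {
  local root="${1:-/}"
  root="${root%/}"   # "/" becomes "", so paths below resolve to /etc/...
  # SSH host keys are regenerated on the first boot of each new instance.
  rm -f "$root"/etc/ssh/ssh_host_*
  # systemd generates a fresh machine-id when the file is empty at boot.
  if [[ -f "$root/etc/machine-id" ]]; then
    : > "$root/etc/machine-id"
  fi
  # Let cloud-init run again on instances launched from the image
  # (only when operating on the live system).
  if [[ -z "$root" ]] && command -v cloud-init >/dev/null 2>&1; then
    cloud-init clean --logs
  fi
}

# Step 7: create an AMI from the prototype instance and wait until usable.
# Prints the new image id.
create_ami() {
  local instance_id="$1" image_name="$2" image_id
  image_id="$(aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "$image_name" \
    --query ImageId --output text)"
  aws ec2 wait image-available --image-ids "$image_id"
  echo "$image_id"
}
```

On the prototype, clean_for_image would run as root as the last step; from a workstation the instance can then be stopped (aws ec2 stop-instances, followed by aws ec2 wait instance-stopped) before calling something like create_ami i-0123456789abcdef0 rke2-hardened. Stopping first means the image does not depend on create-image's default reboot to get a consistent filesystem.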

@matttrach
Contributor Author

Focus on RHEL as the first hardened OS.

@matttrach
Contributor Author

The CIS Benchmarks appear to be the standard for achieving a hardened OS, and CIS also provides custom AMIs on AWS that are preconfigured for their benchmarks. The STIG benchmark for RHEL is the one we should use for servers. There is also a distribution-independent benchmark that we might use for other server types; it contains multiple levels of recommendations, so look for the "server - level 2" suggestions.

@matttrach
Contributor Author

To harden RKE2 on RHEL 8, we should be able to get by with applying the CIS sysctl config as follows, adding a user for etcd, and setting the profile flag in the config.

Small script to enable the CIS config:

sudo cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf && \
sudo systemctl restart systemd-sysctl && \
sudo useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U
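
Before the first start, it can help to confirm that the prep actually took effect. A minimal sketch, assuming the RPM install path used above and that the CIS sysctl file contains plain key=value lines; trim, read_sysctl, check_sysctl_file, and check_etcd_user are hypothetical helper names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Trim leading and trailing whitespace without external tools.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Read the live value of a kernel parameter (separate so it can be stubbed).
read_sysctl() {
  sysctl -n "$1"
}

# Compare each key=value line of a sysctl.d file with the running kernel.
# Prints every mismatch and returns non-zero if any were found.
check_sysctl_file() {
  local file="$1" key value live failed=0
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    key="$(trim "$key")"
    value="$(trim "$value")"
    # Skip blank lines and comments (sysctl.d allows both # and ;).
    [[ -z "$key" || "$key" == \#* || "$key" == \;* ]] && continue
    live="$(read_sysctl "$key" 2>/dev/null || echo '<unreadable>')"
    if [[ "$live" != "$value" ]]; then
      echo "MISMATCH $key: expected $value, got $live"
      failed=1
    fi
  done < "$file"
  return "$failed"
}

# The CIS profile expects an etcd user to exist.
check_etcd_user() {
  getent passwd etcd >/dev/null
}
```

Run as root after the script, for example check_sysctl_file /etc/sysctl.d/60-rke2-cis.conf && check_etcd_user. Reading the expected values from the copied file, rather than hardcoding them, keeps the check in sync with whatever the installed RKE2 version ships.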

Example RKE2 config with the CIS profile enabled:

write-kubeconfig-mode: 644
cni: calico
cloud-provider-name: "aws"
profile: "cis-1.23"
selinux: true

This requires an extra config on top of what is necessary for clustering, as well as the ability to inject a script that preps the OS for running RKE2 after install but before the first start.
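
One way to package this as a single pre-start script is sketched below. It is an assumption about how such a hook could be structured, not the module's actual interface: write_rke2_config, prestart_hook, and the SKIP_START switch are hypothetical, and the config body is the example above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write the CIS-enabled config shown above. RKE2 reads
# /etc/rancher/rke2/config.yaml by default; the argument only exists so the
# function can be exercised against a scratch directory.
write_rke2_config() {
  local dir="${1:-/etc/rancher/rke2}"
  mkdir -p "$dir"
  cat > "$dir/config.yaml" <<'EOF'
write-kubeconfig-mode: 644
cni: calico
cloud-provider-name: "aws"
profile: "cis-1.23"
selinux: true
EOF
  # Restrictive permissions, since the same file may later hold a join token.
  chmod 600 "$dir/config.yaml"
}

# Full hook for a live node (run as root): OS prep, config, optional start.
prestart_hook() {
  cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
  systemctl restart systemd-sysctl
  # Only create the etcd user if it does not exist yet, so reruns succeed.
  id etcd >/dev/null 2>&1 || useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U
  write_rke2_config
  # Hypothetical switch: SKIP_START=1 stops after install (e.g. for imaging).
  if [[ "${SKIP_START:-0}" != "1" ]]; then
    systemctl enable --now rke2-server.service
  fi
}
```

Setting SKIP_START=1 mirrors the idea of stopping after install when the node is about to be turned into an image; making the useradd conditional keeps the hook safe to run more than once.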

@matttrach
Contributor Author

Enabling the RHEL8 STIG AMI: rancher/terraform-aws-server#20

@matttrach
Contributor Author

The changes there will need to be propagated to the install and rke2 modules and their examples.
Then we should be able to inject a script that installs the SELinux policies before starting RKE2.

@matttrach
Contributor Author

Propagating CIS to the install module, with an example CIS configuration: rancher/terraform-null-rke2-install#51

@matttrach
Contributor Author

I am currently working on adding a local RPM repo to the server to enable air-gapped RPM installs with SELinux enforcing on the CIS AMI.
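
A local repo for air-gapped installs could be assembled roughly as follows. This sketch assumes the directory already holds the RKE2 RPMs (for example rke2-common, rke2-selinux, rke2-server) plus their dependencies such as container-selinux, and that the RPM signing key was copied into it as public.key; the repo id rke2-local, the paths, and the function names are hypothetical. createrepo_c has to be available wherever the metadata is generated.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a dnf .repo definition for a directory of RPMs on local disk.
render_local_repo() {
  local repo_dir="$1"
  cat <<EOF
[rke2-local]
name=RKE2 local repository
baseurl=file://${repo_dir}
enabled=1
gpgcheck=1
gpgkey=file://${repo_dir}/public.key
EOF
}

# Generate repo metadata, register the repo, and install only from it.
# Run as root on the node.
install_local_repo() {
  local repo_dir="${1:-/opt/rke2-rpms}"
  createrepo_c "$repo_dir"
  render_local_repo "$repo_dir" > /etc/yum.repos.d/rke2-local.repo
  dnf install -y --disablerepo='*' --enablerepo=rke2-local rke2-selinux rke2-server
}
```

Generating the metadata on a build server and copying the finished directory to the node would work just as well. Keeping gpgcheck=1 means the packages are still signature-checked while fully offline.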

@matttrach
Contributor Author

matttrach commented Oct 26, 2023

Status

  • IPv4
  • Air-gapped install
  • RPM install
  • Private image registry https://docs.rke2.io/install/airgap#private-registry-method
  • System default registry (system-default-registry option to configure a custom image repo)
  • Private RPM repo
  • Local RPM repo
  • RHEL8-STIG
  • CIS profile enabled
  • FIPS enabled (SELinux enforcing)
  • Cilium configured
  • HA cluster
  • Split role cluster
  • Inject scripts into cloud-init
  • Disable SSH access
  • Generate image from server
  • Package and download all dependencies for offline installation
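
For the private registry items, RKE2 reads mirror settings from /etc/rancher/rke2/registries.yaml and the system-default-registry option from its config. The helpers below only print those two pieces; registry.example.com:5000 and the function names are placeholders, and TLS or auth settings (the configs section of registries.yaml) are deliberately left out.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a registries.yaml that sends docker.io pulls to a private mirror.
render_registries_yaml() {
  local registry="$1"
  cat <<EOF
mirrors:
  docker.io:
    endpoint:
      - "https://${registry}"
EOF
}

# Print the config.yaml line that makes RKE2 pull its system images from the
# same registry (host[:port], no scheme).
render_system_default_registry() {
  printf 'system-default-registry: "%s"\n' "$1"
}
```

Keeping the two as separate outputs matches the fact that they live in different files: the mirror entry affects all image pulls through containerd, while system-default-registry only rewrites the images RKE2 itself deploys.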

@matttrach
Contributor Author

The latest changes to the aws-rke2 module include:

  • the ability to inject a script before start, after install
  • the ability to skip starting the service (stop after install)

Next up:

  • working on generating an image

@matttrach
Contributor Author

matttrach commented Mar 1, 2024

Prioritizing by difficulty/time consumption:

  • Config mod for local RPM repo
  • Config mod for local image repo
  • Private RPM repo mod
  • Private Image repo mod

@matttrach
Contributor Author

  • Local and Private RPM repos should exist in their own module.
    • Using a local RPM repo on the node will require generating a build server and packaging up the repo for addition to the RKE2 node.
  • Generating an RPM mirror to use as a private repo is a project in itself
  • Using the system default registry to store the images on the RKE2 node is a configuration that will need to happen after the node is generated. I think a module could handle this fairly easily, but it should not be part of the normal install process; it is out of scope for the RKE2 install module.
    • In its own module, this could be added to any RKE2 node, independent of how it was created
    • This could be a series of "config" mods that let users pick add-ons for their node by implementing modules
  • Generating a private registry server (e.g. Harbor) is its own project, so it should be an independent module
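
The "config mods" idea could map onto RKE2's support for drop-in files in /etc/rancher/rke2/config.yaml.d/, which are read alongside config.yaml. In this sketch each hypothetical mod writes one numbered file; the add_config_dropin name, the numbering scheme, and the example content are assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write one config "mod" as a drop-in file. The numeric prefix controls the
# alphabetical order in which RKE2 reads the drop-ins.
add_config_dropin() {
  local order="$1" name="$2" content="$3"
  local dir="${4:-/etc/rancher/rke2/config.yaml.d}"
  mkdir -p "$dir"
  printf '%s\n' "$content" > "$dir/${order}-${name}.yaml"
  chmod 600 "$dir/${order}-${name}.yaml"
}
```

A registry mod, for instance, could call add_config_dropin 50 registry 'system-default-registry: "registry.example.com:5000"' on any existing node without touching the config written by the install module.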

@matttrach
Contributor Author

These are not small items unfortunately, it will take me some time to get these things figured out.

In the meantime, here is a repo showing how to get everything else running:
https://github.com/rancher/terraform-aws-rke2-live-example

This is a full IaC example of an RKE2 node on an air-gapped server that you can only access via the AWS serial console. It deploys a "prototype" server that has access to download the things it needs before it shuts down and is turned into an image.
The production server is then deployed using that image and an updated config that sets the proper IP addresses and join token. The repo is set up to be fully IaC, meaning that users manage their infrastructure like code artifacts in a repo, and it has CI to test and automatically deploy the infrastructure. Secrets are encrypted, and the encryption is automatically rotated weekly. Each user has their own key to decrypt the secrets, and one exists for the CI that is not viewable without a code change.
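
The per-node step after deploying from the image could be a small config fragment applied at boot, for example from cloud-init user data. This is a sketch under assumptions: the function name is hypothetical, the server URL, token, and IP are placeholders, and it relies on RKE2's server, token, and node-ip options with the supervisor on port 9345.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the per-node settings that cannot be baked into the shared image.
render_join_config() {
  local server_url="$1" token="$2" node_ip="$3"
  cat <<EOF
server: "${server_url}"
token: "${token}"
node-ip: "${node_ip}"
EOF
}
```

The output would be appended to /etc/rancher/rke2/config.yaml before rke2-server starts, e.g. render_join_config https://10.0.1.10:9345 "$TOKEN" 10.0.1.11; the first server of a cluster would omit the server line.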

@matttrach
Contributor Author

State is stored encrypted in the repo, as is all of the access the CI needs to deploy.
The CI runs on the public GitHub runners and is completely free (3k minutes for a private repo, unlimited for a public one; in my experience it is pretty hard to reach that 3k-minute limit using just one repo). Users don't need in-depth (or any) knowledge of Terraform to use the example, but maintainers will need to understand what they are looking at to make educated changes.

@matttrach
Contributor Author

CI access is created before every run and destroyed at the end, which keeps it very limited. CI never has access to production servers (they don't have public IP addresses).

@matttrach
Contributor Author

I am going to move this issue to our backlog as I don't have a clear timeline.

@matttrach matttrach modified the milestones: v1.30.0+rke2r1, Backlog Mar 1, 2024
@matttrach
Contributor Author

This now aligns with #5541.
I will make sure to update both so everyone is on the same page, but #5541 will have the most up-to-date information.
I expect to implement items there into the example repo and I will add a summary here when I do.

@matttrach matttrach added the kind/epic Work tracking for a larger effort that requires breaking down into multiple smaller issues label May 30, 2024
@matttrach
Contributor Author

Dual-stack and SLE Micro support are being propagated through the system; the next challenge is the embedded registry.
