**Table of Contents** *(generated with DocToc)*
- Developing Hive
- Prerequisites
- Build and run tests
- Setting up the development environment
- Deploying with Kubernetes In Docker (kind)
- Adopting ClusterDeployments
- Writing/Testing Code
- Developing Hiveutil Install Manager
- Enable Debug Logging In Hive Controllers
- Using Serving Certificates
- Dependency management
- Running the e2e test locally
- Viewing Metrics with Prometheus
# Developing Hive

## Prerequisites

- Git
- Make
- A recent Go distribution (>= 1.13)
## Build and run tests

To build and test your local changes, run:

```bash
make
```

To run only the unit tests:

```bash
make test
```
## Setting up the development environment

Get the sources from GitHub:

```bash
git clone https://github.com/openshift/hive.git
```
## Deploying with Kubernetes In Docker (kind)

Kind can be used as a lightweight development environment for deploying and testing Hive. Currently we support Kind version 0.8.1. The following instructions cover creating an insecure local registry (allowing for dramatically faster push/pull) and configuring your host OS, as well as the kind cluster, to access it. This approach runs Hive in a container as you would in production, giving you the best coverage for manual testing.

This approach requires Docker. At present we do not have kind working with podman.

If you encounter ImagePullErrors and your kind container cannot reach your registry container, you may be experiencing problems with Fedora (at least 32, possibly earlier as well) and Docker. You can attempt to work around this by changing the `FirewallBackend` setting in `/etc/firewalld/firewalld.conf` from `nftables` to `iptables` and restarting Docker (see this comment).
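A minimal sketch of that workaround, assuming the default Fedora firewalld configuration:

```bash
# Switch firewalld from nftables to iptables (assumes the default FirewallBackend line is present).
sudo sed -i 's/^FirewallBackend=nftables/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf
# Restart firewalld to pick up the change, then restart Docker.
sudo systemctl restart firewalld
sudo systemctl restart docker
```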
Create a local insecure registry (if one does not already exist) and then a kind cluster named 'hive' to deploy to. You can create additional clusters if desired by providing a different name argument to the script.

```bash
./hack/create-kind-cluster.sh hive
```
NOTE: The following error will occur the first time you create your registry container; it is harmless and can be safely ignored.

```
Error response from daemon: container fcca36a26da5601d6453c0c53ab5909eb6ca8ffe42de0f8634dd7213f107cef0 is not connected to network kind
```
`docker ps` should now show a "registry" and a "hive" container running:

```
$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                       NAMES
ffddd07cd9e1   kindest/node:v1.18.2   "/usr/local/bin/entr…"   32 minutes ago   Up 32 minutes   127.0.0.1:36933->6443/tcp   hive-control-plane
8060ab7c8116   registry:2             "/entrypoint.sh /etc…"   2 days ago       Up 2 days       0.0.0.0:5000->5000/tcp      kind-registry
```
You should now have a `kind-hive` context in your kubeconfig, set as the current context.
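You can confirm this with kubectl, for example:

```bash
kubectl config current-context   # expected output: kind-hive
```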
NOTE: If you do not have `cfssljson` and `cfssl` installed, run the following commands to install them; otherwise, skip this step.

```bash
go install github.com/cloudflare/cfssl/cmd/cfssljson
go install github.com/cloudflare/cfssl/cmd/cfssl
```
You can now build your local Hive source as a container, push it to the local registry, and deploy Hive. Because we are not running on OpenShift, we must also create a secret with certificates for the hiveadmission webhooks.

```bash
IMG=localhost:5000/hive:latest make docker-dev-push
DEPLOY_IMAGE=localhost:5000/hive:latest make deploy
./hack/hiveadmission-dev-cert.sh
```
Hive should now be running.
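As a quick sanity check (not exhaustive), you can verify that the Hive pods are up in the `hive` namespace:

```bash
oc get pods -n hive
```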
You can leave your registry container running indefinitely. The kind cluster can be replaced quickly as necessary:

```bash
kind delete cluster --name hive
./hack/create-kind-cluster.sh hive
```
## Adopting ClusterDeployments

It is possible to adopt cluster deployments into Hive, potentially even fake or kind clusters. This can be useful for developers who would like to work on functionality separate from actual provisioning.

To create a kind cluster and adopt it:

```bash
./hack/create-kind-cluster.sh cluster1
bin/hiveutil create-cluster --base-domain=new-installer.openshift.com kind-cluster1 --adopt --adopt-admin-kubeconfig=/path/to/cluster/admin/kubeconfig --adopt-infra-id=fakeinfra --adopt-cluster-id=fakeid
```
NOTE: When using a kind cluster, not all controllers will function properly, as it is not an OpenShift cluster and thus lacks some of the CRDs our controllers use (ClusterState, RemoteMachineSet, etc.).

Alternatively, you can use any valid kubeconfig for live or since-deleted clusters.

Deprovision will run, but will find nothing to delete if no resources are tagged with your fake infrastructure ID.
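Once adopted, the ClusterDeployment should show up like any other. A quick way to check (illustrative; the namespace depends on where it was created):

```bash
oc get clusterdeployments --all-namespaces
```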
## Writing/Testing Code

Our typical approach to manually testing code is to deploy Hive into your current cluster as defined by kubeconfig, scale down the relevant component you wish to test, and then run its code locally.

You can run the Hive operator from your source code using any one of the methods below.
To run the Hive operator from source (NOTE: this assumes you have previously deployed Hive):

```bash
oc scale -n hive deployment.v1.apps/hive-operator --replicas=0
make run-operator
```
Alternatively, build, publish, and deploy a custom Hive image:

1. Build a custom Hive image from your current working directory: `IMG=quay.io/{username}/hive:latest make image-hive`
2. Publish your custom image: `IMG=quay.io/{username}/hive:latest make buildah-push`
3. Deploy with your custom image: `DEPLOY_IMAGE=quay.io/{username}/hive:latest make deploy`
4. After code changes, rebuild the Hive image as described in step 1.
5. Delete the running Hive pods so that the new pods come up with the image built in the previous step: `oc delete pods --all -n hive`
To run the Hive controllers from source (NOTE: this assumes you have previously deployed Hive):

```bash
oc scale -n hive deployment.v1.apps/hive-controllers --replicas=0
DISABLE_LEADER_ELECTION="true" HIVE_NS="hive" make run
```

Kind users should also specify `HIVE_IMAGE="localhost:5000/hive:latest"`, as the default image location cannot be authenticated to from kind clusters, resulting in an inability to launch install pods.
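For example, a kind user's full invocation might look like this (values taken from the local-registry setup above):

```bash
DISABLE_LEADER_ELECTION="true" HIVE_NS="hive" HIVE_IMAGE="localhost:5000/hive:latest" make run
```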
## Developing Hiveutil Install Manager

We use a hiveutil subcommand for the install-manager, run in pods (and thus from an image), to wrap the openshift-install process and upload artifacts to Hive. Developing this is tricky because it requires a published image and a ClusterImageSet. Instead, you can hack together an environment as follows:

- Create a ClusterDeployment and allow it to resolve the installer image, but before the install can complete:
  - Scale down the hive-controllers so they are no longer running: `oc scale -n hive deployment.v1.apps/hive-controllers --replicas=0`
  - Delete the install job: `oc delete job ${CLUSTER_NAME}-install`
- Make a temporary working directory in your hive checkout: `mkdir temp`
- Compile your hiveutil changes: `make build`
- Set your pull secret as an env var to match the pod: `export PULL_SECRET=$(cat ~/pull-secret)`
- Run:

```bash
bin/hiveutil install-manager --work-dir $GOPATH/src/github.com/openshift/hive/temp --log-level=debug hive ${CLUSTER_NAME}
```
## Enable Debug Logging In Hive Controllers

Scale down the Hive operator to zero:

```bash
oc scale -n hive deployment.v1.apps/hive-operator --replicas=0
```

Edit the controller deployment to change the log level from `info` to `debug`:

```bash
oc edit deployment/hive-controllers -n hive
```

```yaml
spec:
  containers:
  - command:
    - /opt/services/manager
    - --log-level
    - debug
```
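If you prefer a one-liner, the same change can be made with a JSON patch. This is a sketch that assumes the container command array is laid out exactly as shown above (the log level value is the third element of the first container's command):

```bash
# Hypothetical convenience equivalent of the manual edit above.
oc patch deployment/hive-controllers -n hive --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/command/2","value":"debug"}]'
```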
## Using Serving Certificates

The hiveutil command includes a utility to generate Let's Encrypt certificates for use with clusters you create in Hive.

Prerequisites:

- The `certbot` command must be available and in the path of your machine. You can install it by following the instructions at https://certbot.eff.org/docs/install.html.
- You must have credentials for AWS available in your command line, either via a configured `~/.aws/credentials` or via environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`).

To generate a certificate:

- Ensure that the `hiveutil` binary is available (`make hiveutil`).
- Run: `hiveutil certificate create ${CLUSTER_NAME} --base-domain ${BASE_DOMAIN}`, where `CLUSTER_NAME` is the name of your cluster and `BASE_DOMAIN` is the public DNS domain for your cluster (defaults to `new-installer.openshift.com`).
The output of the certificate creation command will indicate where the certificate was created. You can then use the `hiveutil create-cluster` command to create a cluster that uses the certificate.

NOTE: The cluster name and domain used to create the certificate must match the name and base domain of the cluster you create.

Example:

```bash
hiveutil create-cluster mycluster --serving-cert=$HOME/mycluster.crt --serving-cert-key=$HOME/mycluster.key
```
## Dependency management

If your work requires a change to the dependencies, you need to update the modules.

- If you are upgrading an existing dependency, run `go get [module]`. If you are adding a dependency, then you should not need to do anything explicit for this step. The go tooling should pick up the dependency from the import directives that you added in code.
- Run `make vendor` to fetch changed dependencies.
- Test that everything still compiles with changed files in place by running `make`.

Refer to the Go modules documentation for more information.
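For example, a typical upgrade of a single dependency might look like this (the module path and version are purely illustrative):

```bash
# Hypothetical module and version, shown only to illustrate the flow.
go get github.com/example/somedependency@v1.2.3
make vendor
make
```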
If you delete the `vendor` directory, which contains the needed project dependencies, you can recreate it by running:

```bash
make vendor
```
For various reasons, Hive vendors the OpenShift Installer. The OpenShift installer brings in quite a few dependencies, so it is important to know the general flow of how to vendor the latest version of the OpenShift Installer.
Things to note:
- Hive vendors the installer from `@master`, NOT from `@latest`. For go modules, `@latest` means the latest git tag, which for the installer is not up to date.
- The `go.mod` file contains a section called `replace`. The purpose of this section is to force go modules to use a specific version of that dependency.
- `replace` directives may need to be copied from the OpenShift Installer or possibly other Hive dependencies. In other words, any dependency may need to be pinned to a specific version.
- `go mod tidy` is used to add (download) missing dependencies and to remove unused modules.
- `go mod vendor` is used to copy dependent modules to the vendor directory.
The following is a basic flow for vendoring the latest OpenShift Installer. Updating Go modules can sometimes be complex, so it is likely that the flow below will not encompass everything needed to vendor the latest OpenShift Installer. If more steps are needed, please document them here so that the Hive team will know other possible things that need to be done.
Basic flow for vendoring the latest OpenShift Installer:
- Compare the `replace` section of the OpenShift Installer `go.mod` with the `replace` section of the Hive `go.mod`. If the Hive `replace` section has the same module listed as the OpenShift Installer `replace` section, ensure that the Hive version matches the installer version. If it doesn't, change the Hive version to match.
- Edit `go.mod` and change the OpenShift Installer `require` to reference `master` (or a specific commit hash) instead of the last version. Go will change it to the version that master points to, so this is a temporary change.

```
github.com/openshift/installer master
```
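Alternatively (a convenience, not the documented flow above), the Go tooling can resolve and record the commit for you:

```bash
# Resolves the installer's current master branch to a pseudo-version and records it in go.mod.
go get github.com/openshift/installer@master
```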
- Run `make vendor`. This make target runs both `go mod tidy` and `go mod vendor`, which get the latest modules, clean up unused modules, and copy the modules into the Hive git tree.

```bash
make vendor
```
- If `go mod tidy` errors with a message like the following, then check Hive's usage of that package. In this case, the Hive import is importing an old version of the API; it needs to instead import `v1beta1`. Fix the Hive code and re-run `go mod tidy`. This may need to be done multiple times.

```
github.com/openshift/hive/pkg/controller/remotemachineset imports
github.com/openshift/machine-api-operator/pkg/apis/vsphereprovider/v1alpha1: module github.com/openshift/machine-api-operator@latest found (v0.2.0), but does not contain package github.com/openshift/machine-api-operator/pkg/apis/vsphereprovider/v1alpha1
```
- If `go mod tidy` errors with a message like the following, then check the installer's replace directives for that go module so that Hive is pulling in the same version. Re-run `go mod tidy` once the replace directive has been added or updated. This process may need to be followed several times to clean up all of the errors.

```
github.com/openshift/hive/pkg/controller/remotemachineset imports
github.com/openshift/installer/pkg/asset/machines/aws imports
sigs.k8s.io/cluster-api-provider-aws/pkg/apis/awsprovider/v1beta1: module sigs.k8s.io/cluster-api-provider-aws@latest found (v0.5.3, replaced by github.com/openshift/[email protected]), but does not contain package sigs.k8s.io/cluster-api-provider-aws/pkg/apis/awsprovider/v1beta1
```
- If `go mod tidy` errors with a message like the following, then check the installer's replace directives for that go module so that Hive is pulling in the same version. Re-run `go mod tidy` once the replace directive has been added or updated. This process may need to be followed several times to clean up all of the errors.

```
go: sigs.k8s.io/cluster-api-provider-azure@v0.0.0: reading sigs.k8s.io/cluster-api-provider-azure/go.mod at revision v0.0.0: unknown revision v0.0.0
```
- If `go mod tidy` errors with a message like the following, then check the installer's replace directives to see if the replace needs to be updated. In this specific case, the replace was correct, but Hive was referring to `awsproviderconfig/v1beta1`, and the module has renamed that directory to `awsprovider/v1beta1`. Fix the Hive code and re-run `go mod tidy`.

```
github.com/openshift/hive/cmd/manager imports
sigs.k8s.io/cluster-api-provider-aws/pkg/apis/awsproviderconfig/v1beta1: module sigs.k8s.io/cluster-api-provider-aws@latest found (v0.5.3, replaced by github.com/openshift/[email protected]), but does not contain package sigs.k8s.io/cluster-api-provider-aws/pkg/apis/awsproviderconfig/v1beta1
```
- Once `go mod vendor` succeeds, run `make` to ensure everything builds and tests correctly:

```bash
make
```
- If `make` errors, that may mean the Hive code needs to be updated to be compatible with the latest vendored code. Fix the Hive code and re-run `make`.
## Running the e2e test locally

The e2e test deploys Hive on a cluster, tests that all Hive components are working properly, then creates a cluster with Hive and ensures that Hive works properly with the installed cluster. It finally tears down the created cluster.

You can run the e2e test by pointing to your own cluster (via the `KUBECONFIG` environment variable).
Ensure that the following environment variables are set:

- `KUBECONFIG` - Must point to a valid Kubernetes configuration file that allows communicating with your cluster.
- `AWS_ACCESS_KEY_ID` - AWS access key for your AWS account.
- `AWS_SECRET_ACCESS_KEY` - AWS secret access key for your AWS account.
- `HIVE_IMAGE` - Hive image to deploy to the cluster.
- `RELEASE_IMAGE` - OpenShift release image to use for the e2e test cluster.
- `CLUSTER_NAMESPACE` - Namespace where the ClusterDeployment will be created for the e2e test.
- `BASE_DOMAIN` - DNS domain to use for the test cluster (a corresponding Route53 public zone must exist in your account).
- `ARTIFACT_DIR` - Directory where logs will be placed by the e2e test.
- `SSH_PUBLIC_KEY_FILE` - Path to a public SSH key to use for the test cluster.
- `PULL_SECRET_FILE` - Path to a file containing a pull secret for the test cluster.

For example values for these variables, see `hack/local-e2e-test.sh`.
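For instance, a local run might export something like the following before invoking the script (every value below is a placeholder for illustration only, not a real credential, image, or domain):

```bash
# Placeholder values for illustration only.
export KUBECONFIG="${HOME}/.kube/config"
export AWS_ACCESS_KEY_ID="<aws-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<aws-secret-access-key>"
export HIVE_IMAGE="localhost:5000/hive:latest"
export RELEASE_IMAGE="<openshift-release-image-pullspec>"
export CLUSTER_NAMESPACE="hive-e2e"
export BASE_DOMAIN="devcluster.example.com"
export ARTIFACT_DIR="/tmp/hive-e2e-artifacts"
export SSH_PUBLIC_KEY_FILE="${HOME}/.ssh/id_rsa.pub"
export PULL_SECRET_FILE="${HOME}/pull-secret"
```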
Run the Hive e2e script:

```bash
hack/e2e-test.sh
```
## Viewing Metrics with Prometheus

Hive publishes a number of metrics that can be scraped by Prometheus. If you do not have an in-cluster Prometheus that can scrape Hive's endpoint, you can deploy a stateless Prometheus pod in the hive namespace with:

```bash
oc apply -f config/prometheus/prometheus-configmap.yaml
oc apply -f config/prometheus/prometheus-deployment.yaml
oc port-forward svc/prometheus -n hive 9090:9090
```

Once the pods come up, you should be able to view Prometheus at http://localhost:9090.

Hive metrics have a `hive_` or `controller_runtime_` prefix.
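As a quick sanity check, you can also query the Prometheus HTTP API through the port-forward set up above; the metric shown here is just one example of a standard controller-runtime counter:

```bash
# Query through the local port-forward.
curl -s 'http://localhost:9090/api/v1/query?query=controller_runtime_reconcile_total'
```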
Note that this Prometheus uses an emptyDir volume and all data is lost on pod restart. You can instead use the deployment YAML with a PVC if desired:

```bash
oc apply -f config/prometheus/prometheus-deployment-with-pvc.yaml
```
## Profiling Hive Controllers

Enable CPU profiling by importing the pprof module in `cmd/manager/main.go`:

```go
_ "net/http/pprof"
```

Launch an HTTP server to expose the data in the main method of `cmd/manager/main.go`:

```go
go func() {
	log.Println(http.ListenAndServe("localhost:6060", nil))
}()
```

Port 6060 is already exposed in the hive-controllers service; forward it locally:

```bash
oc port-forward svc/hive-controllers -n hive 6060
```
Visit the web UI (served under `/debug/pprof/` on the forwarded port) to view available profiles and some live data.

Grab a profile snapshot with curl:

```bash
curl "http://127.0.0.1:6060/debug/pprof/profile?seconds=300" > cpu.pprof
```

Display some text data from the snapshot:

```bash
go tool pprof --text cpu.pprof
```
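If useful, other profile types can be captured the same way, and the snapshot can be explored interactively in a browser (a sketch; port 8081 is an arbitrary local choice):

```bash
# Capture a heap profile through the same port-forward.
curl "http://127.0.0.1:6060/debug/pprof/heap" > heap.pprof
# Open an interactive browser view of the CPU snapshot.
go tool pprof -http=:8081 cpu.pprof
```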