Add more annotation, maybe cachebust will start behaving #200

Merged: 2 commits, Oct 24, 2024
8 changes: 8 additions & 0 deletions controllers/reconcile.go
@@ -325,6 +325,14 @@ func (r *FrontendReconciliation) populateCacheBustContainer(j *batchv1.Job) error
// Add the restart policy
j.Spec.Template.Spec.RestartPolicy = v1.RestartPolicyNever

annotations := j.Spec.Template.ObjectMeta.Annotations
if annotations == nil {
annotations = make(map[string]string)
}
annotations["kube-linter.io/ignore-all"] = "we don't need no any checking"

j.Spec.Template.ObjectMeta.SetAnnotations(annotations)

// Add the akamai edgerc configmap to the deployment

return nil
2 changes: 2 additions & 0 deletions docs/antora/modules/ROOT/pages/api_reference.adoc
@@ -420,6 +420,8 @@ do this in ephemeral environments but not in production + | |
| *`akamaiCacheBustImage`* __string__ | Set Akamai Cache Bust Image + | |
| *`akamaiCacheBustURL`* __string__ | Set Akamai Cache Bust URL that the files will hang off of + | |
| *`akamaiSecretName`* __string__ | The name of the secret we will use to get the Akamai credentials + | |
| *`targetNamespaces`* __string array__ | List of namespaces that should receive a copy of the frontend configuration as a config map +
By configurations we mean the fed-modules.json, navigation files, etc. + | |
|===


29 changes: 29 additions & 0 deletions docs/antora/slos/frontend-operator-availability.md
@@ -0,0 +1,29 @@
# Frontend Operator Availability SLO

## Description

The frontend operator availability SLO determines whether the operator is functioning normally.
This SLO tracks the number of available replicas for the frontend operator's deployment. There should
always be at least one replica of the operator running.

## SLI Rationale
Availability is the most important metric we can gather for this operator. If there are no running
pods, no operations can be conducted. Ensuring that we monitor the availability of the operator is
critical to running ConsoleDot Frontends.

## Implementation

The SLI for availability is derived from Kubernetes metrics. We can use the `kube_deployment_status_replicas_available`
metric, filtered on the `frontend-operator-system` namespace, to determine whether we have a running pod. Since
the only thing running in that namespace is the operator's controller, we can match our desired pod count directly in our alerts.
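
As a minimal sketch, the alert expression could look like the following, assuming kube-state-metrics is scraped by the same Prometheus instance (the exact alert window is left to the alert rule):

```promql
# Fires when no operator replica is available in the namespace;
# the operator Deployment is the only workload running there.
kube_deployment_status_replicas_available{namespace="frontend-operator-system"} < 1
```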

## SLO Rationale

The operator's uptime should be at least 99%. Availability is the foundation our OpenShift deployments rest on. We cannot reconcile
Frontend resources without a running operator, and the operator is a critical part of our deployment strategy for ConsoleDot.

## Alerting

Alerts for availability are set to high severity for now, but could become paging alerts in the future. When the operator becomes
unavailable, it will not delete any resources; instead, CRs on the cluster simply stop being reconciled.
While no destructive processes will be invoked, no changes can be made to Frontend resources until the operator recovers.
25 changes: 25 additions & 0 deletions docs/antora/slos/frontend-operator-reconciliation-time.md
@@ -0,0 +1,25 @@
# Frontend Operator Reconciliation Time SLO

## Description

This SLO tracks the reconciliation time for the Frontend Operator's `frontend` controller. Reconciliations should stay
below 4 seconds at least 95% of the time.

## SLI Rationale

High reconciliation times back up the queue of objects waiting to be reconciled. This could indicate an issue with the operator
and prevent objects from being added or updated in a timely manner.

## Implementation

The Operator SDK exposes the `controller_runtime_reconcile_time_seconds_bucket` histogram metric to show reconciliation times.
Aggregating the buckets over a time window lets us determine whether reconciliation times are staying under 4 seconds.
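
A minimal sketch of such a query, assuming the metric is scraped by Prometheus and the controller-runtime `controller` label value is `frontend` (the 5m window is an assumption):

```promql
# 95th percentile reconciliation time over the last 5 minutes;
# the SLO holds while this stays below 4 seconds.
histogram_quantile(
  0.95,
  sum by (le) (rate(controller_runtime_reconcile_time_seconds_bucket{controller="frontend"}[5m]))
) < 4
```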

## SLO Rationale
Almost all reconciler calls should be handled without issue in a timely manner. If we are hitting reconciliation times greater than
4 seconds, debugging should begin.

## Alerting
Alerts should be kept to a medium level. Because a myriad of issues could cause high reconciliation times, breaking
this SLO should not result in a page. It should be addressed, but higher-than-normal reconciliation times alone do not indicate
an outage.
26 changes: 26 additions & 0 deletions docs/antora/slos/frontend-operator-reconciliation.md
@@ -0,0 +1,26 @@
# Frontend Operator Reconciliation SLO

## Description

The frontend operator implements metrics to expose the error rate of reconciliations targeting its CRDs. When that
error rate is too high, we will alert. Reconciliation errors can indicate a wide array of issues, including misconfigurations,
outages, and quota/resource constraints. In general, if the operator cannot reconcile successfully at a nominal rate,
investigation is needed.

## SLI Rationale

Reconciliation error rates surface many different kinds of issues across our environments. We can use this metric to catch
misconfigurations in production apps and to find deploy-time issues with the operator.

## Implementation

The Operator SDK exposes the `controller_runtime_reconcile_total` metric, broken down by result, to show the reconciliation rate.
Combining `sum` with `increase` lets us determine what fraction of reconciliations are failing.
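
A minimal sketch of the error-ratio query, assuming the standard controller-runtime `result` label (the 1h window is an assumption):

```promql
# Fraction of reconciliations that errored over the last hour;
# exceeding 10% violates the SLO.
sum(increase(controller_runtime_reconcile_total{result="error"}[1h]))
  /
sum(increase(controller_runtime_reconcile_total[1h]))
  > 0.10
```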

## SLO Rationale
Almost all reconciler calls should be handled without issue. If we are hitting more than 10% errors on reconcile, debugging
should begin.

## Alerting
Alerts should be kept to a medium level. Because a myriad of issues could cause a reconciliation error, breaking
this SLO should not result in a page. It should be addressed, but the error rate alone does not indicate an outage.