Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add wiz manifests #8679

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from
Open

feat: add wiz manifests #8679

wants to merge 12 commits into from

Conversation

vinaythupili
Copy link
Collaborator

-- add wiz manifests

@vinaythupili
Copy link
Collaborator Author

👍

cluster/manifests/wiz/sensor/clusterrole.yaml Outdated Show resolved Hide resolved
cluster/manifests/wiz/sensor/clusterrole.yaml Outdated Show resolved Hide resolved
cluster/manifests/wiz/sensor/clusterrole.yaml Outdated Show resolved Hide resolved
cluster/manifests/wiz/sensor/dameonset.yaml Outdated Show resolved Hide resolved
cluster/manifests/wiz/sensor/dameonset.yaml Outdated Show resolved Hide resolved
cluster/manifests/wiz/sensor/dameonset.yaml Show resolved Hide resolved
@zaklawrencea zaklawrencea added major Major feature changes or updates, e.g. feature rollout to a new country, new API calls. do-not-merge labels Dec 19, 2024
@linki
Copy link
Member

linki commented Dec 19, 2024

We should also populate the deletions.yaml. Otherwise, the ConfigItem cannot be used to turn it off again.

allowPrivilegeEscalation: false
runAsNonRoot: true
runAsUser: 1000
image: "wiziopublic.azurecr.io/wiz-app/wiz-broker:2.5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our environment all images need to come from container-registry.zalando.net.

Copy link
Member

@demonCoder95 demonCoder95 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to create an internal repository that "republishes" this public image to our internal ECR registry and then use the link as Martin suggests here. I communicated in our internal chat thread.

"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
rollme.wizApiTokenHash: ce8124bc1b0fbc0cb5cd47338ca0c7d5f5446d79936e443a201d96b192a7bd65
rollme.proxyHash: 9aa53d69075371b3fa23ebeea2fd2416ea81fb533499d071ca2d576f17c7c886
rollme.brokerHash: 115ba85431eeaf8db3ff2173aee02d16e67df1555d5e1ef74cfa7ac0d812cab2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values look like they are computed based on the configuration value in order to trigger an update when they change. Having these values static here doesn't provide any value.

We do have similar functionality which can be used if it's really needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented the same on the rollme annotation on the connector deployment manifest above. It appears this is used to keep track of changes in the deployment manifests, not in the configuration. So, we probably don't need this whole setup anyway. Maybe it's a helm thing? CLM does this job for us.

type: Opaque
stringData:
clientId: "{{ .Cluster.ConfigItems.wiz_api_client_id | base64 }}"
clientToken: "{{ .Cluster.ConfigItems.wiz_api_client_token | base64 }}"
Copy link
Member

@linki linki Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using stringData, then base64 encoding might not be needed, just fyi if you see issues. https://kubernetes.io/docs/concepts/configuration/secret/#restriction-names-data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinaythupili now that I think about it, the values are already base64 encoded before the encryption is done. I think we meant to do base64Decode here originally, but since the secret can be base64 anyway, we probably don't need this encoding and can just use the data field instead of stringData.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to data 👍

Copy link
Member

@demonCoder95 demonCoder95 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the base64 template function, now that we're using the data field that's already base64 encoded when the secret object will be stored? (That's what I understand from Zak's comment above)

@vinaythupili
Copy link
Collaborator Author

We should also populate the deletions.yaml. Otherwise, the ConfigItem cannot be used to turn it off again.

updated 👍

Signed-off-by: Katyanna Moura <[email protected]>
- name: wiz-cluster-reader
kind : ClusterRoleBinding
namespace: wiz
{{- end }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Missing empty line at the end 😅

Trigger deletions when daemonset is disabled.

Co-authored-by: Zak Lawrence A <[email protected]>
Comment on lines +5 to +18
# apiVersion: rbac.authorization.k8s.io/v1
# kind: ClusterRole
# metadata:
# name: wiz-cluster-reader
# labels:
# helm.sh/chart: wiz-kubernetes-connector-3.1.1
# app.kubernetes.io/name: wiz-kubernetes-connector
# app.kubernetes.io/instance: wiz-connector
# app.kubernetes.io/version: "2.5"
# app.kubernetes.io/managed-by: Helm
# rules:
# - apiGroups: ["*"]
# resources: ["*"]
# verbs: ["get", "list", "watch"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this manifest here? I think the comment is sufficient to communicate that we don't need a global reader role 🤔

Comment on lines +25 to +30
labels:
helm.sh/chart: wiz-kubernetes-connector-3.1.1
app.kubernetes.io/name: wiz-kubernetes-connector
app.kubernetes.io/instance: wiz-connector
app.kubernetes.io/version: "2.5"
app.kubernetes.io/managed-by: Helm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these labels important for the Wiz deployment? I think we might need to remove the ones related to helm, also we need to provide the Zalando environment labels like application: foo and component: bar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a similar concern was raised by Martin in another manifest, so let's fix the labels for all manifests 🙂

  1. Drop the helm related labels.
  2. Add application and component labels.

runAsNonRoot: true
runAsUser: 1000
image: "wiziopublic.azurecr.io/wiz-app/wiz-broker:2.5"
imagePullPolicy: Always
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imagePullPolicy should be IfNotPresent.

Images pulled are always cached by the runtime and reusing the cached image also results in faster pod startup times. This is only set to Always in cases where you expect frequent image updates, which is not the case for our infrastructure.

Comment on lines +63 to +64
- name: WIZ_ENV
value:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to create an empty environment variable?

Comment on lines +57 to +59
args: [
/etc/connectorData/data
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a lot cleaner to specify a list with - instead of square brackets like this, like

args:
  - /etc/connectorData/data

We use the same format in all of the repository, e-g see: https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/cluster/manifests/role-sync-controller/cronjob.yaml#L31

Comment on lines +25 to +26
annotations:
rollme: "Cd4Gg"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this annotation? Is it to control rolling updates? We already have the CLM perform updates upon manifest changes made in the repository.

Comment on lines +29 to +30
app.kubernetes.io/name: wiz-kubernetes-connector
app.kubernetes.io/instance: wiz-connector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably need to change these labels to application: wiz and component: connector everywhere. This is how we manage applications and components in our infrastructure.

Comment on lines +53 to +54
image: "wiziopublic.azurecr.io/wiz-app/wiz-broker:2.5"
imagePullPolicy: Always
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments here to update the image URL and imagePullPolicy.

command:
- "wiz-broker"
args:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line

template:
metadata:
labels:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line.

Comment on lines +141 to +142
image: "wiziopublic.azurecr.io/wiz-app/wiz-broker:2.5"
imagePullPolicy: Always
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same suggestion here to update this

Comment on lines +167 to +168
- name: WIZ_ENV
value: ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need an empty env var?

Comment on lines +26 to +29
resourceNames: [
"wiz-api-token",
"wiz-cluster-reader-token",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested above, let's use the - format when listing values on multiple lines:

resourceNames:
  - wiz-api-token
  - wiz-cluster-reader-token

- >
wiz-broker delete-kubernetes-connector
--input-secrets-namespace
"default"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this namespaces should be wiz and not default.

I'm looking at the secrets.yaml file below and it's deploying the secret in the wiz namespace.

Comment on lines +1 to +4
apiVersion: v1
kind: Namespace
metadata:
name: wiz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you need to rename the file to 001-namespace.yaml to ensure that CDP deploys it before any other manifest. Otherwise, you will run into issues. Similarly, you want to ensure that the RBAC resources are deployed before any deployment/job objects are deployed.

You can see how we do it for karpenter for inspiration on how to manage precedence of manifest application: https://github.com/zalando-incubator/kubernetes-on-aws/tree/dev/cluster/manifests/z-karpenter

metadata:
name: wiz-sensor
labels:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: empty line should be removed

Comment on lines +52 to +56
- matchExpressions:
- key: dedicated
operator: NotIn
values:
- skipper-ingress
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have skipper and skipper-ingress-redis labelled nodes that we will have to skip if we're skipping skipper-ingress.

We also have other "system" dedicated nodes like zmon-worker-arm64, zmon-redis and zmon-appliance labelled nodes for ZMON.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming we are scoping the Wiz system only to "application worker nodes", please correct me if this isn't necessarily the case.

Comment on lines +78 to +79
image: wizio.azurecr.io/sensor:v1
imagePullPolicy: Always
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: this needs updating, as suggested above

Comment on lines +63 to +64
- key: node.kubernetes.io/not-ready
operator: Exists
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see for zalando.org/node-not-ready taint below we added effect: NoSchedule. We probably need the same behavior for the node.kubernetes.io/not-ready taint as well?

Comment on lines +113 to +114
- name: HELM_CHART_COMPAT_VER
value: "1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to drop this ENVVAR.

Comment on lines +121 to +122
- name: LOG_FILE
value: "/wiz-sensor-store/sensor.log"
Copy link
Member

@demonCoder95 demonCoder95 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see below the tmp-store emptyDir type volume that is backing the /wiz-sensor-store directory is only 100Mi in size. This might not be suitable for logs as I see the log level set to info. We probably need a bigger volume here.

I see you specified "Memory" backend. From the Kubernetes docs:

While tmpfs is very fast be aware that, unlike disks, files you write count against the memory limit of the container that wrote them

I suggest you can just drop the "Memory" backend and use a disk backed emptyDir volume that's not "too large", to rely on the root FS, at the same time not exhaust the root disk of the node where it runs.

We might want to make a note of monitoring the log volume Wiz generates once we roll it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I see the logging-destination annotation missing on the pod templates in the application manifests, please add that. Example: https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/cluster/manifests/z-karpenter/deployment.yaml#L28

This will ensure that the Wiz logs are sent to Scalyr as well. This is a built-in feature provided in the infrastructure and the necessary information required to setup the connectivity with the Scalyr account is already injected to the pod when it's admitted, so you just need to provide this annotation and logs should show up in Scalyr.

That should make it safe to have a somewhat smaller volume for logging locally to the disk, since logs will be pushed to Scalyr already.

Comment on lines +143 to +152
- name: CRI_SOCKET_CUSTOM_PATH
value:
- name: HTTP_PROXY_URL
value:
- name: HTTP_PROXY_USERNAME
value:
- name: HTTP_PROXY_PASSWORD
value:
- name: HTTP_PROXY_CERT
value:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just drop these empty ENVVARs?

Comment on lines +19 to +20
clientId: "{{ .Cluster.ConfigItems.wiz_api_client_id | base64 }}"
clientToken: "{{ .Cluster.ConfigItems.wiz_api_client_token | base64 }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the same applies here, do we need to drop the base64 template function?

Comment on lines +22 to +38
# Source: wiz-sensor/templates/imagepullsecret.yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
name: wiz-sensor-imagepullkey
labels:

helm.sh/chart: wiz-sensor-1.0.4760
image/tag: v1
app.kubernetes.io/name: wiz-sensor
app.kubernetes.io/instance: wiz-sensor
app.kubernetes.io/version: "1.0.4708"
app.kubernetes.io/managed-by: Helm
namespace: wiz
data:
.dockerconfigjson: "{{ .Cluster.ConfigItems.wiz_sensor_dockerconfigjson }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks strange 😅 I suggest we do the following:

  1. Instead of a Secret, make this manifest a ConfigMap that keeps the docker config JSON.
  2. We expose only the token itself as a config-item, instead of wiz_sensor_dockerconfigjson. (or username and token both can be config-items)
  3. We populate the token in the Docker config JSON using that config-item in the ConfigMap instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge major Major feature changes or updates, e.g. feature rollout to a new country, new API calls.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants