
Network Policy rke2-flannel-host-networking when cis-1.23 and calico #5315

Closed
albundy83 opened this issue Jan 23, 2024 · 16 comments

@albundy83

Environmental Info:
RKE2 Version:

v1.26.12+rke2r1

Node(s) CPU architecture, OS, and Version:

outscale@ip-10-0-1-10:~$ uname -a
Linux ip-10-0-1-10 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 servers, 3 agents

Installed with lablabs ansible-role release 1.28.0

Describe the bug:
When I enable the cis-1.23 profile and use calico as the CNI, the following network policy is created in the kube-system, kube-public, and default namespaces:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rke2-flannel-host-networking
  namespace: kube-system
spec:
  podSelector: {}
  ingress:
    - {}
  policyTypes:
    - Ingress

Steps To Reproduce:

  • Installed RKE2:
---
# The node type - server or agent
rke2_type: server

# Deploy the control plane in HA mode
rke2_ha_mode: true

# Install and configure Keepalived on Server nodes
# Can be disabled if you are using pre-configured Load Balancer
rke2_ha_mode_keepalived: false

# Install and configure kube-vip LB and VIP for cluster
# rke2_ha_mode_keepalived needs to be false
rke2_ha_mode_kubevip: false

# Kubernetes API and RKE2 registration IP address. The default Address is the IPv4 of the Server/Master node.
# In HA mode choose a static IP which will be set as VIP in keepalived.
# Or if the keepalived is disabled, use IP address of your LB.
rke2_api_ip: my-youpi.com

# optional option for RKE2 Server to listen on a private IP address on port 9345
# rke2_api_private_ip:

# optional option for kubevip IP subnet
# rke2_api_cidr: 24

# optional option for kubevip
# rke2_interface: eth0
# optional option for IPv4/IPv6 addresses to advertise for node
# rke2_bind_address: "{{ hostvars[inventory_hostname]['ansible_' + rke2_interface]['ipv4']['address'] }}"

# kubevip load balancer IP range
rke2_loadbalancer_ip_range: {}
#  range-global: 192.168.1.50-192.168.1.100
#  cidr-finance: 192.168.0.220/29,192.168.0.230/29

# Install kubevip cloud provider if rke2_ha_mode_kubevip is true
rke2_kubevip_cloud_provider_enable: true

# Enable kube-vip to watch Services of type LoadBalancer
rke2_kubevip_svc_enable: true

# Specify which image is used for kube-vip container
rke2_kubevip_image: ghcr.io/kube-vip/kube-vip:v0.6.4

# Specify which image is used for kube-vip cloud provider container
rke2_kubevip_cloud_provider_image: ghcr.io/kube-vip/kube-vip-cloud-provider:v0.0.4

# (Optional) A list of kube-vip flags
# All flags can be found here https://kube-vip.io/docs/installation/flags/
# rke2_kubevip_args: []
# - param: lb_enable
#   value: true
# - param: lb_port
#   value: 6443

# Add additional SANs in k8s API TLS cert
rke2_additional_sans: []

# API Server destination port
rke2_apiserver_dest_port: 6443

# Server nodes taints
rke2_server_node_taints:
  - 'CriticalAddonsOnly=true:NoExecute'

# Agent nodes taints
rke2_agent_node_taints: []

# Pre-shared secret token that other server or agent nodes will register with when connecting to the cluster
rke2_token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# RKE2 version
rke2_version: v1.26.12+rke2r1

# URL to RKE2 repository
rke2_channel_url: https://update.rke2.io/v1-release/channels

# URL to RKE2 install bash script
# e.g. rancher Chinese mirror http://rancher-mirror.rancher.cn/rke2/install.sh
rke2_install_bash_url: https://get.rke2.io

# Local data directory for RKE2
rke2_data_path: /var/lib/rancher/rke2

# Default URL to fetch artifacts
rke2_artifact_url: https://github.com/rancher/rke2/releases/download/

# Local path to store artifacts
rke2_artifact_path: /rke2/artifact

# Airgap required artifacts
rke2_artifact:
  - sha256sum-{{ rke2_architecture }}.txt
  - rke2.linux-{{ rke2_architecture }}.tar.gz
  - rke2-images.linux-{{ rke2_architecture }}.tar.zst

# Changes the deploy strategy to install based on local artifacts
rke2_airgap_mode: false

# Airgap implementation type - download, copy or exists
# - 'download' will fetch the artifacts on each node,
# - 'copy' will transfer local files in 'rke2_artifact' to the nodes,
# - 'exists' assumes 'rke2_artifact' files are already stored in 'rke2_artifact_path'
rke2_airgap_implementation: download

# Local source path where artifacts are stored
rke2_airgap_copy_sourcepath: local_artifacts

# Tarball images for additional components to be copied from rke2_airgap_copy_sourcepath to the nodes
# (File extensions in the list and on the real files must be retained)
rke2_airgap_copy_additional_tarballs: []

# Destination for airgap additional images tarballs ( see https://docs.rke2.io/install/airgap#tarball-method )
rke2_tarball_images_path: "{{ rke2_data_path }}/agent/images"

# Architecture to be downloaded, currently there are releases for amd64 and s390x
rke2_architecture: amd64

# Destination directory for RKE2 installation script
rke2_install_script_dir: /var/tmp

# RKE2 channel
rke2_channel: stable

# Do not deploy packaged components and delete any deployed components
# Valid items: rke2-canal, rke2-coredns, rke2-ingress-nginx, rke2-metrics-server
rke2_disable:

# Option to disable kube-proxy
disable_kube_proxy: false

# Path to custom manifests deployed during the RKE2 installation
# It is possible to use Jinja2 templating in the manifests
rke2_custom_manifests:
  - templates/default_manifests/rke2-ingress-nginx-proxy-protocol.yaml
  - templates/default_manifests/cert-manager.yaml

# Path to static pods deployed during the RKE2 installation
rke2_static_pods:

# Configure custom Containerd Registry
rke2_custom_registry_mirrors: []
  # - name:
  #   endpoint: {}
#   rewrite: '"^rancher/(.*)": "mirrorproject/rancher-images/$1"'

# Configure custom Containerd Registry additional configuration
rke2_custom_registry_configs: []
#   - endpoint:
#     config:

# Path to Container registry config file template
rke2_custom_registry_path: templates/registries.yaml.j2

# Path to RKE2 config file template
rke2_config: templates/config.yaml.j2

# Path to Kubernetes event rate limit admission controller config file
rke2_kube_rate_limit_config_dir: templates/api_rate_limit

# Etcd snapshot source directory
rke2_etcd_snapshot_source_dir: etcd_snapshots

# Etcd snapshot file name.
# When the file name is defined, the etcd will be restored on initial deployment Ansible run.
# The etcd will be restored only during the initial run, so even if you leave the file name specified,
# the etcd will remain untouched during subsequent runs.
# You can either use this or set options in `rke2_etcd_snapshot_s3_options`
rke2_etcd_snapshot_file:

# Etcd snapshot location
rke2_etcd_snapshot_destination_dir: "{{ rke2_data_path }}/server/db/snapshots"

# Etcd snapshot s3 options
# Set either all these values or `rke2_etcd_snapshot_file` and `rke2_etcd_snapshot_source_dir`

# rke2_etcd_snapshot_s3_options:
  # s3_endpoint: "" # required
  # access_key: "" # required
  # secret_key: "" # required
  # bucket: "" # required
  # snapshot_name: "" # required.
  # skip_ssl_verify: false # optional
  # endpoint_ca: "" # optional. Can skip if using defaults
  # region: "" # optional - defaults to us-east-1
  # folder: "" # optional - defaults to top level of bucket
# Override default containerd snapshotter
rke2_snapshooter: overlayfs

# Deploy RKE2 with default CNI canal
rke2_cni: calico

# Validate system configuration against the selected benchmark
# (Supported value is "cis-1.23", or "cis-1.6" if you are running RKE2 prior to 1.25)
rke2_cis_profile: "cis-1.23"

# Download Kubernetes config file to the Ansible controller
rke2_download_kubeconf: true

# Name of the Kubernetes config file will be downloaded to the Ansible controller
rke2_download_kubeconf_file_name: rke2.yaml

# Destination directory where the Kubernetes config file will be downloaded to the Ansible controller
rke2_download_kubeconf_path: /tmp

# Default Ansible Inventory Group name for RKE2 cluster
rke2_cluster_group_name: k8s_cluster

# Default Ansible Inventory Group name for RKE2 Servers
rke2_servers_group_name: masters

# Default Ansible Inventory Group name for RKE2 Agents
rke2_agents_group_name: workers

# (Optional) A list of Kubernetes API server flags
# All flags can be found here https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver
rke2_kube_apiserver_args:
  - enable-admission-plugins=NodeRestriction,AlwaysPullImages,EventRateLimit
  - admission-control-config-file=/etc/rancher/rke2/rate_limit_admission_controller.yaml
  - audit-log-path=/var/log/audit/rke2.log
  - tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  - runtime-config=api/all=true
# (Optional) List of Node labels
# k8s_node_label: []

# (Optional) Additional RKE2 server configuration options
# You could find the flags at https://docs.rke2.io/reference/server_config
rke2_server_options:
  - "cloud-provider-name: external"
  - "disable-cloud-controller: true"
  - "etcd-expose-metrics: true"
  - "kube-proxy-arg: ['metrics-bind-address=0.0.0.0']"
  - "kubelet-arg: ['read-only-port=10255']"

# (Optional) Additional RKE2 agent configuration options
# You could find the flags at https://docs.rke2.io/reference/linux_agent_config
rke2_agent_options:
  - "kube-proxy-arg: ['metrics-bind-address=0.0.0.0']"
  - "kubelet-arg: ['read-only-port=10255']"
# (Optional) Configure Proxy
# All flags can be found here https://docs.rke2.io/advanced#configuring-an-http-proxy
# rke2_environment_options: []
#   - "option=value"
#   - "HTTP_PROXY=http://your-proxy.example.com:8888"

# (Optional) Customize default kube-controller-manager arguments
# This functionality allows appending the argument if it is not present by default or replacing it if it already exists.
rke2_kube_controller_manager_arg:
  - "bind-address=0.0.0.0"

# (Optional) Customize default kube-scheduler arguments
# This functionality allows appending the argument if it is not present by default or replacing it if it already exists.
rke2_kube_scheduler_arg:
  - "bind-address=0.0.0.0"

# Cordon, drain the node which is being upgraded. Uncordon the node once the RKE2 upgraded
rke2_drain_node_during_upgrade: false

# Wait for all pods to be ready after rke2-service restart during rolling restart.
rke2_wait_for_all_pods_to_be_ready: true

# Enable debug mode (rke2-service)
rke2_debug: false

Expected behavior:
Can you explain to me whether this is normal?

Also, I'm not able to delete it, as "something" recreates it...

Actual behavior:

Additional context / logs:

@brandond
Member

brandond commented Jan 24, 2024

Yes, it's normal. Ref:

Do you have any further questions about it?

@albundy83
Author

albundy83 commented Jan 24, 2024

Hello,
yes, my first issue is that having "flannel" in the name is a bit confusing, as I only use calico as the CNI, not canal.

Then, when I read the policy deployed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rke2-flannel-host-networking
  namespace: kube-system
spec:
  podSelector: {}
  ingress:
    - {}
  policyTypes:
    - Ingress

I read "Allows all pods in namespace kube-system to receive traffic from all namespaces, pods and IP addresses on all ports".

And according to the ref you sent me, it's written: "The NetworkPolicy used will only allow pods within the same namespace to talk to each other. The notable exception to this is that it allows DNS requests to be resolved".

Maybe something like:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-same-namespace
  namespace: kube-system
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
  policyTypes:
    - Ingress

Which could be read as "Allows all pods in namespace kube-system to receive traffic from all pods in the same namespace on all ports (denies inbound traffic to all pods in namespace kube-system from other namespaces)".
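
A policy like that alone would also block DNS lookups reaching CoreDNS from other namespaces, so the DNS exception mentioned in the quoted docs would need its own rule. A minimal sketch of such a rule, assuming the CoreDNS pods carry the conventional k8s-app: kube-dns label (both the policy name and the label are illustrative assumptions, not taken from the RKE2 charts), could be:

# Hypothetical sketch only - opens port 53 on the CoreDNS pods so DNS resolution
# still works for workloads in other namespaces; the pod label is assumed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-resolution
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-dns
  ingress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
  policyTypes:
    - Ingress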

I use this tool to help me be sure:
rke2-flannel-host-networking
allow-from-same-namespace

@albundy83
Author

Also, there is a default policy created that does respect what is written:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-network-policy
  namespace: kube-system
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
  policyTypes:
    - Ingress

Since network policies are additive (traffic to a pod is allowed if any policy selecting it allows it), the wide-open flannel policy effectively negates this one. Maybe the flannel one could be removed?

@brandond
Member

Hmm, that does look suspect. @Oats87 and @manuelbuil could you take a look at this? I suspect that this controller shouldn't run when canal is not the active CNI.

@albundy83
Author

Hello,

will you also backport this to the other releases?
As you can imagine, the fact that this policy is enabled is a security issue.
Unfortunately, I did not find a way to disable it.

@brandond
Member

brandond commented Feb 1, 2024

Yes, we backport everything to all active branches. The policy will not be changed or removed on existing clusters, as its removal could cause unexpected outages. New clusters will not get this policy.

@albundy83
Author

albundy83 commented Feb 1, 2024

Thanks for your answer. Sorry to insist, but just to be sure: once I have upgraded my release, if I manually delete this policy, nothing will recreate it?

@brandond
Member

brandond commented Feb 1, 2024

Correct, after upgrading to a fixed release.

@albundy83
Author

Ah perfect, thanks again!

@albundy83
Author

albundy83 commented Feb 19, 2024

Hello,

I have just tried the new releases (v1.26.14-rc1+rke2r1 and v1.27.11-rc1+rke2r1) without the wide-open rke2-flannel-host-networking Network Policy.
Now that you really secure the kube-system namespace, you should add the following one, or you won't be able to interact with the rke2-ingress-nginx-controller:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rke2-ingress-nginx-controller
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/name: rke2-ingress-nginx
  ingress:
    - ports:
        - protocol: TCP
          port: webhook
  policyTypes:
    - Ingress

@brandond
Member

brandond commented Feb 21, 2024

@albundy83 I'm confused why you would need that. The policy you suggested appears to be granting access to the nginx validating webhook port? There shouldn't be anything other than the apiserver hitting that. Can you provide more information on what you're seeing getting blocked without that policy?

@brandond
Member

Specifically, this policy should already grant access to the ingress itself on 80/443. The webhook should not be accessed directly by clients; it is only queried by the apiserver.

rke2/pkg/rke2/np.go

Lines 102 to 137 in 3b1d700

// networkIngressPolicy allows for all http and https traffic
// into the kube-system namespace to the ingress controller pods.
var networkIngressPolicy = v1.NetworkPolicy{
    ObjectMeta: metav1.ObjectMeta{
        Name: defaultNetworkIngressPolicyName,
    },
    Spec: v1.NetworkPolicySpec{
        PodSelector: metav1.LabelSelector{
            MatchLabels: map[string]string{
                "app.kubernetes.io/name": "rke2-ingress-nginx",
            },
        },
        PolicyTypes: []v1.PolicyType{
            v1.PolicyTypeIngress,
        },
        Ingress: []v1.NetworkPolicyIngressRule{
            {
                Ports: []v1.NetworkPolicyPort{
                    {
                        Protocol: &tcp,
                        Port: &intstr.IntOrString{
                            IntVal: int32(80),
                        },
                    },
                    {
                        Protocol: &tcp,
                        Port: &intstr.IntOrString{
                            IntVal: int32(443),
                        },
                    },
                },
            },
        },
        Egress: []v1.NetworkPolicyEgressRule{},
    },
}
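
Rendered as a manifest, that Go value corresponds roughly to the following. The metadata.name comes from the defaultNetworkIngressPolicyName constant, which is outside the quoted lines, so a placeholder is used; the namespace is taken from the code comment, and the empty Egress slice is omitted since PolicyTypes only lists Ingress.

# Approximate YAML rendering of the networkIngressPolicy Go value above, for readability.
# <defaultNetworkIngressPolicyName> stands in for the constant's value, not shown in the snippet.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: <defaultNetworkIngressPolicyName>
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: rke2-ingress-nginx
  ingress:
    - ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
  policyTypes:
    - Ingress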

@albundy83
Author

Well, I'm not sure what you mean by apiserver, but each time I try to create an Ingress, I get the following error.

Here is an example:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hubble-ui
  namespace: kube-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - hubble.my-loving-rke2-cluster.fr
      secretName: hubble.my-loving-rke2-cluster.fr
  rules:
    - host: hubble.my-loving-rke2-cluster.fr
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hubble-ui
                port:
                  name: http
albundy83@localhost:~/outscale/osc-rke2.greg> kubectl -n kube-system apply -f ingress.yaml
Error from server (InternalError): error when creating "ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s": context deadline exceeded

I also get this error when I deploy a HelmChart object that contains ingress resources; the helm-install-xxx Job never succeeds.
That's why I was thinking this policy is needed.

@brandond
Member

brandond commented Feb 21, 2024

failed calling webhook "validate.nginx.ingress.kubernetes.io"

ok, so yeah the problem is that the apiserver is being blocked from accessing the webhook. That makes sense.

@albundy83
Author

Is there some improvement we can make to the policy?

@ShylajaDevadiga
Contributor

Closing issue after validation. The Network Policy rke2-flannel-host-networking does not exist when a CNI other than canal is used, and the network policy allowing access to the ingress controller webhook is fixed in rc2.

Validated using rke2 version v1.29.2-rc2+rke2r1

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:

cat /etc/os-release 
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"

Cluster Configuration:
Multi-node: 3 servers, 1 agent

Config.yaml:

write-kubeconfig-mode: "0644"
tls-san:
  - fake.fqdn.value
node-name: ip-172-31-9-53.us-east-2.compute.internal
profile: cis-1.23
cni: calico

Steps to reproduce

  1. Copy config.yaml
  2. Install rke2
  3. Check for existence of policy on a new cluster
  4. Confirm ingress works as expected

Validation results:
Network Policy rke2-flannel-host-networking does not exist when cni: calico is used

ec2-user@ip-172-31-9-53:~> kubectl get netpol -A |grep flannel
ec2-user@ip-172-31-9-53:~> 
ec2-user@ip-172-31-9-53:~> kubectl apply -f ing
namespace/test-ingress created
networkpolicy.networking.k8s.io/ingress-to-backends created
ingress.networking.k8s.io/test-ingress created
service/nginx-ingress-svc created
replicationcontroller/test-ingress created
networkpolicy.networking.k8s.io/allow-all-ingress created

Network Policy rke2-flannel-host-networking exists when cni: canal (default cni) is used

ec2-user@ip-172-31-12-174:~> cat /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: "0644"
tls-san:
  - fake.fqdn.value
node-name: ip-172-31-12-174.us-east-2.compute.internal
profile: cis-1.23

ec2-user@ip-172-31-12-174:~> kubectl get networkpolicy -A |grep flannel
default       rke2-flannel-host-networking             <none>                                      171m
kube-public   rke2-flannel-host-networking             <none>                                      171m
kube-system   rke2-flannel-host-networking             <none>                                      171m
ec2-user@ip-172-31-12-174:~> 

Network Policy rke2-flannel-host-networking does not exist when cni: cilium is used

ec2-user@ip-172-31-8-175:~> cat /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: "0644"
tls-san:
  - fake.fqdn.value
node-name: ip-172-31-8-175.us-east-2.compute.internal
profile: cis-1.23
cni: cilium

ec2-user@ip-172-31-8-175:~> kubectl get networkpolicy -A|grep flannel
ec2-user@ip-172-31-8-175:~>

ec2-user@ip-172-31-8-175:~> kubectl apply -f ing
namespace/test-ingress created
networkpolicy.networking.k8s.io/ingress-to-backends created
ingress.networking.k8s.io/test-ingress created
service/nginx-ingress-svc created
replicationcontroller/test-ingress created
networkpolicy.networking.k8s.io/allow-all-ingress created
ec2-user@ip-172-31-8-175:~>
