PostStartHook in container "metad" and "storaged" failed #544

Open
aleksrosz opened this issue Jan 21, 2025 · 9 comments

aleksrosz commented Jan 21, 2025

Describe the bug (required)

  • exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.xxx.svc.cluster.local:9559 --local_ip=nebula-metad-0.nebula-metad-headless.xxx.svc.cluster.local --daemonize=false
    returns exit code 1. There are no further logs.

What I changed from the default config is an additional securityContext required by my environment; I can't run the app as root:
runAsNonRoot: true
allowPrivilegeEscalation: false

The Helm chart doesn't install the NebulaGraph cluster properly.

Your Environments (required)

  • OS: OpenShift
    Server Version: 4.16.4
    Kubernetes Version: v1.29.6+aba1e8d
  • Commit id:
    237685f

How To Reproduce (required)

Steps to reproduce the behavior:

  1. helm install nebula-cluster ./ --values ./values.yaml -n
  2. The PostStartHook script in the "metad" and "storaged" containers returns exit code 1.
  3. The pods end up in CrashLoopBackOff because of "PostStartHook failed" (see the commands sketched below).
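
For reference, a minimal sketch of how the failure shows up (the namespace is a placeholder; FailedPostStartHook is the standard Kubernetes event reason for a failed postStart hook):

  # pods stuck in CrashLoopBackOff after the install
  kubectl get pods -n <nebula-cluster-namespace>

  # the FailedPostStartHook event appears in the pod's event list
  kubectl describe pod nebula-cluster-metad-0 -n <nebula-cluster-namespace>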

Expected behavior

The PostStartHook in the "metad" and "storaged" containers should return exit code 0.

Additional context

nebula:
  version: v3.6.0
  imagePullPolicy: Always
  storageClassName: "default"
  enablePVReclaim: false
  enableBR: false
  enableForceUpdate: false
  schedulerName: default-scheduler # nebula-scheduler
  topologySpreadConstraints:
  - topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: "ScheduleAnyway"
  logRotate: {}
  reference:
    name: statefulsets.apps
    version: v1
  graphd:
    image: vesoft/nebula-graphd
    replicas: 2
    serviceType: NodePort
    env: []
    config: {}
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "500Mi"
    logVolume:
      enable: true
      storage: "500Mi"
    podLabels: {}
    podAnnotations: {}
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
    nodeSelector: {}
    tolerations: []
    affinity: {}
    readinessProbe: {}
    livenessProbe: {}
    initContainers: []
    sidecarContainers: []
    volumes: []
    volumeMounts: []

  metad:
    image: vesoft/nebula-metad
    replicas: 3
    env: []
    config: {}
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    logVolume:
      enable: true
      storage: "500Mi"
    dataVolume:
      storage: "2Gi"
    licenseManagerURL: ""
    license: {}
    podLabels: {}
    podAnnotations: {}
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
    nodeSelector: {}
    tolerations: []
    affinity: {}
    readinessProbe: {}
    livenessProbe: {}
    initContainers: []
    sidecarContainers: []
    volumes: []
    volumeMounts: []

  storaged:
    image: vesoft/nebula-storaged
    replicas: 3
    env: []
    config: {}
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    logVolume:
      enable: true
      storage: "500Mi"
    dataVolumes:
    - storage: "10Gi"
    enableAutoBalance: false
    podLabels: {}
    podAnnotations: {}
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
    nodeSelector: {}
    tolerations: []
    affinity: {}
    readinessProbe: {}
    livenessProbe: {}
    initContainers: []
    sidecarContainers: []
    volumes: []
    volumeMounts: []

  exporter:
    image: vesoft/nebula-stats-exporter
    version: v3.3.0
    replicas: 1
    env: []
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "200m"
        memory: "256Mi"
    podLabels: {}
    podAnnotations: {}
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
    nodeSelector: {}
    tolerations: []
    affinity: {}
    readinessProbe: {}
    livenessProbe: {}
    initContainers: []
    sidecarContainers: []
    volumes: []
    volumeMounts: []
    maxRequests: 20

  agent:
    image: vesoft/nebula-agent
    version: latest
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "200m"
        memory: "256Mi"
    env: []
    volumeMounts: []

  console:
    username: root
    password: nebula
    image: vesoft/nebula-console
    version: "v3.8.0"
    nodeSelector: {}

  alpineImage:

  sslCerts: {}

  coredumpPreservation:
    maxTimeKept: 72h
    enable: false
    volumeSize: 5Gi
    
# Note: for all 3 components, specifying positive integers for both minAvailable and maxUnavailable will result in minAvailable being used.
# Please specify a negative integer for minAvailable (e.g. -1) and a positive integer for maxUnavailable to use maxUnavailable instead.
pdb:
  graphd:
    enable: false
    minAvailable: 2
    maxUnavailable: 1
  metad:
    enable: false
    minAvailable: 3
    maxUnavailable: 1
  storaged:
    enable: false
    minAvailable: 3
    maxUnavailable: 1

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
kevinliu24 added the type/bug (Type: something is unexpected) label on Jan 27, 2025
github-actions bot added the affects/none (this bug affects no version) and severity/none labels on Jan 27, 2025
kevinliu24 (Contributor) commented:

Thanks for the report. Will try to reproduce this in house today.

kevinliu24 (Contributor) commented Jan 28, 2025

@aleksrosz cc. @wenhaocs I was unable to reproduce the issue in house with the nebula cluster configs given above. Could you please send your metad, graphd, storaged and operator logs following the instructions below so I can look into this further? Thanks.

Metad, Graphd, Storaged logs:

  1. Please change the config: {} section under the metad, graphd and storaged sections to the following:

     config:
       redirect_stdout: "false"

  2. Reinstall the nebula-cluster helm chart and run kubectl logs <pod-name> -n <nebula-cluster-namespace> on one of each of the metad, graphd and storaged pods.
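
For reference, a sketch of how that change would look in the values.yaml posted above (only metad is shown; the same edit applies to the graphd and storaged sections):

  metad:
    image: vesoft/nebula-metad
    replicas: 3
    config:
      redirect_stdout: "false"
    # ...remaining metad fields unchanged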

Operator log:

  1. Please run kubectl logs <nebula-operator-pod-name> -n <nebula-operator-namespace>

    Notes:
  - <nebula-operator-pod-name> can be obtained by running kubectl get pods -n <namespace-used-when-installing-the-nebula-operator-helm-chart> and looking for pod names that begin with nebula-operator-controller-manager-deployment. There could be more than one, so you'll need to repeat the command for each pod.
  - <nebula-operator-namespace> is the namespace used with the -n option during helm install of the nebula operator.
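
For reference, a minimal shell sketch of the operator log collection described above (the nebula-operator-system namespace and the pod name suffix are placeholders; substitute whatever was used when installing the operator chart):

  # list the operator pods and look for the controller-manager deployment pods
  kubectl get pods -n nebula-operator-system | grep nebula-operator-controller-manager-deployment

  # collect logs from each matching pod
  kubectl logs nebula-operator-controller-manager-deployment-<hash> -n nebula-operator-system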

aleksrosz (Author) commented Jan 28, 2025

oc logs nebula-cluster-metad-0

Defaulted container "metad" out of: metad, dynamic-flags (init)
++ hostname
+ exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --meta_server_addrs=nebula-cluster-metad-0.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-1.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-2.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559 --local_ip=nebula-cluster-metad-0.nebula-cluster-metad-headless.xxx.svc.cluster.local --daemonize=false
E20250128 08:36:27.693028 1 MetaDaemon.cpp:120] Open or create pids/nebula-metad.pid': Permission denied

oc logs nebula-cluster-storaged-0

Defaulted container "storaged" out of: storaged, dynamic-flags (init)
++ hostname
+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebula-cluster-metad-0.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-1.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-2.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559 --local_ip=nebula-cluster-storaged-0.nebula-cluster-storaged-headless.xxx.svc.cluster.local --daemonize=false
E20250128 08:36:27.340169 1 StorageDaemon.cpp:110] Open or create pids/nebula-storaged.pid': Permission denied

I don't see any graphd object. When I run:
oc describe nebulacluster nebula-cluster

the status shows an empty Graphd field:
Last Update Time: 2025-01-28T08:35:19Z
Message: Metad is not healthy
Reason: MetadUnhealthy
Status: False
Type: Ready
Graphd:
Metad:
Phase: Running
Version: v3.6.0
Volume:
Provisioned Done: true
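
For context, the failing path in both errors comes from the pid_file flag, which defaults to a path relative to the NebulaGraph install directory (pids/nebula-metad.pid and pids/nebula-storaged.pid). A hedged, untested workaround sketch, assuming entries under the chart's config: section are forwarded into the flag files the same way redirect_stdout is, would be to point the PID files at a location the non-root user can write to:

nebula:
  metad:
    config:
      pid_file: "/tmp/nebula-metad.pid"
  storaged:
    config:
      pid_file: "/tmp/nebula-storaged.pid"

Whether this alone is enough depends on what else the daemons try to write outside the mounted data and log volumes.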

aleksrosz (Author) commented:

kevinliu24 (Contributor) commented:

@aleksrosz Got it, will look into this. Not seeing graphd is normal since it will not be created if metad is not healthy.

kevinliu24 (Contributor) commented:

@aleksrosz I looked into the link you sent, but I don't think it's due to a read-only filesystem, since we don't set the "readOnlyRootFilesystem: true" flag when starting the pods for these services. May I have the output of kubectl get pod <pod-name> -n <nebula-cluster-namespace> -o yaml for one of each of the crashing metad and storaged pods so I can see the configs? Also, are there any security policies in your Kubernetes environment that restrict pod filesystems to only be writable by non-root users?

aleksrosz (Author) commented:

Sorry that it took me so long.
oc get pod nebula-cluster-metad-0 -n xxx -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.5.166/23"],"mac_address":"0a:58:0a:80:05:a6","gateway_ips":["10.128.4.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.4.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.4.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.4.1"}],"ip_address":"10.128.5.166/23","gateway_ip":"10.128.4.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.128.5.166"
          ],
          "mac": "0a:58:0a:80:05:a6",
          "default": true,
          "dns": {}
      }]
    nebula-graph.io/cm-hash: 74a9d0b3e790c68f
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2025-02-03T11:19:31Z"
  generateName: nebula-cluster-metad-
  labels:
    app.kubernetes.io/cluster: nebula-cluster
    app.kubernetes.io/component: metad
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
    apps.kubernetes.io/pod-index: "0"
    controller-revision-hash: nebula-cluster-metad-5c96666746
    statefulset.kubernetes.io/pod-name: nebula-cluster-metad-0
  name: nebula-cluster-metad-0
  namespace: xxx
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nebula-cluster-metad
    uid: b3c3d699-1ae0-416e-b118-8bed4bef1bb5
  resourceVersion: "39640258"
  uid: 0a31b1c8-0e0b-4e8f-a333-742d2fbc8db8
spec:
  containers:
  - command:
    - /bin/sh
    - -ecx
    - exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf
      --meta_server_addrs=nebula-cluster-metad-0.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-1.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-2.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-cluster-metad-headless.xxx.svc.cluster.local
      --daemonize=false
    env:
    - name: MY_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: HTTP_PORT
      value: "19559"
    - name: SCRIPT
      value: |2

        set -x

        if [ ! -s "metadata/flags.json" ]; then
         echo "flags.json is empty"
         exit 0
        fi
        while :
        do
          curl -i -X PUT -H "Content-Type: application/json" -d @/metadata/flags.json -s "http://${MY_IP}:${HTTP_PORT}/flags"
          if [ $? -eq 0 ]
          then
            break
          fi
          sleep 1
        done
    image: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad:v3.6.0
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - echo "$SCRIPT" > /tmp/post-start-script && sh /tmp/post-start-script
    name: metad
    ports:
    - containerPort: 9559
      name: thrift
      protocol: TCP
    - containerPort: 19559
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /status
        port: 19559
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 500Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000740000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/local/nebula/data
      name: metad-data
      subPath: data
    - mountPath: /usr/local/nebula/logs
      name: metad-log
      subPath: logs
    - mountPath: /usr/local/nebula/etc/nebula-metad.conf
      name: nebula-cluster-metad
      subPath: nebula-metad.conf
    - mountPath: /metadata
      name: flags
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5zdxr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nebula-cluster-metad-0
  initContainers:
  - args:
    - echo "$SCRIPT" > /tmp/dynamic-flags-script && sh /tmp/dynamic-flags-script
    command:
    - /bin/sh
    - -c
    env:
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: PARENT_NAME
      value: nebula-cluster-metad
    - name: APISERVER
      value: https://kubernetes.default.svc
    - name: SERVICEACCOUNT
      value: /var/run/secrets/kubernetes.io/serviceaccount
    - name: SCRIPT
      value: "\nset -exo pipefail\n\nTOKEN=$(cat ${SERVICEACCOUNT}/token)\nCACERT=${SERVICEACCOUNT}/ca.crt\n
        \           \ncurl -s --cacert ${CACERT} --header \"Authorization: Bearer
        ${TOKEN}\" -X GET ${APISERVER}/apis/apps/v1/namespaces/${NAMESPACE}/statefulsets/${PARENT_NAME}
        | jq .metadata.annotations > /metadata/annotations.json\njq '.\"nebula-graph.io/last-applied-dynamic-flags\"
        | fromjson' /metadata/annotations.json > /metadata/flags.json\ncat /metadata/flags.json\n"
    image: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine:1
    imagePullPolicy: Always
    name: dynamic-flags
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000740000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /metadata
      name: flags
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5zdxr
      readOnly: true
  nodeName: gn21k8swn09.k8s.team.dt
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000740000
    seLinuxOptions:
      level: s0:c27,c19
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: nebula-sa
  serviceAccountName: nebula-sa
  subdomain: nebula-cluster-metad-headless
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/cluster: nebula-cluster
        app.kubernetes.io/component: metad
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: metad-data
    persistentVolumeClaim:
      claimName: metad-data-nebula-cluster-metad-0
  - name: metad-log
    persistentVolumeClaim:
      claimName: metad-log-nebula-cluster-metad-0
  - configMap:
      defaultMode: 420
      items:
      - key: nebula-metad.conf
        path: nebula-metad.conf
      name: nebula-cluster-metad
    name: nebula-cluster-metad
  - emptyDir:
      medium: Memory
    name: flags
  - name: kube-api-access-5zdxr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:43Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:43Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    message: 'containers with unready status: [metad]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    message: 'containers with unready status: [metad]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://777a9d0c33429c312009d7e17fe0be3749c6576afb61a735138524616706ff0a
    image: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad:v3.6.0
    imageID: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad@sha256:0e44f27d6c3fd8b9a2251dc675332de51e815ec898ec40bded861726c7e22ec9
    lastState:
      terminated:
        containerID: cri-o://777a9d0c33429c312009d7e17fe0be3749c6576afb61a735138524616706ff0a
        exitCode: 1
        finishedAt: "2025-02-03T11:19:44Z"
        reason: Error
        startedAt: "2025-02-03T11:19:44Z"
    name: metad
    ready: false
    restartCount: 1
    started: false
    state:
      waiting:
        message: back-off 10s restarting failed container=metad pod=nebula-cluster-metad-0_xxx(0a31b1c8-0e0b-4e8f-a333-742d2fbc8db8)
        reason: CrashLoopBackOff
  hostIP: 192.168.62.9
  hostIPs:
  - ip: 192.168.62.9
  initContainerStatuses:
  - containerID: cri-o://a233513a00cf651986152999822633c096308deeafd0169d674e65f0dababd52
    image: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine:1
    imageID: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine@sha256:30818e4ebf404e9eda9138ba4a10df99297721fdf3f42372d8959d7dec869a72
    lastState: {}
    name: dynamic-flags
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: cri-o://a233513a00cf651986152999822633c096308deeafd0169d674e65f0dababd52
        exitCode: 0
        finishedAt: "2025-02-03T11:19:42Z"
        reason: Completed
        startedAt: "2025-02-03T11:19:42Z"
  phase: Running
  podIP: 10.128.5.166
  podIPs:
  - ip: 10.128.5.166
  qosClass: Burstable
  startTime: "2025-02-03T11:19:31Z"
[a200381847@gn21extlab07 tool-image]$ oc get pod nebula-cluster-metad-0 -n xxx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.5.166/23"],"mac_address":"0a:58:0a:80:05:a6","gateway_ips":["10.128.4.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.4.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.4.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.4.1"}],"ip_address":"10.128.5.166/23","gateway_ip":"10.128.4.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.128.5.166"
          ],
          "mac": "0a:58:0a:80:05:a6",
          "default": true,
          "dns": {}
      }]
    nebula-graph.io/cm-hash: 74a9d0b3e790c68f
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2025-02-03T11:19:31Z"
  generateName: nebula-cluster-metad-
  labels:
    app.kubernetes.io/cluster: nebula-cluster
    app.kubernetes.io/component: metad
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
    apps.kubernetes.io/pod-index: "0"
    controller-revision-hash: nebula-cluster-metad-5c96666746
    statefulset.kubernetes.io/pod-name: nebula-cluster-metad-0
  name: nebula-cluster-metad-0
  namespace: xxx
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nebula-cluster-metad
    uid: b3c3d699-1ae0-416e-b118-8bed4bef1bb5
  resourceVersion: "39640487"
  uid: 0a31b1c8-0e0b-4e8f-a333-742d2fbc8db8
spec:
  containers:
  - command:
    - /bin/sh
    - -ecx
    - exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf
      --meta_server_addrs=nebula-cluster-metad-0.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-1.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-2.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-cluster-metad-headless.xxx.svc.cluster.local
      --daemonize=false
    env:
    - name: MY_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: HTTP_PORT
      value: "19559"
    - name: SCRIPT
      value: |2

        set -x

        if [ ! -s "metadata/flags.json" ]; then
         echo "flags.json is empty"
         exit 0
        fi
        while :
        do
          curl -i -X PUT -H "Content-Type: application/json" -d @/metadata/flags.json -s "http://${MY_IP}:${HTTP_PORT}/flags"
          if [ $? -eq 0 ]
          then
            break
          fi
          sleep 1
        done
    image: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad:v3.6.0
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - echo "$SCRIPT" > /tmp/post-start-script && sh /tmp/post-start-script
    name: metad
    ports:
    - containerPort: 9559
      name: thrift
      protocol: TCP
    - containerPort: 19559
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /status
        port: 19559
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 500Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000740000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/local/nebula/data
      name: metad-data
      subPath: data
    - mountPath: /usr/local/nebula/logs
      name: metad-log
      subPath: logs
    - mountPath: /usr/local/nebula/etc/nebula-metad.conf
      name: nebula-cluster-metad
      subPath: nebula-metad.conf
    - mountPath: /metadata
      name: flags
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5zdxr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nebula-cluster-metad-0
  initContainers:
  - args:
    - echo "$SCRIPT" > /tmp/dynamic-flags-script && sh /tmp/dynamic-flags-script
    command:
    - /bin/sh
    - -c
    env:
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: PARENT_NAME
      value: nebula-cluster-metad
    - name: APISERVER
      value: https://kubernetes.default.svc
    - name: SERVICEACCOUNT
      value: /var/run/secrets/kubernetes.io/serviceaccount
    - name: SCRIPT
      value: "\nset -exo pipefail\n\nTOKEN=$(cat ${SERVICEACCOUNT}/token)\nCACERT=${SERVICEACCOUNT}/ca.crt\n
        \           \ncurl -s --cacert ${CACERT} --header \"Authorization: Bearer
        ${TOKEN}\" -X GET ${APISERVER}/apis/apps/v1/namespaces/${NAMESPACE}/statefulsets/${PARENT_NAME}
        | jq .metadata.annotations > /metadata/annotations.json\njq '.\"nebula-graph.io/last-applied-dynamic-flags\"
        | fromjson' /metadata/annotations.json > /metadata/flags.json\ncat /metadata/flags.json\n"
    image: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine:1
    imagePullPolicy: Always
    name: dynamic-flags
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000740000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /metadata
      name: flags
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5zdxr
      readOnly: true
  nodeName: gn21k8swn09.k8s.team.dt
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000740000
    seLinuxOptions:
      level: s0:c27,c19
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: nebula-sa
  serviceAccountName: nebula-sa
  subdomain: nebula-cluster-metad-headless
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/cluster: nebula-cluster
        app.kubernetes.io/component: metad
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: metad-data
    persistentVolumeClaim:
      claimName: metad-data-nebula-cluster-metad-0
  - name: metad-log
    persistentVolumeClaim:
      claimName: metad-log-nebula-cluster-metad-0
  - configMap:
      defaultMode: 420
      items:
      - key: nebula-metad.conf
        path: nebula-metad.conf
      name: nebula-cluster-metad
    name: nebula-cluster-metad
  - emptyDir:
      medium: Memory
    name: flags
  - name: kube-api-access-5zdxr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:43Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:43Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    message: 'containers with unready status: [metad]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    message: 'containers with unready status: [metad]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://876d0fe2d20874a8cae12623011d2df4d948ddf88b8695c136ab3867a0f8f4e0
    image: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad:v3.6.0
    imageID: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad@sha256:0e44f27d6c3fd8b9a2251dc675332de51e815ec898ec40bded861726c7e22ec9
    lastState:
      terminated:
        containerID: cri-o://876d0fe2d20874a8cae12623011d2df4d948ddf88b8695c136ab3867a0f8f4e0
        exitCode: 1
        finishedAt: "2025-02-03T11:20:05Z"
        reason: Error
        startedAt: "2025-02-03T11:20:05Z"
    name: metad
    ready: false
    restartCount: 2
    started: false
    state:
      waiting:
        message: |-
          Exec lifecycle hook ([/bin/sh -c echo "$SCRIPT" > /tmp/post-start-script && sh /tmp/post-start-script]) for Container "metad" in Pod "nebula-cluster-metad-0_xxx(0a31b1c8-0e0b-4e8f-a333-742d2fbc8db8)" failed - error: rpc error: code = Unknown desc = command error: time="2025-02-03T11:20:05Z" level=error msg="exec failed: unable to start container process: error writing config to pipe: write init-p: broken pipe"
          , stdout: , stderr: , exit code -1, message: ""
        reason: PostStartHookError
  hostIP: 192.168.62.9
  hostIPs:
  - ip: 192.168.62.9
  initContainerStatuses:
  - containerID: cri-o://a233513a00cf651986152999822633c096308deeafd0169d674e65f0dababd52
    image: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine:1
    imageID: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine@sha256:30818e4ebf404e9eda9138ba4a10df99297721fdf3f42372d8959d7dec869a72
    lastState: {}
    name: dynamic-flags
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: cri-o://a233513a00cf651986152999822633c096308deeafd0169d674e65f0dababd52
        exitCode: 0
        finishedAt: "2025-02-03T11:19:42Z"
        reason: Completed
        startedAt: "2025-02-03T11:19:42Z"
  phase: Running
  podIP: 10.128.5.166
  podIPs:
  - ip: 10.128.5.166
  qosClass: Burstable
  startTime: "2025-02-03T11:19:31Z"
[a200381847@gn21extlab07 tool-image]$ oc get pod nebula-cluster-metad-0 -n xxx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.5.166/23"],"mac_address":"0a:58:0a:80:05:a6","gateway_ips":["10.128.4.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.4.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.4.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.4.1"}],"ip_address":"10.128.5.166/23","gateway_ip":"10.128.4.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.128.5.166"
          ],
          "mac": "0a:58:0a:80:05:a6",
          "default": true,
          "dns": {}
      }]
    nebula-graph.io/cm-hash: 74a9d0b3e790c68f
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2025-02-03T11:19:31Z"
  generateName: nebula-cluster-metad-
  labels:
    app.kubernetes.io/cluster: nebula-cluster
    app.kubernetes.io/component: metad
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
    apps.kubernetes.io/pod-index: "0"
    controller-revision-hash: nebula-cluster-metad-5c96666746
    statefulset.kubernetes.io/pod-name: nebula-cluster-metad-0
  name: nebula-cluster-metad-0
  namespace: xxx
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nebula-cluster-metad
    uid: b3c3d699-1ae0-416e-b118-8bed4bef1bb5
  resourceVersion: "39640565"
  uid: 0a31b1c8-0e0b-4e8f-a333-742d2fbc8db8
spec:
  containers:
  - command:
    - /bin/sh
    - -ecx
    - exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf
      --meta_server_addrs=nebula-cluster-metad-0.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-1.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559,nebula-cluster-metad-2.nebula-cluster-metad-headless.xxx.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-cluster-metad-headless.xxx.svc.cluster.local
      --daemonize=false
    env:
    - name: MY_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: HTTP_PORT
      value: "19559"
    - name: SCRIPT
      value: |2

        set -x

        if [ ! -s "metadata/flags.json" ]; then
         echo "flags.json is empty"
         exit 0
        fi
        while :
        do
          curl -i -X PUT -H "Content-Type: application/json" -d @/metadata/flags.json -s "http://${MY_IP}:${HTTP_PORT}/flags"
          if [ $? -eq 0 ]
          then
            break
          fi
          sleep 1
        done
    image: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad:v3.6.0
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - echo "$SCRIPT" > /tmp/post-start-script && sh /tmp/post-start-script
    name: metad
    ports:
    - containerPort: 9559
      name: thrift
      protocol: TCP
    - containerPort: 19559
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /status
        port: 19559
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 500Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000740000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/local/nebula/data
      name: metad-data
      subPath: data
    - mountPath: /usr/local/nebula/logs
      name: metad-log
      subPath: logs
    - mountPath: /usr/local/nebula/etc/nebula-metad.conf
      name: nebula-cluster-metad
      subPath: nebula-metad.conf
    - mountPath: /metadata
      name: flags
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5zdxr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nebula-cluster-metad-0
  initContainers:
  - args:
    - echo "$SCRIPT" > /tmp/dynamic-flags-script && sh /tmp/dynamic-flags-script
    command:
    - /bin/sh
    - -c
    env:
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: PARENT_NAME
      value: nebula-cluster-metad
    - name: APISERVER
      value: https://kubernetes.default.svc
    - name: SERVICEACCOUNT
      value: /var/run/secrets/kubernetes.io/serviceaccount
    - name: SCRIPT
      value: "\nset -exo pipefail\n\nTOKEN=$(cat ${SERVICEACCOUNT}/token)\nCACERT=${SERVICEACCOUNT}/ca.crt\n
        \           \ncurl -s --cacert ${CACERT} --header \"Authorization: Bearer
        ${TOKEN}\" -X GET ${APISERVER}/apis/apps/v1/namespaces/${NAMESPACE}/statefulsets/${PARENT_NAME}
        | jq .metadata.annotations > /metadata/annotations.json\njq '.\"nebula-graph.io/last-applied-dynamic-flags\"
        | fromjson' /metadata/annotations.json > /metadata/flags.json\ncat /metadata/flags.json\n"
    image: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine:1
    imagePullPolicy: Always
    name: dynamic-flags
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000740000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /metadata
      name: flags
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5zdxr
      readOnly: true
  nodeName: gn21k8swn09.k8s.team.dt
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000740000
    seLinuxOptions:
      level: s0:c27,c19
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: nebula-sa
  serviceAccountName: nebula-sa
  subdomain: nebula-cluster-metad-headless
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/cluster: nebula-cluster
        app.kubernetes.io/component: metad
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: metad-data
    persistentVolumeClaim:
      claimName: metad-data-nebula-cluster-metad-0
  - name: metad-log
    persistentVolumeClaim:
      claimName: metad-log-nebula-cluster-metad-0
  - configMap:
      defaultMode: 420
      items:
      - key: nebula-metad.conf
        path: nebula-metad.conf
      name: nebula-cluster-metad
    name: nebula-cluster-metad
  - emptyDir:
      medium: Memory
    name: flags
  - name: kube-api-access-5zdxr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:43Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:43Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    message: 'containers with unready status: [metad]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    message: 'containers with unready status: [metad]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-02-03T11:19:31Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://876d0fe2d20874a8cae12623011d2df4d948ddf88b8695c136ab3867a0f8f4e0
    image: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad:v3.6.0
    imageID: dockerhub.xxx.xxx.xxx/vesoft/nebula-metad@sha256:0e44f27d6c3fd8b9a2251dc675332de51e815ec898ec40bded861726c7e22ec9
    lastState:
      terminated:
        containerID: cri-o://876d0fe2d20874a8cae12623011d2df4d948ddf88b8695c136ab3867a0f8f4e0
        exitCode: 1
        finishedAt: "2025-02-03T11:20:05Z"
        reason: Error
        startedAt: "2025-02-03T11:20:05Z"
    name: metad
    ready: false
    restartCount: 2
    started: false
    state:
      waiting:
        message: back-off 20s restarting failed container=metad pod=nebula-cluster-metad-0_xxx(0a31b1c8-0e0b-4e8f-a333-742d2fbc8db8)
        reason: CrashLoopBackOff
  hostIP: 192.168.62.9
  hostIPs:
  - ip: 192.168.62.9
  initContainerStatuses:
  - containerID: cri-o://a233513a00cf651986152999822633c096308deeafd0169d674e65f0dababd52
    image: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine:1
    imageID: repository.xxx.xxx/team-openshift-docker/xxx/vesoft/nebula-alpine@sha256:30818e4ebf404e9eda9138ba4a10df99297721fdf3f42372d8959d7dec869a72
    lastState: {}
    name: dynamic-flags
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: cri-o://a233513a00cf651986152999822633c096308deeafd0169d674e65f0dababd52
        exitCode: 0
        finishedAt: "2025-02-03T11:19:42Z"
        reason: Completed
        startedAt: "2025-02-03T11:19:42Z"
  phase: Running
  podIP: 10.128.5.166
  podIPs:
  - ip: 10.128.5.166
  qosClass: Burstable
  startTime: "2025-02-03T11:19:31Z"

kevinliu24 (Contributor) commented:

@aleksrosz Sorry, just saw your message. Will take a look and get back to you.

MegaByte875 (Contributor) commented:

related issue: vesoft-inc/nebula#6001

kevinliu24 added the type/feature req (Type: feature request) label and removed the type/bug (Type: something is unexpected) label on Feb 7, 2025