This repository has been archived by the owner on Dec 3, 2021. It is now read-only.

"container init exited prematurely" also showing up in vqfx snapshot #274

Open
Mierdin opened this issue Oct 4, 2019 · 1 comment

@Mierdin
Member

Mierdin commented Oct 4, 2019

The "container init exited prematurely" error seemed to occur intermittently only on the container-vqfx image, but it looks like it happens on the snapshot image too.

kubectl describe pods -n=15-5uj8zl2e2b2copns-ns vqfx2
Name:         vqfx2
Namespace:    15-5uj8zl2e2b2copns-ns
Priority:     0
Node:         antidote-worker-3
Start Time:   Fri, 04 Oct 2019 13:35:34 -0700
Labels:       lessonId=15
              podName=vqfx2
              syringeManaged=yes
Annotations:  k8s.v1.cni.cncf.io/networks: [{"name":"vqfx1-vqfx2-net"},{"name":"vqfx2-vqfx3-net"}]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "ips": [
                        "192.168.241.21"
                    ],
                    "default": true,
                    "dns": {}
                },{
                    "name": "15-5uj8zl2e2b2copns-ns-vqfx1-vqfx2-net",
                    "ips": [
                        "10.10.0.6"
                    ],
                    "dns": {}
                },{
                    "name": "15-5uj8zl2e2b2copns-ns-vqfx2-vqfx3-net",
                    "ips": [
                        "10.10.0.6"
                    ],
                    "dns": {}
                }]
Status:       Running
IP:           192.168.241.21
Init Containers:
  git-clone:
    Container ID:  docker://4a1841c61177c05096168735d9b87108beb3dd47c032b46ccfa7f4c144496832
    Image:         antidotelabs/githelper:v0.4.0
    Image ID:      docker-pullable://docker.io/antidotelabs/githelper@sha256:2edfc05da9e8ceca17bab6c37ced1a064f057c446e238500245d23ab295de1f1
    Port:          <none>
    Host Port:     <none>
    Args:
      https://github.com/nre-learning/nrelabs-curriculum.git
      master
      /antidote
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 04 Oct 2019 13:36:17 -0700
      Finished:     Fri, 04 Oct 2019 13:36:21 -0700
    Ready:          True
    Restart Count:  1
    Environment:    <none>
    Mounts:
      /antidote from git-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mhh9d (ro)
Containers:
  vqfx2:
    Container ID:  docker://d63be1651d74a170a7a8e3e71d8c81335aa6976623712672146269f9bd81754d
    Image:         antidotelabs/vqfx-snap2:v1.0.0
    Image ID:      docker-pullable://docker.io/antidotelabs/vqfx-snap2@sha256:bc96ed79cf00b1dfe5958443af8033493796cf0c66d78b5d559063753d3e8ad5
    Port:          22/TCP
    Host Port:     0/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      ContainerCannotRun
      Message:     oci runtime error: container_linux.go:235: starting container process caused "container init exited prematurely"

      Exit Code:    128
      Started:      Fri, 04 Oct 2019 13:47:22 -0700
      Finished:     Fri, 04 Oct 2019 13:47:22 -0700
    Ready:          False
    Restart Count:  7
    Environment:
      SYRINGE_FULL_REF:  15-5uj8zl2e2b2copns-ns-vqfx2
    Mounts:
      /antidote from git-volume (rw,path="lessons/tools/lesson-15-stackstorm")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mhh9d (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  git-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  default-token-mhh9d:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mhh9d
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age                  From                        Message
  ----     ------          ----                 ----                        -------
  Normal   Scheduled       16m                  default-scheduler           Successfully assigned 15-5uj8zl2e2b2copns-ns/vqfx2 to antidote-worker-3
  Normal   Started         16m                  kubelet, antidote-worker-3  Started container vqfx2
  Normal   SandboxChanged  16m                  kubelet, antidote-worker-3  Pod sandbox changed, it will be killed and re-created.
  Normal   Killing         16m                  kubelet, antidote-worker-3  Stopping container vqfx2
  Normal   Pulled          16m (x2 over 16m)    kubelet, antidote-worker-3  Container image "antidotelabs/githelper:v0.4.0" already present on machine
  Normal   Created         16m (x2 over 16m)    kubelet, antidote-worker-3  Created container git-clone
  Normal   Started         16m (x2 over 16m)    kubelet, antidote-worker-3  Started container git-clone
  Normal   Pulling         15m (x4 over 16m)    kubelet, antidote-worker-3  Pulling image "antidotelabs/vqfx-snap2:v1.0.0"
  Normal   Pulled          15m (x4 over 16m)    kubelet, antidote-worker-3  Successfully pulled image "antidotelabs/vqfx-snap2:v1.0.0"
  Normal   Created         15m (x4 over 16m)    kubelet, antidote-worker-3  Created container vqfx2
  Warning  Failed          15m (x3 over 16m)    kubelet, antidote-worker-3  Error: failed to start container "vqfx2": Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "container init exited prematurely"
  Warning  BackOff         113s (x63 over 15m)  kubelet, antidote-worker-3  Back-off restarting failed container
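
For triage across several pods, the fields that matter in the output above (the `Last State` reason, message, and exit code) can be pulled out mechanically. This is a minimal sketch, not part of the original report: the function name `last_terminated_state` and the `SAMPLE` excerpt are made up for illustration, and it assumes the indentation-based layout that `kubectl describe` printed here.

```python
# Hypothetical helper: extract the "Last State" fields of one container
# from `kubectl describe pod` text. The layout is assumed to match the
# output pasted in this issue.

SAMPLE = """\
Containers:
  vqfx2:
    Image:         antidotelabs/vqfx-snap2:v1.0.0
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      ContainerCannotRun
      Message:     oci runtime error: container_linux.go:235: starting container process caused "container init exited prematurely"

      Exit Code:    128
      Started:      Fri, 04 Oct 2019 13:47:22 -0700
      Finished:     Fri, 04 Oct 2019 13:47:22 -0700
    Ready:          False
    Restart Count:  7
"""


def last_terminated_state(describe_text, container):
    """Return the 'Last State' block of `container` as a dict of strings."""
    fields = {}
    in_container = False
    in_last_state = False
    for line in describe_text.splitlines():
        stripped = line.strip()
        if stripped == f"{container}:":
            in_container = True
            continue
        if not in_container:
            continue
        if stripped.startswith("Last State:"):
            in_last_state = True
            fields["Last State"] = stripped.split(":", 1)[1].strip()
            continue
        if in_last_state:
            if stripped.startswith("Ready:"):
                break  # "Ready:" is the first field after the Last State block
            if ":" in stripped:
                key, value = stripped.split(":", 1)
                fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in last_terminated_state(SAMPLE, "vqfx2").items():
        print(f"{key}: {value}")
```

Parsing the human-readable output is fragile; reading `.status.containerStatuses[].lastState.terminated` from `kubectl get pod -o json` would likely be more robust if this turned into a real tool.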

Ideas for fixing:

Possibly useful output from the kubelet (captured with the full vqfx image, not this one; it might be related, or at worst we should see the same output when testing this image):

kubelet[10873]: E0829 23:13:26.374746   10873 pod_workers.go:190] Error syncing pod 88310250-caaf-11e9-8781-0cc47ae547a8 ("vqfx2_12-l0yh5ozt7urx8ran-ns(88310250-caaf-11e9-8781-0cc47ae547a8)"), skipping: failed to "StartContainer" for "vqfx2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=vqfx2 pod=vqfx2_12-l0yh5ozt7urx8ran-ns(88310250-caaf-11e9-8781-0cc47ae547a8)"
Aug 29 23:13:26 antidote-worker-1 dockerd-current[12144]: DEBU: 2019/08/29 23:13:26.395575 EVENT UpdatePod {"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"[{\"name\":\"vqfx1-vqfx2-net\"},{\"name\":\"vqfx2-vqfx3-net\"}]","k8s.v1.cni.cncf.io/networks-status":"[{\n    \"name\": \"\",\n    \"ips\": [\n        \"192.168.67.69\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n},{\n    \"name\": \"12-l0yh5ozt7urx8ran-ns-vqfx1-vqfx2-net\",\n    \"ips\": [\n        \"10.10.0.4\"\n    ],\n    \"dns\": {}\n},{\n    \"name\": \"12-l0yh5ozt7urx8ran-ns-vqfx2-vqfx3-net\",\n    \"ips\": [\n        \"10.10.0.4\"\n    ],\n    \"dns\": {}\n}]"},"creationTimestamp":"2019-08-29T22:51:26Z","labels":{"lessonId":"12","podName":"vqfx2","syringeManaged":"yes"},"name":"vqfx2","namespace":"12-l0yh5ozt7urx8ran-ns","resourceVersion":"4358037","selfLink":"/api/v1/namespaces/12-l0yh5ozt7urx8ran-ns/pods/vqfx2","uid":"88310250-caaf-11e9-8781-0cc47ae547a8"},"spec":{"affinity":{"podAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchLabels":{"lessonId":"12","syringeManaged":"yes"}},"namespaces":["12-l0yh5ozt7urx8ran-ns"],"topologyKey":"kubernetes.io/hostname"}]}},"containers":[{"image":"antidotelabs/vqfx-snap2:v1.0.0","imagePullPolicy":"Always","name":"vqfx2","ports":[{"containerPort":22,"protocol":"TCP"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","initContainers":[{"args":["https://github.com/nre-learning/nrelabs-curriculum.git","master","/antidote"],"image":"antidotelabs/githelper:v0.4.0","imagePullPolicy":"IfNotPresent","name":"git-clone","resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/antidote","name":"git-volume"},{"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount","name":"default-token-2smvx","readOnly":true}]}],"nodeName":"antidote-worker-1","priority":0,"restartPolicy":"Always","schedulerName"
:"default-scheduler","securityContext":{},"serviceAccount":"default","serviceAccountName":"default","terminationGracePeriodSeconds":30},"status":{"conditions":[{"lastProbeTime":null,"lastTransitionTime":"2019-08-29T22:52:14Z","status":"True","type":"Initialized"},{"lastProbeTime":null,"lastTransitionTime":"2019-08-29T22:52:09Z","message":"containers with unready status: [vqfx2]","reason":"ContainersNotReady","status":"False","type":"Ready"},{"lastProbeTime":null,"lastTransitionTime":"2019-08-29T22:52:09Z","message":"containers with unready status: [vqfx2]","reason":"ContainersNotReady","status":"False","type":"ContainersReady"},{"lastProbeTime":null,"lastTransitionTime":"2019-08-29T22:51:26Z","status":"True","type":"PodScheduled"}],"hostIP":"147.75.88.205","initContainerStatuses":[{"containerID":"docker://21ede9b83b8ae058fb4f49ca26ced17110d980073fd7560ab16f91e7d81ba942","image":"docker.io/antidotelabs/githelper:v0.4.0","imageID":"docker-pullable://docker.io/antidotelabs/githelper@sha256:2edfc05da9e8ceca17bab6c37ced1a064f057c446e238500245d23ab295de1f1","lastState":{},"name":"git-clone","ready":true,"restartCount":1,"state":{"terminated":{"containerID":"docker://21ede9b83b8ae058fb4f49ca26ced17110d980073fd7560ab16f91e7d81ba942","exitCode":0,"finishedAt":"2019-08-29T22:52:14Z","reason":"Completed","startedAt":"2019-08-29T22:52:08Z"}}}],"phase":"Running","podIP":"192.168.67.69","qosClass":"BestEffort","startTime":"2019-08-29T22:51:26Z"}} {"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"[{\"name\":\"vqfx1-vqfx2-net\"},{\"name\":\"vqfx2-vqfx3-net\"}]","k8s.v1.cni.cncf.io/networks-status":"[{\n    \"name\": \"\",\n    \"ips\": [\n        \"192.168.67.69\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n},{\n    \"name\": \"12-l0yh5ozt7urx8ran-ns-vqfx1-vqfx2-net\",\n    \"ips\": [\n        \"10.10.0.4\"\n    ],\n    \"dns\": {}\n},{\n    \"name\": \"12-l0yh5ozt7urx8ran-ns-vqfx2-vqfx3-net\",\n    \"ips\": [\n        \"10.10.0.4\"\n  
  ],\n    \"dns\": {}\n}]"},"creationTimestamp":"2019-08-29T22:51:26Z","labels":{"lessonId":"12","podName":"vqfx2","syringeManaged":"yes"},"name":"vqfx2","namespace":"12-l0yh5ozt7u
@cloudtoad
Contributor

These are the combined recommendations for vSRX and vMX:

  1. Disable Transparent Huge Pages (THP)
  2. Disable Kernel Samepage Merging (KSM)
  3. Disable Page Modification Logging (PML)
  4. Disable APICv
  5. Enable nested virtualization
  6. Enable 1G of Huge Pages

I believe this particular problem is related to the 1G of Huge Pages; however, I'm not 100% certain, since I configured all of these recommendations at once. I am not sure which of these (if any) must also be configured in the container. I can do some research on that and update later, but I think that none of them need to be configured in individual containers. They only need to be configured on the host. These are all kernel configuration options.
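
To test the huge-page hypothesis on a worker node, something like the following could report what the host has actually reserved. It is only a sketch: the function names are invented here, and since "1G of Huge Pages" could mean either 1 GiB-sized pages or 1 GiB worth of pages in total, it reports both the default pool from `/proc/meminfo` and the dedicated 1 GiB pool under `/sys/kernel/mm/hugepages`.

```python
from pathlib import Path


def parse_meminfo_hugepages(meminfo_text):
    """Extract the huge-page counters (pages, and page size in kB) from /proc/meminfo text."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
    result = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            result[key] = int(rest.split()[0])
    return result


def hugepage_pool_bytes(info):
    """Total size of the default huge-page pool in bytes (page count times page size)."""
    return info.get("HugePages_Total", 0) * info.get("Hugepagesize", 0) * 1024


def one_gig_pool_size(sysfs_root="/sys/kernel/mm/hugepages"):
    """Return how many 1 GiB pages are reserved, or None if that pool is not present."""
    path = Path(sysfs_root) / "hugepages-1048576kB" / "nr_hugepages"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        meminfo = Path("/proc/meminfo").read_text()
    except OSError:
        meminfo = ""  # not on Linux, or /proc is unavailable
    info = parse_meminfo_hugepages(meminfo)
    print("default pool:", info, "->", hugepage_pool_bytes(info), "bytes")
    print("1 GiB pages reserved:", one_gig_pool_size())
```

Running this on a node where the pod starts and on one where it crash-loops would show whether the huge-page reservation actually differs between them.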

Numbers 1-3 relate to memory management. THP and KSM are ways of organizing and deduplicating memory to reduce memory footprint or speed up reads and writes to memory; PML is a hardware assist that lets the hypervisor track which guest pages have been modified.
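
A quick way to see whether items 1-3 are already off on a host is to read the corresponding sysfs files. The sketch below assumes an Intel host with the `kvm_intel` module loaded (that is where the `pml` parameter lives); the function names and the report keys are made up for illustration, and a value of `None` means the file could not be read.

```python
import re
from pathlib import Path


def thp_mode(enabled_text):
    """Return the active THP mode from text such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", enabled_text)
    return match.group(1) if match else None


def read_or_none(path):
    """Read a small sysfs file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def memory_feature_report(sysfs="/sys"):
    """Report whether THP, KSM and PML are disabled (True/False), or None if unknown."""
    thp = read_or_none(f"{sysfs}/kernel/mm/transparent_hugepage/enabled")
    ksm = read_or_none(f"{sysfs}/kernel/mm/ksm/run")
    pml = read_or_none(f"{sysfs}/module/kvm_intel/parameters/pml")
    return {
        "thp_disabled": None if thp is None else thp_mode(thp) == "never",
        # ksm/run: 0 = stopped, 1 = running, 2 = stopped and pages unmerged
        "ksm_disabled": None if ksm is None else ksm in ("0", "2"),
        "pml_disabled": None if pml is None else pml in ("N", "0"),
    }


if __name__ == "__main__":
    for name, value in memory_feature_report().items():
        print(f"{name}: {value}")
```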

On number 5, while virtualization might be enabled in the BIOS, nested virtualization additionally requires a configuration step in Linux. Some Linux distros have this enabled already.
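
Whether a given host already has nested virtualization on can be checked by reading the `nested` parameter of the loaded KVM module. This is a sketch under the assumption that the host uses `kvm_intel` or `kvm_amd`; the helper names are hypothetical, and a module that is not loaded is simply left out of the result.

```python
from pathlib import Path


def param_enabled(value):
    """Interpret a KVM module parameter: 'Y' or '1' means on, 'N' or '0' means off."""
    value = value.strip()
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    return None  # unexpected content


def nested_status(modules=("kvm_intel", "kvm_amd"), root="/sys/module"):
    """Map each loaded KVM module to whether its 'nested' parameter is enabled."""
    status = {}
    for module in modules:
        path = Path(root) / module / "parameters" / "nested"
        try:
            status[module] = param_enabled(path.read_text())
        except OSError:
            continue  # module not loaded on this host
    return status


if __name__ == "__main__":
    result = nested_status()
    if not result:
        print("no KVM module with a 'nested' parameter found")
    for module, enabled in result.items():
        print(f"{module}: nested={'on' if enabled else 'off or unknown'}")
```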
