Why do I get error 137 when executing a lifecycle hook in a Kubernetes CronJob?


kubernetes


I have the following Kubernetes CronJob spec:

---
kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: do-registry-cleanup

spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 4
  jobTemplate:
    spec:
      template:
        spec:
          automountServiceAccountToken: false
          restartPolicy: OnFailure
          containers:
          - name: podtest2
            image: alpine
            args:
            - wget
            - http://some_real_url/test/pod/2
            imagePullPolicy: Always
            lifecycle:
              postStart:
                exec:
                  command:
                  - "sh"
                  - "-c"
                  - "sleep 2s;"

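When I run

kubectl describe pod some_pod_name

I get this output (truncated):

In this example, wget does request the URL, and I know the sleep command is actually executed rather than being broken. My questions are:

Why does this happen?

What side effects does this have?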

Some additional information: if the command is "cmd1; sleep; cmd2", cmd2 is never executed. So, for some reason, the sleep invocation errors out inside the container.

Try

command: ["/bin/sh", "-c", "sleep 2s"]

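From the official documentation:

Once a container enters the Running state, the postStart hook (if any) is executed.

A container enters the Terminated state when it has successfully completed execution or when it has failed for some reason. Regardless, a reason and exit code is displayed, as well as the container's start and finish time. Before a container enters Terminated, the preStop hook (if any) is executed.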

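There are two hooks that are exposed to containers:

PostStart

This hook executes immediately after a container is created. However, there is no guarantee that the hook will execute before the container ENTRYPOINT. No parameters are passed to the handler.

PreStop

This hook is called immediately before a container is terminated due to an API request or management event such as a liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in a terminated or completed state. It is blocking, meaning it is synchronous, so it must complete before the call to delete the container can be sent. No parameters are passed to the handler.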

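In practice, what is written here for PreStop applies to PostStart as well.

Basically, the kubelet does not wait for the hooks to finish. It simply terminates everything once the main container exits.

For PreStop, all we can do is increase the grace period, but for PostStart we can make the main container wait for the hook to finish. Here is an example: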

kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: test1
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 4
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: test1
            image: nginx
            command: ["bash", "-c", "touch file1; while [ ! -f file2 ] ; do ls file*; sleep 1 ; done; ls file*"]
            lifecycle:
              postStart:
                exec:
                  command: ["bash", "-c", "sleep 10; touch file2"]
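The handshake in this manifest can be tried locally without a cluster. The sketch below is an illustration only: it assumes a POSIX `sh`, a throwaway directory from `mktemp -d`, and a 3-second hook delay instead of the manifest's 10 seconds. A background subshell plays the postStart hook, and the foreground loop plays the main container command, which cannot exit until the "hook" has created `file2`.

```sh
#!/bin/sh
# Local sketch of the CronJob's file1/file2 handshake, run outside Kubernetes.
# Assumptions: a POSIX sh, a throwaway temp directory, and a 3 s hook delay
# instead of the 10 s used in the manifest.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the postStart hook: runs concurrently with the main process.
(sleep 3; touch file2) &

# Stand-in for the main container command: it refuses to exit until the
# "hook" has created file2, so the hook is never cut off mid-run.
touch file1
polls=0
while [ ! -f file2 ]; do
  polls=$((polls + 1))
  sleep 1
done
wait    # reap the background "hook" before finishing
ls file*
echo "main process polled $polls time(s) before the hook finished"
```

The point of the design is that the main process, not the kubelet, decides when the container may stop, so a slow hook is never killed halfway through.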

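If you check the pod's logs, you will see that the hook created file2 before the main container finished. The log contains 12 `file1` lines, but the last one comes from the final `ls file*` after the loop (which prints both file1 and file2), so the loop itself ran 11 times rather than the 10 you might expect from the hook's `sleep 10`. This means PostStart started roughly a second after the main container command, i.e. the container enters the Running state, and the hook fires, only some time after the container has started.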

$ kubectl describe cronjob/test1 | grep Created
  Normal  SuccessfulCreate  110s  cronjob-controller  Created job test1-1566402420
$ kubectl describe job/test1-1566402420 | grep Created
  Normal  SuccessfulCreate  2m28s  job-controller  Created pod: test1-1566402420-d5lfr
$ kubectl logs pod/test1-1566402420-d5lfr -c test1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file2
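The delay can be read off the log with a little arithmetic. This is a rough sketch, not an exact measurement: it assumes one poll per second and the 10-second hook sleep from the manifest, and `estimate_hook_delay` is a hypothetical helper name. Every `file1` line counts as one poll except the last, which is printed by the final `ls file*` after the loop.

```sh
#!/bin/sh
# Rough estimate of how long after the main command the postStart hook began,
# computed from the test1 pod log. Assumptions (taken from the manifest above):
# one poll per second and a 10 s sleep in the hook before it creates file2.
hook_sleep=10

estimate_hook_delay() {
  # Every "file1" line is one poll, except the last one, which comes from the
  # final `ls file*` after the loop (it prints both file1 and file2).
  lines=$(grep -cx 'file1')
  polls=$((lines - 1))
  echo $((polls - hook_sleep))
}

# Usage against a real pod (pod name taken from the output above):
#   kubectl logs pod/test1-1566402420-d5lfr -c test1 | estimate_hook_delay
```

For the log above (12 `file1` lines), this gives 1, i.e. a delay of about one second, with a resolution of no better than the one-second poll interval.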

No, sleep is not being used incorrectly: changing it to "sleep 2s" gives the same result as in ...

This works for me:

command: ["/bin/sh", "-c", "sleep 2s && sleep 2s"]

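The important parts here are using the [] (exec array) form of command, referencing sh by its absolute path, including "-c", and ending with a command that cannot fail. You could try

sleep 2s && exit 0

or

sleep 2s || exit 0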
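The two suggested forms do not behave the same when the first command fails. The sketch below substitutes `false` for a failing sleep (an assumption made only for the demo) and compares the exit status of each form: only the `||` form guarantees a zero status. Neither form helps if the hook process is killed outright, because a shell that receives SIGKILL never gets to run `exit 0`.

```sh
#!/bin/sh
# Compare the two suggested hook commands when their first command fails.
# `false` stands in for a failing sleep; this substitution is an assumption for the demo.
sh -c 'false && exit 0'; and_status=$?
sh -c 'false || exit 0'; or_status=$?

echo "&& form exits with $and_status"   # 1: `exit 0` is skipped, the failure propagates
echo "|| form exits with $or_status"    # 0: `exit 0` runs because `false` failed
```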


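wget probably finishes before the sleep does. When the main process exits, the container stops and the still-running hook is killed along with it. Exit code 137 is 128 + 9, i.e. the hook process was terminated by SIGKILL rather than failing on its own, which is also why cmd2 in "cmd1; sleep; cmd2" never runs.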
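A likely mechanism, stated with some caution: on Linux, when PID 1 of a PID namespace exits, the kernel sends SIGKILL to every remaining process in that namespace, so a hook still sleeping when the main command finishes is killed. The sketch below reproduces the effect locally with plain `sh`; the one-second "main process" and the `echo` commands are illustrative assumptions, and the explicit `kill` stands in for the container shutdown.

```sh
#!/bin/sh
# Local stand-in for a postStart hook that is still sleeping when the
# container is torn down. The timings and echo commands are illustrative
# assumptions; in a real pod the kill comes from the container shutdown.

# The "hook": cmd1; sleep; cmd2, as described in the question.
sh -c 'echo cmd1; sleep 3; echo cmd2' &
hook_pid=$!

sleep 1                  # the "main process" (wget) finishes first
kill -KILL "$hook_pid"   # forced termination, as when the container stops
wait "$hook_pid"
hook_status=$?

echo "hook exit status: $hook_status"   # 137 = 128 + 9 (SIGKILL); cmd2 never runs
```

The script prints `cmd1` and then `hook exit status: 137`, matching both the exit code and the missing cmd2 reported in the question.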
