Linux: Docker container running a memory-intensive process is killed with exit code 1

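I am using GitLab CI to run triggered jobs on AWS Kubernetes (EKS). When a long-running, memory- and CPU-intensive job runs, the container running that job is killed with exit code 1.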

I checked /var/log/messages on the worker node. The log below covers the minute before I got the error:

Feb  7 04:11:15 ip-10-226-44-109 kubelet: I0207 04:11:15.614914    4804 reconciler.go:301] Volume detached for volume "repo" (UniqueName: "kubernetes.io/empty-dir/b8f0c962-495f-11ea-b638-0673fa95f662-repo") on node "ip-10-226-44-109.ap-southeast-2.compute.internal" DevicePath ""
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered blocking state
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered disabled state
Feb  7 04:11:16 ip-10-226-44-109 kernel: device veth94e5db0 entered promiscuous mode
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered blocking state
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered forwarding state
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered disabled state
Feb  7 04:11:16 ip-10-226-44-109 kernel: eth0: renamed from vethae08789
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered blocking state
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered forwarding state
Feb  7 04:11:16 ip-10-226-44-109 kubelet: W0207 04:11:16.401374    4804 kubelet_getters.go:284] Path "/var/lib/kubelet/pods/b8f0c962-495f-11ea-b638-0673fa95f662/volumes" does not exist
Feb  7 04:11:16 ip-10-226-44-109 kubelet: W0207 04:11:16.401410    4804 kubelet_getters.go:284] Path "/var/lib/kubelet/pods/b9edf486-495f-11ea-b638-0673fa95f662/volumes" does not exist
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered disabled state
Feb  7 04:11:16 ip-10-226-44-109 kernel: vethae08789: renamed from eth0
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered disabled state
Feb  7 04:11:16 ip-10-226-44-109 kernel: device veth94e5db0 left promiscuous mode
Feb  7 04:11:16 ip-10-226-44-109 kernel: docker0: port 1(veth94e5db0) entered disabled state
Feb  7 04:11:59 ip-10-226-44-109 kubelet: I0207 04:11:59.379996    4804 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-r26dz" (UniqueName: "kubernetes.io/secret/fc5d9f2e-495f-11ea-b638-0673fa95f662-default-token-r26dz") pod "runner-u4zrz1by-project-12123209-concurrent-44rwnh" (UID: "fc5d9f2e-495f-11ea-b638-0673fa95f662")
Feb  7 04:11:59 ip-10-226-44-109 kubelet: I0207 04:11:59.380025    4804 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "repo" (UniqueName: "kubernetes.io/empty-dir/fc5d9f2e-495f-11ea-b638-0673fa95f662-repo") pod "runner-u4zrz1by-project-12123209-concurrent-44rwnh" (UID: "fc5d9f2e-495f-11ea-b638-0673fa95f662")
Feb  7 04:11:59 ip-10-226-44-109 systemd: Started Kubernetes transient mount for /var/lib/kubelet/pods/fc5d9f2e-495f-11ea-b638-0673fa95f662/volumes/kubernetes.io~secret/default-token-r26dz.
Feb  7 04:11:59 ip-10-226-44-109 systemd: Starting Kubernetes transient mount for /var/lib/kubelet/pods/fc5d9f2e-495f-11ea-b638-0673fa95f662/volumes/kubernetes.io~secret/default-token-r26dz.
Feb  7 04:11:59 ip-10-226-44-109 dockerd: time="2020-02-07T04:11:59Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/1921d53d580fc451fd2801dffc196a67e2970acbfbf2604ddf96b925bb7ecb88/shim.sock" debug=false pid=24408
Feb  7 04:12:00 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:00Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/b9afda2153670e1427b562c33bd173039cea4e9adcd5433709fcdc31068fd9ff/shim.sock" debug=false pid=24533
Feb  7 04:12:00 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:00Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/c0f4dcfd74f4eb53ba486e0844bbdea82bc177084fca8faa005ef2a04e3bfaab/shim.sock" debug=false pid=24600
Feb  7 04:12:00 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:00Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/d887b7a2f86900d378c709c64c32faec99935753ff6525bf1992f1efc53cf8b4/shim.sock" debug=false pid=24668
Feb  7 04:12:00 ip-10-226-44-109 kubelet: I0207 04:12:00.481851    4804 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "repo" (UniqueName: "kubernetes.io/empty-dir/fcff51c0-495f-11ea-b638-0673fa95f662-repo") pod "runner-u4zrz1by-project-12123209-concurrent-5tcz7v" (UID: "fcff51c0-495f-11ea-b638-0673fa95f662")
Feb  7 04:12:00 ip-10-226-44-109 kubelet: I0207 04:12:00.481889    4804 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-r26dz" (UniqueName: "kubernetes.io/secret/fcff51c0-495f-11ea-b638-0673fa95f662-default-token-r26dz") pod "runner-u4zrz1by-project-12123209-concurrent-5tcz7v" (UID: "fcff51c0-495f-11ea-b638-0673fa95f662")
Feb  7 04:12:00 ip-10-226-44-109 systemd: Started Kubernetes transient mount for /var/lib/kubelet/pods/fcff51c0-495f-11ea-b638-0673fa95f662/volumes/kubernetes.io~secret/default-token-r26dz.
Feb  7 04:12:00 ip-10-226-44-109 systemd: Starting Kubernetes transient mount for /var/lib/kubelet/pods/fcff51c0-495f-11ea-b638-0673fa95f662/volumes/kubernetes.io~secret/default-token-r26dz.
Feb  7 04:12:00 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:00Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/1c6ce43191ef995ceab9d237327071b6b8e7a57165f458c9f11960d51e47a07e/shim.sock" debug=false pid=24798
Feb  7 04:12:00 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:00Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/d15361ee161d0c66bf02a8e009f0d8441608b372a3d0b1f305c5b6bb2cc0f12a/shim.sock" debug=false pid=24977
Feb  7 04:12:01 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:01Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/53baa9364337b3d3bbd7fff3606fc985ed20586c293bf9a28e353f48474e8085/shim.sock" debug=false pid=25203
Feb  7 04:12:01 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:01Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/962a39cfda64405c73872a5d5945dea879aa87873d088925edf600b6f1b6b321/shim.sock" debug=false pid=25269
Feb  7 04:12:01 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:01Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/cc4ef0c7f4c512033c55888770a0477d1a1afa02b7d9611adaa0a03b3388a6ba/shim.sock" debug=false pid=25340
Feb  7 04:12:01 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:01Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/4d871dc03e766baa5707c0bd6502c24333aab9263d52443d31e19ac080054d9d/shim.sock" debug=false pid=25463
Feb  7 04:12:05 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:05Z" level=info msg="shim reaped" id=28b82bf41cfa344d616118007a8b9ac2440148b65dd76afb4124504c404e65a8
Feb  7 04:12:05 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:05.663819264Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06Z" level=info msg="shim reaped" id=3dab6902724b6639dfbcb5b546eb9b9e8f3131ac5d63453303955d8412f9e23b
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.475126364Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.590951867Z" level=info msg="Container 245af759c17645d8fbad6fa86c83812aaf109db19a6271037bfdc2ddec1975c2 failed to exit within 2 seconds of signal 15 - using the force"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.590959536Z" level=info msg="Container 7a21f68defc90bacdd8db3bc2660d18b06801f4ea312befef01e8eb95d115d3b failed to exit within 2 seconds of signal 15 - using the force"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.598126562Z" level=info msg="Container 7a21f68defc90bacdd8db3bc2660d18b06801f4ea312befef01e8eb95d115d3b failed to exit within 2 seconds of signal 15 - using the force"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.600480653Z" level=info msg="Container 245af759c17645d8fbad6fa86c83812aaf109db19a6271037bfdc2ddec1975c2 failed to exit within 2 seconds of signal 15 - using the force"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06Z" level=info msg="shim reaped" id=7a21f68defc90bacdd8db3bc2660d18b06801f4ea312befef01e8eb95d115d3b
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.680421611Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06Z" level=info msg="shim reaped" id=245af759c17645d8fbad6fa86c83812aaf109db19a6271037bfdc2ddec1975c2
Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.692681358Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
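
In the excerpt above, dockerd reports that two containers "failed to exit within 2 seconds of signal 15" (signal 15 is SIGTERM) and were then force-killed. That shows something asked Docker to stop them, but it does not by itself say why. As a rough aid for finding such events in a saved copy of /var/log/messages, here is a minimal Python sketch. The regular expression is derived only from the message format visible in this excerpt, and the function name force_killed_containers is hypothetical.

```python
import re
from collections import Counter

# Pattern based on the dockerd message seen in the log excerpt; other
# Docker versions may word this message differently (assumption).
FORCE_KILL = re.compile(
    r'msg="Container (?P<id>[0-9a-f]+) failed to exit within '
    r'(?P<secs>\d+) seconds of signal (?P<sig>\d+) - using the force"'
)


def force_killed_containers(lines):
    """Count force-kill messages per container ID."""
    counts = Counter()
    for line in lines:
        match = FORCE_KILL.search(line)
        if match:
            counts[match.group("id")] += 1
    return counts


# Three lines copied from the log above: two force-kill messages and one
# unrelated "shim reaped" message that should be ignored.
sample = [
    'Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.590951867Z" level=info msg="Container 245af759c17645d8fbad6fa86c83812aaf109db19a6271037bfdc2ddec1975c2 failed to exit within 2 seconds of signal 15 - using the force"',
    'Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06.590959536Z" level=info msg="Container 7a21f68defc90bacdd8db3bc2660d18b06801f4ea312befef01e8eb95d115d3b failed to exit within 2 seconds of signal 15 - using the force"',
    'Feb  7 04:12:06 ip-10-226-44-109 dockerd: time="2020-02-07T04:12:06Z" level=info msg="shim reaped" id=7a21f68defc90bacdd8db3bc2660d18b06801f4ea312befef01e8eb95d115d3b',
]

counts = force_killed_containers(sample)
for container_id, n in counts.items():
    print(container_id[:12], n)
```

To scan a whole file instead of the sample list, pass an open file object, since the function only iterates over lines.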

I have noticed that whenever the process is killed, I see the following in the log:

Feb 10 01:30:42 ip-10-226-44-109 kernel: docker0: port 1(veth81ae14c) entered disabled state
Feb 10 01:30:42 ip-10-226-44-109 kernel: veth59985b4: renamed from eth0
Feb 10 01:30:42 ip-10-226-44-109 kernel: docker0: port 1(veth81ae14c) entered disabled state
Feb 10 01:30:42 ip-10-226-44-109 kernel: device veth81ae14c left promiscuous mode
Feb 10 01:30:42 ip-10-226-44-109 kernel: docker0: port 1(veth81ae14c) entered disabled state
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(vethab14b48) entered disabled state
Feb 10 01:30:43 ip-10-226-44-109 kernel: veth43de279: renamed from eth0
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(vethab14b48) entered disabled state
Feb 10 01:30:43 ip-10-226-44-109 kernel: device vethab14b48 left promiscuous mode
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(vethab14b48) entered disabled state
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered blocking state
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered disabled state
Feb 10 01:30:43 ip-10-226-44-109 kernel: device veth9b33ebc entered promiscuous mode
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered blocking state
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered forwarding state
Feb 10 01:30:43 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered disabled state
Feb 10 01:30:44 ip-10-226-44-109 kernel: eth0: renamed from veth61422d0
Feb 10 01:30:44 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered blocking state
Feb 10 01:30:44 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered forwarding state
Feb 10 01:30:44 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered disabled state
Feb 10 01:30:44 ip-10-226-44-109 kernel: veth61422d0: renamed from eth0
Feb 10 01:30:44 ip-10-226-44-109 kernel: docker0: port 1(veth9b33ebc) entered disabled state


Can someone help me understand the cause of this and a possible solution?

Thanks.

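Can you check whether any resource quota is defined in the namespace you are using?
$ kubectl describe namespace
Also, what resource limits does your job have?

Hi @DT. No, there is no quota defined for the namespace. I don't want to apply any quota or limit range to the namespace the GitLab runners use, because the jobs may legitimately be very resource-hungry.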
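
To follow up on the question about limits, one way to see what a job pod actually ran with is to dump it with `kubectl get pod <pod-name> -o json` and look at each container's memory limit next to its termination record. The Python sketch below does that for an already-parsed pod object. It relies on the standard pod fields `spec.containers[].resources.limits` and `status.containerStatuses[].state.terminated` / `lastState.terminated`. The function name summarize_pod and the sample pod, including the container name "build", are hypothetical and are not taken from the cluster in the question.

```python
import json


def summarize_pod(pod):
    """Return (container, memory limit, exit code, reason) for each container status."""
    # Map container name -> configured memory limit (None if no limit is set).
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
        for c in pod["spec"]["containers"]
    }
    rows = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # Prefer the current terminated state, fall back to the previous one.
        terminated = (
            status.get("state", {}).get("terminated")
            or status.get("lastState", {}).get("terminated")
            or {}
        )
        rows.append(
            (
                status["name"],
                limits.get(status["name"]),
                terminated.get("exitCode"),
                terminated.get("reason"),
            )
        )
    return rows


# Hypothetical pod JSON, trimmed to the fields used above.
sample_pod = json.loads("""
{
  "spec": {"containers": [{"name": "build", "resources": {}}]},
  "status": {"containerStatuses": [
    {"name": "build",
     "state": {"terminated": {"exitCode": 1, "reason": "Error"}}}
  ]}
}
""")

rows = summarize_pod(sample_pod)
for row in rows:
    print(row)
```

A reason of OOMKilled in this output would point to the container exceeding its memory limit. A plain Error with exit code 1, as in the sample, would suggest looking at the job's own output instead.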