Amazon Web Services: ECS container gets killed every ~1 hour
Update: it turned out the disk had run out of space.
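My ECS container keeps getting killed about an hour after it starts, somewhere between 55 and 65 minutes in. A new container is then created, and roughly an hour later that one is killed too. I have checked the logs on the EC2 host as well as the logs inside the container, but neither shows what happened. Any idea what I can do?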
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
90d9xyze57fb xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest "/usr/bin/rungeth" 21 minutes ago Up 21 minutes 0.0.0.0:8545->8545/tcp, 0.0.0.0:30303->30303/tcp, 0.0.0.0:30303->30303/udp ecs-geth-task-1-geth-container-f29d85fxyze7c9a5d201
4603xyz723d3 xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest "/usr/bin/rungeth" About an hour ago Exited (1) 22 minutes ago ecs-geth-task-1-geth-container-cec7cd8xyze3f88fe901
9f38xyzc032a xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest "/usr/bin/rungeth" 2 hours ago Exited (1) About an hour ago ecs-geth-task-1-geth-container-eecfe8cxyz88f8b0ff01
3c33xyza6054 xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest "/usr/bin/rungeth" 2 hours ago Exited (1) 2 hours ago ecs-geth-task-1-geth-container-ccc08ddxyzb495d9e001
7a20xyzff29e xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest "/usr/bin/rungeth" 3 hours ago Exited (1) 2 hours ago ecs-geth-task-1-geth-container-8c96e1exyz8aff821d00
75bdxyzc00e7 xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest "/usr/bin/rungeth" 4 hours ago Exited (1) 3 hours ago ecs-geth-task-1-geth-container-e0aec48xyzf58bfcf101
1b3bxyz1961f amazon/amazon-ecs-agent:latest "/agent" 4 hours ago Up 4 hours ecs-agent
# docker logs 4603xyz723d3
#
It turned out to be insufficient disk space. Attaching a larger volume and setting the launch configuration's user data to:
#cloud-boothook
cloud-init-per once ecs_config echo 'ECS_CLUSTER=my-cluster' >> /etc/ecs/ecs.config
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=200G"' >> /etc/sysconfig/docker
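fixed the problem.
What does the system log when it crashes? Does it happen at the same K minutes past the hour each time? Which cron jobs are running? Does anyone outside have permission to kill it?
Could the instance be running out of memory? We just ran into a similar problem on a single-instance cluster. We doubled the instance size, which stabilized it, and then realized we had misconfigured the hard/soft memory allocation limits, which was what caused the containers to crash.
It turned out the disk had run out of space.
How did you determine that it was disk space?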
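The thread does not say how the disk-space problem was actually found. As a rough sketch of one way to look for this kind of condition, the helper below reports any filesystem at or above a usage threshold. The function name check_disk_usage and the default of 90 percent are hypothetical choices for illustration, not something taken from the original post.

```shell
#!/bin/sh
# Hypothetical helper: list filesystems whose usage is at or above a threshold.
# Usage: check_disk_usage [THRESHOLD_PERCENT]   (default 90, an arbitrary choice)
# Exit status is 1 if at least one filesystem is at or above the threshold, else 0.
check_disk_usage() {
    threshold="${1:-90}"
    # df -P prints the POSIX format: the 5th column is "Capacity" (e.g. 87%),
    # the 6th is the mount point. Mount points containing spaces are not handled.
    df -P | awk -v t="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 >= t) {
                print $6 " is " $5 " full"
                found = 1
            }
        }
        END { exit found ? 1 : 0 }
    '
}

# Example call: check_disk_usage 90
```

Run on the EC2 host, this covers the instance's own volumes. With the devicemapper storage driver, each container also gets its own root filesystem whose size is governed by dm.basesize, so the same check may need to be run inside the container (for example through docker exec) to see that limit being reached.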