Kubernetes does not trigger autoscaling on GKE
For some reason, Kubernetes 1.6.2 does not trigger autoscaling on Google Container Engine.

I have a someservice Deployment defined with the following resources and rolling-update strategy:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: someservice
  labels:
    layer: backend
spec:
  minReadySeconds: 160
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: someservice
        layer: backend
    spec:
      containers:
      - name: someservice
        image: eu.gcr.io/XXXXXX/someservice:v1
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 2
            memory: 20Gi
          requests:
            cpu: 400m
            memory: 18Gi
      <.....>
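With maxSurge: 100% and maxUnavailable: 0, a rolling update temporarily doubles the replica count, so every update needs room for a full extra copy of the pod (400m CPU, 18Gi memory) before the old one is torn down. A minimal sketch of that headroom arithmetic, using the numbers from the manifest above:

```python
# Surge capacity needed during a rolling update with
# maxSurge: 100% and maxUnavailable: 0 (values from the manifest above).
replicas = 1
max_surge = 1.0          # maxSurge: 100% of replicas
surge_pods = int(replicas * max_surge)

# Per-pod scheduling requests from the manifest (CPU in millicores,
# memory in GiB).
pod_request = {"cpu_m": 400, "mem_gi": 18}

# Extra resources the cluster must absorb while old and new pods coexist.
extra = {k: v * surge_pods for k, v in pod_request.items()}
print(extra)  # {'cpu_m': 400, 'mem_gi': 18}
```

So even with a single replica, some node must have 18Gi of unrequested memory free for the surge pod to schedule.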
My node pool is set up for autoscaling with a minimum of 2 and a maximum of 5 nodes. The machines in the node pool (n1-highmem-8) are big enough (52 GB) to hold this service. Yet somehow, nothing happens:
$ kubectl get nodes
NAME                                 STATUS    AGE       VERSION
gke-dev-default-pool-efca0068-4qq1   Ready     2d        v1.6.2
gke-dev-default-pool-efca0068-597s   Ready     2d        v1.6.2
gke-dev-default-pool-efca0068-6srl   Ready     2d        v1.6.2
gke-dev-default-pool-efca0068-hb1z   Ready     2d        v1.6.2
$ kubectl describe nodes | grep -A 4 'Allocated resources'
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests     Memory Limits
  ------------  ----------      ---------------     -------------
  7060m (88%)   15510m (193%)   39238591744 (71%)   48582818048 (88%)
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests     Memory Limits
  ------------  ----------      ---------------     -------------
  6330m (79%)   22200m (277%)   48930Mi (93%)       66344Mi (126%)
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests     Memory Limits
  ------------  ----------      ---------------     -------------
  7360m (92%)   13200m (165%)   49046Mi (93%)       44518Mi (85%)
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests     Memory Limits
  ------------  ----------      ---------------     -------------
  7988m (99%)   11538m (144%)   32967256Ki (61%)    21690968Ki (40%)
$ gcloud container node-pools describe default-pool --cluster=dev
autoscaling:
  enabled: true
  maxNodeCount: 5
  minNodeCount: 2
config:
  diskSizeGb: 100
  imageType: COS
  machineType: n1-highmem-8
  oauthScopes:
  - https://www.googleapis.com/auth/compute
  - https://www.googleapis.com/auth/datastore
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/devstorage.read_write
  - https://www.googleapis.com/auth/service.management.readonly
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/sqlservice
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/monitoring
  serviceAccount: default
initialNodeCount: 2
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/XXXXXX/zones/europe-west1-b/instanceGroupManagers/gke-dev-default-pool-efca0068-grp
management:
  autoRepair: true
name: default-pool
selfLink: https://container.googleapis.com/v1/projects/XXXXXX/zones/europe-west1-b/clusters/dev/nodePools/default-pool
status: RUNNING
version: 1.6.2
$ kubectl -n dev get pods -l name=someservice
NAME                           READY     STATUS    RESTARTS   AGE
someservice-2595684989-h8c5d   0/1       Pending   0          42m
someservice-804061866-f2trc    1/1       Running   0          1h
$ kubectl -n dev describe pod someservice-2595684989-h8c5d
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
43m 43m 4 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (4), Insufficient memory (3).
43m 42m 6 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (3), Insufficient memory (3).
41m 41m 2 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (2), Insufficient memory (3).
40m 36s 136 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (3).
43m 2s 243 cluster-autoscaler Normal NotTriggerScaleUp pod didn't trigger scale-up (it wouldn't fit if a new node is added)
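The "Insufficient cpu / Insufficient memory" messages come from the scheduler's resource-fit predicate: a pod fits on a node only if the sum of already-requested resources plus the pod's own requests stays within the node's allocatable capacity, for every resource. A rough sketch of that check, with node figures loosely based on the kubectl describe output above (the allocatable numbers are illustrative assumptions, not exact GKE values):

```python
def fits(node_allocatable, node_requested, pod_request):
    """PodFitsResources-style check: every resource the pod requests
    must still fit within the node's allocatable capacity."""
    return all(
        node_requested[r] + pod_request[r] <= node_allocatable[r]
        for r in pod_request
    )

# Illustrative n1-highmem-8 node (~8 CPUs / ~52 GB), with requests
# already near the levels shown in `kubectl describe nodes` above.
allocatable = {"cpu_m": 8000, "mem_mi": 53000}
requested   = {"cpu_m": 7360, "mem_mi": 49046}
pod         = {"cpu_m": 400,  "mem_mi": 18 * 1024}   # 400m CPU, 18Gi memory

print(fits(allocatable, requested, pod))  # False: memory does not fit
```

Every node fails this check for the pending pod, which is exactly when the cluster autoscaler is supposed to add a node; the NotTriggerScaleUp event shows it (wrongly) concluded a new node would not help either.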
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
It turned out to be a bug in Kubernetes 1.6.2. According to a GKE support engineer:

  From the message "No nodes are available that match all of the following predicates", this seemed to be a known issue. The engineers managed to find the root cause: it is a problem with version 0.5.1 of the cluster autoscaler, which is currently used in GKE 1.6 (up to 1.6.2). The issue has been fixed in cluster autoscaler 0.5.2, which is included in the head of the 1.6 branch.
Make sure the GCE instance-group autoscaler is disabled, or that it has appropriate min/max instance settings. According to the Cluster Autoscaler documentation:

  CPU-based (or any metric-based) cluster/node group autoscalers, like the GCE Instance Group Autoscaler, are NOT compatible with [the Kubernetes Cluster Autoscaler]. They are also not particularly suited to use with Kubernetes in general.

…so it should probably be disabled. Try:

gcloud compute instance-groups managed describe gke-dev-default-pool-efca0068-grp \
    --zone europe-west1-b

and then check for an autoscaler property in the output. If the instance-group autoscaler is disabled, that property will not be present. To disable it:

gcloud compute instance-groups managed stop-autoscaling gke-dev-default-pool-efca0068-grp \
    --zone europe-west1-b