Why can't I find the OSD pods after deploying Rook Ceph on Kubernetes?


I tried to install Rook Ceph on Kubernetes by following this guide:

When I checked all the pods:

$ kubectl -n rook-ceph get pod
NAME                                            READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-9c2z9                          3/3     Running   0          23m
csi-cephfsplugin-provisioner-7678bcfc46-s67hq   5/5     Running   0          23m
csi-cephfsplugin-provisioner-7678bcfc46-sfljd   5/5     Running   0          23m
csi-cephfsplugin-smmlf                          3/3     Running   0          23m
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq       6/6     Running   0          23m
csi-rbdplugin-provisioner-fbd45b7c8-rp85z       6/6     Running   0          23m
csi-rbdplugin-s67lw                             3/3     Running   0          23m
csi-rbdplugin-zq4k5                             3/3     Running   0          23m
rook-ceph-mon-a-canary-954dc5cd9-5q8tk          1/1     Running   0          2m9s
rook-ceph-mon-b-canary-b9d6f5594-mcqwc          1/1     Running   0          2m9s
rook-ceph-mon-c-canary-78b48dbfb7-z2t7d         0/1     Pending   0          2m8s
rook-ceph-operator-757d6db48d-x27lm             1/1     Running   0          25m
rook-ceph-tools-75f575489-znbbz                 1/1     Running   0          7m45s
rook-discover-gq489                             1/1     Running   0          24m
rook-discover-p9zlg                             1/1     Running   0          24m
Then I performed some additional operations:

$ kubectl taint nodes $(hostname) node-role.kubernetes.io/master:NoSchedule-
$ kubectl -n rook-ceph-system delete pods rook-ceph-operator-757d6db48d-x27lm
Created the filesystem:

$ kubectl create -f filesystem.yaml
Checked again:

$ kubectl get pods -n rook-ceph -o wide
NAME                                              READY   STATUS     RESTARTS   AGE    IP             NODE     NOMINATED NODE   READINESS GATES
csi-cephfsplugin-9c2z9                            3/3     Running    0          135m   192.168.0.53   kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-s67hq     5/5     Running    0          135m   10.1.2.6       kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-sfljd     5/5     Running    0          135m   10.1.2.5       kube3    <none>           <none>
csi-cephfsplugin-smmlf                            3/3     Running    0          135m   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq         6/6     Running    0          135m   10.1.1.6       kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-rp85z         6/6     Running    0          135m   10.1.1.5       kube2    <none>           <none>
csi-rbdplugin-s67lw                               3/3     Running    0          135m   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-zq4k5                               3/3     Running    0          135m   192.168.0.53   kube3    <none>           <none>
rook-ceph-crashcollector-kube2-6d95bb9c-r5w7p     0/1     Init:0/2   0          110m   <none>         kube2    <none>           <none>
rook-ceph-crashcollector-kube3-644c849bdb-9hcvg   0/1     Init:0/2   0          110m   <none>         kube3    <none>           <none>
rook-ceph-mon-a-canary-954dc5cd9-6ccbh            1/1     Running    0          75s    10.1.2.130     kube3    <none>           <none>
rook-ceph-mon-b-canary-b9d6f5594-k85w5            1/1     Running    0          74s    10.1.1.74      kube2    <none>           <none>
rook-ceph-mon-c-canary-78b48dbfb7-kfzzx           0/1     Pending    0          73s    <none>         <none>   <none>           <none>
rook-ceph-operator-757d6db48d-nlh84               1/1     Running    0          110m   10.1.2.28      kube3    <none>           <none>
rook-ceph-tools-75f575489-znbbz                   1/1     Running    0          119m   10.1.1.14      kube2    <none>           <none>
rook-discover-gq489                               1/1     Running    0          135m   10.1.1.3       kube2    <none>           <none>
rook-discover-p9zlg                               1/1     Running    0          135m   10.1.2.4       kube3    <none>           <none>
Inside the toolbox container, I checked the Ceph status:

[root@rook-ceph-tools-75f575489-znbbz /]# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
[errno 2] error connecting to the cluster
It is running on Ubuntu 16.04.6.

After redeploying:

$ kubectl -n rook-ceph get pod -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP             NODE     NOMINATED NODE   READINESS GATES
csi-cephfsplugin-4tww8                          3/3     Running   0          3m38s   192.168.0.52   kube2    <none>           <none>
csi-cephfsplugin-dbbfb                          3/3     Running   0          3m38s   192.168.0.53   kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-8kt96   5/5     Running   0          3m37s   10.1.2.6       kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-kq6vv   5/5     Running   0          3m38s   10.1.1.6       kube2    <none>           <none>
csi-rbdplugin-4qrqn                             3/3     Running   0          3m39s   192.168.0.53   kube3    <none>           <none>
csi-rbdplugin-dqx9z                             3/3     Running   0          3m39s   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-7f57t       6/6     Running   0          3m39s   10.1.2.5       kube3    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-9zwhb       6/6     Running   0          3m39s   10.1.1.5       kube2    <none>           <none>
rook-ceph-mon-a-canary-954dc5cd9-rgqpg          1/1     Running   0          2m40s   10.1.1.7       kube2    <none>           <none>
rook-ceph-mon-b-canary-b9d6f5594-n2pwc          1/1     Running   0          2m35s   10.1.2.8       kube3    <none>           <none>
rook-ceph-mon-c-canary-78b48dbfb7-fv46f         0/1     Pending   0          2m30s   <none>         <none>   <none>           <none>
rook-ceph-operator-757d6db48d-2m25g             1/1     Running   0          6m27s   10.1.2.3       kube3    <none>           <none>
rook-discover-lpsht                             1/1     Running   0          5m15s   10.1.1.3       kube2    <none>           <none>
rook-discover-v4l77                             1/1     Running   0          5m15s   10.1.2.4       kube3    <none>           <none>
After deploying an nginx pod to test the mount, describing it shows:

...
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    9m28s                  default-scheduler  Successfully assigned default/nginx to kube2
  Warning  FailedMount  9m28s                  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www default-token-fnb28], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
  Warning  FailedMount  6m14s (x2 over 6m38s)  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[default-token-fnb28 www]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
  Warning  FailedMount  4m6s (x23 over 9m13s)  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched

The rook-ceph-mon-x pods have the following anti-affinity:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: rook-ceph-mon
        topologyKey: kubernetes.io/hostname
This does not allow two rook-ceph-mon pods to run on the same node. Since you appear to have 3 nodes (1 master and 2 workers), only 2 mon pods were created: one on kube2 and one on kube3. kube1 is the unschedulable master node, so rook-ceph-mon-c cannot be scheduled there.

To fix this, you can:

- add one more worker node;
- remove the NoSchedule taint from the master with `kubectl taint nodes kube1 node-role.kubernetes.io/master:NoSchedule-`;
- change the mon count in `cluster.yaml` to a lower value.
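The third option is what resolved the scheduling issue in the comments below. A sketch of the relevant `cluster.yaml` excerpt (field names per the CephCluster CRD; a count of 2 matches the two schedulable workers here, though an odd count is normally recommended for quorum):

```yaml
# cluster.yaml (CephCluster CR) -- relevant excerpt only
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 2                     # lowered from the default 3 to fit two schedulable nodes
    allowMultiplePerNode: false  # keeps the one-mon-per-node anti-affinity intact
```

Re-applying the file (`kubectl apply -f cluster.yaml`) lets the operator reconcile the mon deployments.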
Comments:

- Share the output of `kubectl get events`. Did you create the Rook Ceph cluster as described in the next step?
- @ArghyaSadhu Yes, I ran `kubectl create -f cluster.yaml` without changing anything in `cluster.yaml`.
- Share the pod name of any pending or crashed pod along with the output of `kubectl get events`. @JingqiangZhang I don't understand the pod-deletion step. By the way, the rook-ceph-mon-c-canary pod is pending because of an affinity mismatch: 0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
- @ArghyaSadhu I added the redeploy information at the bottom of the question.
- Thanks, you pointed me in the right direction. I changed the mon count to 2 and it worked. But when I deployed an nginx pod to test mounting, it failed to mount; I put that error at the bottom of the question.
- As far as I know, FlexVolume is not enabled by default since Rook v1.1. Did you enable it? @JingqiangZhang
- I did what you said, with no luck; it may be something in my k8s cluster, I'll keep trying. Could you tell me how to enable it? And if it isn't enabled, which default volume type can be used? Thanks!
- Finally, everything works! You helped me a lot. Very professional.
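For completeness: the comments mention that FlexVolume is disabled by default since Rook v1.1. Re-enabling it is done through environment variables on the operator deployment; a sketch, assuming the stock `operator.yaml` (the kubelet plugin directory varies by distribution, so the path shown is an assumption):

```yaml
# operator.yaml excerpt -- env section of the rook-ceph-operator container
env:
- name: ROOK_ENABLE_FLEX_DRIVER
  value: "true"
- name: FLEXVOLUME_DIR_PATH
  value: "/usr/libexec/kubernetes/kubelet-plugins/volume/exec"  # kubelet's plugin dir; distro-dependent
```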
$ kubectl describe pod rook-ceph-mon-c-canary-78b48dbfb7-fv46f -n rook-ceph
Name:           rook-ceph-mon-c-canary-78b48dbfb7-fv46f
Namespace:      rook-ceph
Priority:       0
Node:           <none>
Labels:         app=rook-ceph-mon
                ceph_daemon_id=c
                mon=c
                mon_canary=true
                mon_cluster=rook-ceph
                pod-template-hash=78b48dbfb7
                rook_cluster=rook-ceph
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/rook-ceph-mon-c-canary-78b48dbfb7
Containers:
  mon:
    Image:      rook/ceph:v1.3.4
    Port:       6789/TCP
    Host Port:  0/TCP
    Command:
      /tini
    Args:
      --
      sleep
      3600
    Environment:
      CONTAINER_IMAGE:                ceph/ceph:v14.2.9
      POD_NAME:                       rook-ceph-mon-c-canary-78b48dbfb7-fv46f (v1:metadata.name)
      POD_NAMESPACE:                  rook-ceph (v1:metadata.namespace)
      NODE_NAME:                       (v1:spec.nodeName)
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
      ROOK_POD_IP:                     (v1:status.podIP)
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/ceph/mon/ceph-c from ceph-daemon-data (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-65xtn (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  rook-config-override:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-config-override
    Optional:  false
  rook-ceph-mons-keyring:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-mons-keyring
    Optional:    false
  rook-ceph-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/log
    HostPathType:  
  rook-ceph-crash:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/crash
    HostPathType:  
  ceph-daemon-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/mon-c/data
    HostPathType:  
  default-token-65xtn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-65xtn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  22s (x3 over 84s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www
    flexVolume:
      driver: ceph.rook.io/rook
      fsType: ceph
      options:
        fsName: myfs
        clusterNamespace: rook-ceph
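Since FlexVolume is not enabled by default in Rook v1.1+, the usual alternative is to mount CephFS through the CSI driver, whose plugin pods are already running in the listings above. A sketch assuming the filesystem is named `myfs`; the StorageClass parameters and pool name follow the stock Rook CSI examples and may need adjusting for your cluster:

```yaml
# Illustrative StorageClass + PVC + pod using the CephFS CSI driver instead of FlexVolume
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0               # assumed data pool name (Rook's default for a filesystem named myfs)
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www
spec:
  accessModes: [ReadWriteMany]
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www
    persistentVolumeClaim:
      claimName: www             # the PVC above, bound via the CSI provisioner
```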