IBM Cloud Private 2.1.0.1 EE fails with a timeout error while deploying the Grafana monitoring service


I have been trying to set up ICP EE on a single node, but the install keeps failing at the task that deploys the monitoring service.

That particular task runs for about 30 minutes and then fails. Below is the error log I get.

Do I need to do something differently?

I followed the basic installation steps from the Knowledge Center.

TASK [monitoring : Deploying monitoring service] *******************************
    fatal: [localhost]: FAILED! => {
   "changed":true,
   "cmd":"kubectl apply --force --overwrite=true -f /installer/playbook/..//cluster/cfc-components/monitoring/",
   "delta":"0:30:37.425771",
   "end":"2018-02-26 17:19:04.780643",
   "failed":true,
   "rc":1,
   "start":"2018-02-26 16:48:27.354872",
   "stderr":"Error from server: error when creating \"/installer/cluster/cfc-components/monitoring/grafana-router-config.yaml\": timeout\nError from server (Timeout): error when creating \"/installer/cluster/cfc-components/monitoring/kube-state-metrics-deployment.yaml\": the server was unable to return a response in the time allotted, but may still be processing the request (post deployments.extensions)",
   "stderr_lines":[
      "Error from server: error when creating \"/installer/cluster/cfc-components/monitoring/grafana-router-config.yaml\": timeout",
      "Error from server (Timeout): error when creating \"/installer/cluster/cfc-components/monitoring/kube-state-metrics-deployment.yaml\": the server was unable to return a response in the time allotted, but may still be processing the request (post deployments.extensions)"
   ],
   "stdout":"configmap \"alert-rules\" created\nconfigmap \"monitoring-prometheus-alertmanager\" created\ndeployment \"monitoring-prometheus-alertmanager\" created\nconfigmap \"alertmanager-router-nginx-config\" created\nservice \"monitoring-prometheus-alertmanager\" created\ndeployment \"monitoring-exporter\" created\nservice \"monitoring-exporter\" created\nconfigmap \"monitoring-grafana-config\" created\ndeployment \"monitoring-grafana\" created\nconfigmap \"grafana-entry-config\" created\nservice \"monitoring-grafana\" created\njob \"monitoring-grafana-ds\" created\nconfigmap \"grafana-ds-entry-config\" created\nservice \"monitoring-prometheus-kubestatemetrics\" created\ndaemonset \"monitoring-prometheus-nodeexporter-amd64\" created\ndaemonset \"monitoring-prometheus-nodeexporter-ppc64le\" created\ndaemonset \"monitoring-prometheus-nodeexporter-s390x\" created\nservice \"monitoring-prometheus-nodeexporter\" created\nconfigmap \"monitoring-prometheus\" created\ndeployment \"monitoring-prometheus\" created\nconfigmap \"prometheus-router-nginx-config\" created\nservice \"monitoring-prometheus\" created\nconfigmap \"monitoring-router-entry-config\" created",
   "stdout_lines":[
      "configmap \"alert-rules\" created",
      "configmap \"monitoring-prometheus-alertmanager\" created",
      "deployment \"monitoring-prometheus-alertmanager\" created",
      "configmap \"alertmanager-router-nginx-config\" created",
      "service \"monitoring-prometheus-alertmanager\" created",
      "deployment \"monitoring-exporter\" created",
      "service \"monitoring-exporter\" created",
      "configmap \"monitoring-grafana-config\" created",
      "deployment \"monitoring-grafana\" created",
      "configmap \"grafana-entry-config\" created",
      "service \"monitoring-grafana\" created",
      "job \"monitoring-grafana-ds\" created",
      "configmap \"grafana-ds-entry-config\" created",
      "service \"monitoring-prometheus-kubestatemetrics\" created",
      "daemonset \"monitoring-prometheus-nodeexporter-amd64\" created",
      "daemonset \"monitoring-prometheus-nodeexporter-ppc64le\" created",
      "daemonset \"monitoring-prometheus-nodeexporter-s390x\" created",
      "service \"monitoring-prometheus-nodeexporter\" created",
      "configmap \"monitoring-prometheus\" created",
      "deployment \"monitoring-prometheus\" created",
      "configmap \"prometheus-router-nginx-config\" created",
      "service \"monitoring-prometheus\" created",
      "configmap \"monitoring-router-entry-config\" created"
   ]
}

Does this node have at least 16 GB of memory (or even 32 GB)? The host may be getting overwhelmed by the initial load as the pods come online.
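A quick way to check this on the node is a sketch like the following (assumes Linux with procps-style `free` output; the 16 GB threshold is the figure mentioned above, not an exact requirement):

```shell
# Read total memory in GB from the "Mem:" row of `free`
total_gb=$(free -g | awk '/^Mem:/ {print $2}')
echo "Total memory: ${total_gb} GB"
if [ "${total_gb}" -lt 16 ]; then
  echo "WARNING: below 16 GB - the initial pod startup may starve the host"
fi
```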

The second thing to test is what happens when you apply this directory:

  • You can re-run the same operation from the command line:
    
    cd cluster/
    kubectl apply --force --overwrite=true -f cfc-components/monitoring/
    

  • Then you can inspect what is happening behind the scenes:

kubectl -n kube-system get pods -o wide

  • Are any pods in a non-Running state?
  • Are any containers within a pod not started (e.g. showing 0/2 or 1/3 or similar)?
  • journalctl -ru kubelet -o cat | head -n 500 > kubelet-logs.txt

  • Does the kubelet complain that it cannot start containers?
  • Does the kubelet complain that Docker is unhealthy?

  • If a pod shows as unhealthy (from #1/#2 above), describe it and check whether any events indicate why it failed:

  • kubectl -n kube-system describe pod [failing pod name]
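The pod checks above can be sketched as a small filter that picks out pods whose READY column is not full (0/2, 1/3, ...). In a real cluster you would pipe live output into it, e.g. `kubectl -n kube-system get pods -o wide | not_ready`; the demo below uses a captured listing with made-up pod names so the filter itself is visible:

```shell
# Print the names of pods whose ready-count does not match the
# container-count in the READY column (second field, "x/y")
not_ready() {
  awk 'NR>1 {split($2, a, "/"); if (a[1] != a[2]) print $1}'
}

# Demo on a captured listing (pod names are illustrative)
printf '%s\n' \
  'NAME                          READY   STATUS    RESTARTS' \
  'monitoring-grafana-abc12      0/2     Pending   0' \
  'monitoring-prometheus-xyz34   3/3     Running   0' |
  not_ready
# prints: monitoring-grafana-abc12
```

Each name this emits is a candidate for the `kubectl describe pod` step above.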

    If you have not yet configured kubectl on the host to interact with
    the system, or if the auth-idp pod has not been deployed yet, you can
    configure kubectl with the following steps:
    • Copy the kubectl binary onto the host, then use the local kubelet
      config. You can set the KUBECONFIG file in your shell profile
      (e.g. .bash_profile) so that it applies to every terminal session.
    
    docker run -e LICENSE=accept -v /usr/local/bin:/data \
      ibmcom/icp-inception:[your version] \
      cp /usr/local/bin/kubectl /data
    export KUBECONFIG=/var/lib/kubelet/kubelet-config
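To make the KUBECONFIG setting survive new terminal sessions, as suggested above, one idempotent sketch is to append the export line to the profile only if it is not already there (the `~/.bash_profile` name is an assumption; adjust for your shell):

```shell
# Append the KUBECONFIG export to the shell profile exactly once
line='export KUBECONFIG=/var/lib/kubelet/kubelet-config'
profile="$HOME/.bash_profile"
grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
```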
    
