Which Weave Net metrics should be monitored in Kubernetes?

Weave Net exposes the following metrics.

Do the following alerts look correct? For which values of these metrics should we alert in order to monitor network health?

  • WeaveNoFastDP: weave_flows[5m] > 0
  • WeaveIPAMUnreachable: weave_ipam_unreachable_percentage > 0
  • WeaveIPAMPendingAllocates: weave_ipam_pending_allocates > 0
  • WeavePendingClaims: weave_ipam_pending_claims > 0
  • WeaveConnectEM: weave_connection_terminations_total > 300
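The last condition above compares the raw value of weave_connection_terminations_total against a fixed number. Assuming this metric is a counter (the _total suffix is the Prometheus naming convention for counters, and the rules further down apply rate() to it), its value only grows until the exporting process restarts, so a threshold such as > 300 stays true once it has been crossed, even on a cluster that is now quiet. A per-second rate over a window tracks recent activity instead. The Python sketch below shows the difference on invented samples; simple_rate is a hypothetical helper written for this illustration and, unlike Prometheus's rate(), it does not extrapolate to the window edges.

```python
from typing import List, Tuple

# (timestamp in seconds, counter value); the samples below are invented.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase across the samples, treating any drop as a counter reset.

    Simplified sketch: unlike Prometheus's rate(), it does not extrapolate
    to the edges of the window.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter only decreases when the exporting process restarts;
        # the new value is then the increase since the restart.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Quiet cluster: 400 terminations happened long ago, none in the last 5 minutes.
quiet = [(t, 400.0) for t in range(0, 301, 60)]
# Busy cluster: 12 terminations per minute right now, but a small total so far.
busy = [(t, 10.0 + 12 * i) for i, t in enumerate(range(0, 301, 60))]

for name, samples in [("quiet", quiet), ("busy", busy)]:
    raw_alert = samples[-1][1] > 300         # weave_connection_terminations_total > 300
    rate_alert = simple_rate(samples) > 0.1  # rate(weave_connection_terminations_total[5m]) > 0.1
    print(name, raw_alert, rate_alert)
# prints:
# quiet True False
# busy False True
```

The raw threshold keeps firing for the quiet cluster and misses the busy one; the rate check does the opposite, which is usually what an alert on connection churn is meant to catch.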

Grafana dashboards can be put on top of the Weave metrics. Here are the dashboards:

  • WeaveNet
  • WeaveNet (Cluster)

The following are useful metrics to monitor for Weave Net. The alerts below are in JSON format:

    {
      "groups": [
        {
          "name": "nodeagent",
          "rules": [
            {
              "alert": "UnhealthyNodes",
              "expr": "changes(central_nodeagent:node_route_unhealthy_count[3m]) > 0",
              "for": "1m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "Unhealthy nodes in the cluster. Go to the below prometheus link for details.",
                "description": "Actionable: Find why the node(s) are unhealthy and fix it."
              }
            }
          ]
        },
        {
          "name": "weave-net",
          "rules": [
            {
              "alert": "WeaveNetIPAMSplitBrain",
              "expr": "max(weave_ipam_unreachable_percentage) - min(weave_ipam_unreachable_percentage) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "WeaveNetIPAM has a split brain. Go to the below prometheus link for details.",
                "description": "Actionable: Every node should see the same unreachability percentage. Please check why it does not and fix it."
              }
            },
            {
              "alert": "WeaveNetIPAMUnreachable",
              "expr": "weave_ipam_unreachable_percentage > 25",
              "for": "10m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "WeaveNetIPAM unreachability percentage is above threshold. Go to the below prometheus link for details.",
                "description": "Actionable: Find why the unreachability percentage has risen above the threshold and fix it. WeaveNet is responsible for keeping it under control. A weave rmpeer deployment can help clean things up."
              }
            },
            {
              "alert": "WeaveNetIPAMPendingAllocates",
              "expr": "sum(weave_ipam_pending_allocates) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "WeaveNet IPAM has pending allocates. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason for IPAM allocates to be in pending state and fix it."
              }
            },
            {
              "alert": "WeaveNetIPAMPendingClaims",
              "expr": "sum(weave_ipam_pending_claims) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "WeaveNet IPAM has pending claims. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason for IPAM claims to be in pending state and fix it."
              }
            },
            {
              "alert": "WeaveNetFastDPFlowsLow",
              "expr": "sum(weave_flows) < 15000",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "WeaveNet total FastDP flows are below threshold. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason for FastDP flows dropping below the threshold."
              }
            },
            {
              "alert": "WeaveNetFastDPFlowsOff",
              "expr": "sum(weave_flows == bool 0) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "WeaveNet FastDP flows are not happening on some or all nodes. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason for FastDP being off."
              }
            },
            {
              "alert": "WeaveNetHighConnectionTerminationRate",
              "expr": "rate(weave_connection_terminations_total[5m]) > 0.1",
              "for": "5m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "A lot of connections are getting terminated. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason for high connection termination rate and fix it."
              }
            },
            {
              "alert": "WeaveNetConnectionsConnecting",
              "expr": "sum(weave_connections{state='connecting'}) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "A lot of connections are in connecting state. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason and fix it."
              }
            },
            {
              "alert": "WeaveNetConnectionsRetrying",
              "expr": "sum(weave_connections{state='retrying'}) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "A lot of connections are in retrying state. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason and fix it."
              }
            },
            {
              "alert": "WeaveNetConnectionsPending",
              "expr": "sum(weave_connections{state='pending'}) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "A lot of connections are in pending state. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason and fix it."
              }
            },
            {
              "alert": "WeaveNetConnectionsFailed",
              "expr": "sum(weave_connections{state='failed'}) > 0",
              "for": "3m",
              "labels": {
                "severity": "critical"
              },
              "annotations": {
                "summary": "A lot of connections are in failed state. Go to the below prometheus link for details.",
                "description": "Actionable: Find the reason and fix it."
              }
            }
          ]
        }
      ]
    }
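Most of the rules above use PromQL aggregations (max, min, sum without a by clause) that collapse all series, typically one per Weave Net pod, into a single value before the comparison. The Python sketch below mimics five of the expressions on invented per-node readings so their trigger conditions are easy to see; the node names, values and helper functions are hypothetical, and the sketch ignores the for durations, scrape timing and staleness handling that Prometheus applies.

```python
from typing import Dict, List

# Invented per-node readings; names and values are hypothetical.
unreachable_pct: Dict[str, float] = {"node-a": 0.0, "node-b": 0.0, "node-c": 33.3}
flows: Dict[str, float] = {"node-a": 5200, "node-b": 6100, "node-c": 0}
pending_allocates: Dict[str, float] = {"node-a": 0, "node-b": 0, "node-c": 0}
# Successive values of central_nodeagent:node_route_unhealthy_count in a 3m window.
unhealthy_count_window: List[float] = [0, 0, 1, 1, 0]


def split_brain(pct: Dict[str, float]) -> bool:
    # max(x) - min(x) > 0: the nodes do not all report the same percentage.
    return max(pct.values()) - min(pct.values()) > 0


def fastdp_off(node_flows: Dict[str, float]) -> bool:
    # sum(weave_flows == bool 0) > 0: "== bool 0" maps each series to 1 (no flows)
    # or 0, so the sum is the number of nodes with zero FastDP flows.
    return sum(1 if v == 0 else 0 for v in node_flows.values()) > 0


def fastdp_low(node_flows: Dict[str, float], threshold: float = 15000) -> bool:
    # sum(weave_flows) < 15000: cluster-wide total below a fixed floor.
    return sum(node_flows.values()) < threshold


def any_pending(values: Dict[str, float]) -> bool:
    # sum(weave_ipam_pending_allocates) > 0: at least one node has a pending allocation.
    return sum(values.values()) > 0


def changes(window: List[float]) -> int:
    # changes(x[3m]): how many times consecutive samples differ inside the window.
    return sum(1 for prev, cur in zip(window, window[1:]) if cur != prev)


print(split_brain(unreachable_pct))         # True: node-c disagrees
print(fastdp_off(flows))                    # True: node-c has no flows
print(fastdp_low(flows))                    # True: 11300 < 15000
print(any_pending(pending_allocates))       # False
print(changes(unhealthy_count_window) > 0)  # True: 0->1 and 1->0 both count
print(changes([2, 2, 2]) > 0)               # False: steady non-zero count
```

One consequence shown by the last two lines: because changes() counts every change, UnhealthyNodes also fires when the unhealthy count drops back to zero, and it stays silent while the count sits at a constant non-zero value.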