How to monitor Traefik 2.1.6 with Prometheus

I followed these steps to enable Prometheus monitoring in Traefik 2.1.6, first by adding the following arguments:

args:
            - --configfile=/config/traefik.yaml
            - --web
            - --kubernetes
            - --logLevel=INFO
            - --metrics.prometheus=true
            - --entryPoints.metrics.address=:8080
            - --metrics.prometheus.entryPoint=metrics
            - --metrics.prometheus.addServicesLabels=true
            - --metrics.prometheus.addEntryPointsLabels=true
            - --metrics.prometheus.buckets=0.100000, 0.300000, 1.200000, 5.000000
Then I modified my Traefik deployment as follows:

apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
spec:
  ports:
    - name: web
      port: 80
    - name: websecure
      port: 443
    - name: metrics
      port: 8080
  selector:
    app: traefik
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traefik-ingress-controller
  labels:
    app: traefik
spec:
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      name: traefik
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 1
      containers:
        - image: traefik:latest
          name: traefik-ingress-lb
          ports:
            - name: web
              containerPort: 80
              hostPort: 80           # hostPort mode: expose the port on the cluster node
            - name: websecure
              containerPort: 443
              hostPort: 443          # hostPort mode: expose the port on the cluster node
            - name: metrics
              containerPort: 8080
          resources:
            limits:
              cpu: 2000m
              memory: 1024Mi
            requests:
              cpu: 1000m
              memory: 1024Mi
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
          args:
            - --configfile=/config/traefik.yaml
            - --web
            - --kubernetes
            - --logLevel=INFO
            - --metrics.prometheus=true
            - --entryPoints.metrics.address=:8080
            - --metrics.prometheus.entryPoint=metrics
            - --metrics.prometheus.addServicesLabels=true
            - --metrics.prometheus.addEntryPointsLabels=true
            - --metrics.prometheus.buckets=0.100000, 0.300000, 1.200000, 5.000000
          volumeMounts:
            - mountPath: "/config"
              name: "config"
      volumes:
        - name: config
          configMap:
            name: traefik-config 
      tolerations:              # tolerate all taints so the pod still schedules on tainted nodes
        - operator: "Exists"
      nodeSelector:             # node selector: run only on nodes carrying this label
        IngressProxy: "true"
When I open the Grafana dashboard to view the data, nothing is displayed. This is my dashboard definition:

{
  "__inputs": [
    {
      "name": "DS_K8S-PROMETHEUS",
      "label": "k8s-prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
  ],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "5.0.3"
    },
    {
      "type": "panel",
      "id": "grafana-piechart-panel",
      "name": "Pie Chart",
      "version": "1.3.3"
    },
    {
      "type": "panel",
      "id": "graph",
      "name": "Graph",
      "version": "5.0.0"
    },
    {
      "type": "datasource",
      "id": "prometheus",
      "name": "Prometheus",
      "version": "5.0.0"
    }
  ],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "description": "Traefik dashboard prometheus\n\nOverview panel for monitoring Traefik with Prometheus",
  "editable": true,
  "gnetId": 9682,
  "graphTooltip": 0,
  "id": null,
  "iteration": 1547710539645,
  "links": [],
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 8,
      "panels": [],
      "title": "Global monitoring",
      "type": "row"
    },
    {
      "aliasColors": {},
      "breakPoint": "50%",
      "cacheTimeout": null,
      "combine": {
        "label": "Others",
        "threshold": "0"
      },
      "datasource": "${DS_K8S-PROMETHEUS}",
      "fontSize": "80%",
      "format": "locale",
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 1
      },
      "id": 2,
      "interval": null,
      "legend": {
        "percentage": true,
        "show": true,
        "sideWidth": null,
        "values": true
      },
      "legendType": "Right side",
      "links": [],
      "maxDataPoints": 3,
      "minSpan": 23,
      "nullPointMode": "connected",
      "pieType": "pie",
      "repeat": null,
      "repeatDirection": "h",
      "strokeWidth": 1,
      "targets": [
        {
          "expr": "sum(traefik_backend_requests_total{k8scluster =~ \"^$Cluster$\", backend=~\"^$backend$\"}) by (backend) ",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{backend}}",
          "refId": "A"
        }
      ],
      "title": "Share of requests",
      "transparent": false,
      "type": "grafana-piechart-panel",
      "valueName": "total"
    },
    {
      "aliasColors": {},
      "breakPoint": "50%",
      "cacheTimeout": null,
      "combine": {
        "label": "Others",
        "threshold": 0
      },
      "datasource": "${DS_K8S-PROMETHEUS}",
      "fontSize": "80%",
      "format": "locale",
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 1
      },
      "id": 12,
      "interval": null,
      "legend": {
        "percentage": true,
        "percentageDecimals": null,
        "show": true,
        "values": true
      },
      "legendType": "Right side",
      "links": [],
      "maxDataPoints": 3,
      "nullPointMode": "connected",
      "pieType": "pie",
      "strokeWidth": 1,
      "targets": [
        {
          "expr": "sum(traefik_backend_requests_total{k8scluster =~ \"^$Cluster$\", backend=~\"^$backend$\", code != \"200\"}) by (code) ",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "{{ code }}",
          "refId": "A"
        }
      ],
      "title": "Share of non-200 status codes",
      "type": "grafana-piechart-panel",
      "valueName": "current"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_K8S-PROMETHEUS}",
      "fill": 1,
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 10,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": false,
        "hideEmpty": false,
        "hideZero": false,
        "max": false,
        "min": false,
        "rightSide": true,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(traefik_backend_requests_total{k8scluster=~\"^$Cluster$\", backend=~\"^$backend$\"}) by (backend)",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{ backend }}",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "Total requests",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "decimals": 0,
          "format": "none",
          "label": "",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "decimals": 0,
          "format": "none",
          "label": "",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        }
      ]
    }
  ],
  "refresh": false,
  "schemaVersion": 16,
  "style": "dark",
  "tags": [
    "traefik",
    "prometheus"
  ],
  "templating": {
    "list": [
      {
        "allValue": ".*",
        "current": {},
        "datasource": "${DS_K8S-PROMETHEUS}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "Cluster",
        "options": [],
        "query": "label_values(k8scluster)",
        "refresh": 1,
        "regex": ".*",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": ".*",
        "current": {},
        "datasource": "${DS_K8S-PROMETHEUS}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "backend",
        "options": [],
        "query": "label_values({k8scluster=\"$Cluster\"},backend)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "Traefik-Monitor",
  "uid": "qPdAviJmz",
  "version": 22
}
I logged in to my Prometheus server and queried for Traefik metrics, but found nothing:

metrics_entrypoint_requests_total{code="200"}
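It looks like Traefik is not collecting metrics data. I logged in to a pod in the Kubernetes cluster and fetched Traefik's metrics endpoint, which responded successfully: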

[root@soa-room-service-8fd445cdb-42bvs /]# curl 172.30.184.11:8080/metrics|more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.5522e-05
go_gc_duration_seconds{quantile="0.25"} 2.1096e-05
go_gc_duration_seconds{quantile="0.5"} 2.6706e-05
go_gc_duration_seconds{quantile="0.75"} 6.0421e-05
go_gc_duration_seconds{quantile="1"} 0.054839752
go_gc_duration_seconds_sum 0.066358769
go_gc_duration_seconds_count 46
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 174
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.8"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.4143352e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.52690104e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.551946e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter

What should I do to make it work?

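The problem as described is too broad. You should narrow it down and find out which part is "broken".

You can do this with the following steps:

First, go to your Prometheus server and check whether you can see the reported metrics there.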

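  • If yes, the problem lies in how the Prometheus data source is defined in Grafana.

  • If no, either the application is not reporting metrics or the server is not collecting them from the application. So first check that your application does report metrics by scraping application-dns/metrics yourself; the result should be Prometheus text-format output (# HELP and # TYPE lines followed by metric samples).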
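The manual scrape check above can be sketched in Python. This is only an illustration: `metric_names` and `SAMPLE` are hypothetical names, and the Traefik sample line is made up for the example. As far as I know, Traefik 2.x exposes metrics prefixed with `traefik_` (for example `traefik_entrypoint_requests_total`), so a /metrics payload that contains only `go_*` series, or a query against a differently named metric, would come back empty; treat that as an assumption to verify against your own output.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text, prefix=""):
    """Return the sorted metric names in a /metrics payload that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match and match.group(1).startswith(prefix):
            names.add(match.group(1))
    return sorted(names)


# Hypothetical payload for illustration only (not taken from a real server).
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 174
# TYPE traefik_entrypoint_requests_total counter
traefik_entrypoint_requests_total{code="200",entrypoint="web",method="GET",protocol="http"} 12
"""

if __name__ == "__main__":
    print(metric_names(SAMPLE, prefix="traefik_"))
```

To run it against a live endpoint, the text could be fetched with `urllib.request.urlopen` from the address used in the question (172.30.184.11:8080/metrics) and passed to `metric_names`. An empty list for the `traefik_` prefix would suggest looking at the Traefik side first.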

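If you can see those metrics, the Prometheus server is not scraping your application, so you should check the server's configuration. If you cannot see them, Traefik is not collecting those metrics, and you should recheck its configuration.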
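To check the server side, one option is to look at what the Prometheus HTTP API reports under `/api/v1/targets`. The sketch below only parses such a response; `targets_on_port` and `SAMPLE_RESPONSE` are hypothetical names and the sample data is invented. Note that the `prometheus.io/scrape` and `prometheus.io/port` annotations are a convention, not something Prometheus honours by itself: they take effect only if the server's scrape configuration uses Kubernetes service discovery with relabel rules keyed on them, so a missing target here may simply mean no such job exists.

```python
import json


def targets_on_port(targets_response, port):
    """List (scrapeUrl, health, lastError) for active targets scraped on the given port."""
    needle = ":{}/".format(port)
    found = []
    active = targets_response.get("data", {}).get("activeTargets", [])
    for target in active:
        url = target.get("scrapeUrl", "")
        if needle in url:
            found.append(
                (url, target.get("health", "unknown"), target.get("lastError", ""))
            )
    return found


# Hypothetical /api/v1/targets response, trimmed to the fields used above.
SAMPLE_RESPONSE = json.loads("""
{"status": "success",
 "data": {"activeTargets": [
   {"scrapeUrl": "http://10.0.0.5:9100/metrics", "health": "up", "lastError": ""},
   {"scrapeUrl": "http://172.30.184.11:8080/metrics", "health": "down",
    "lastError": "context deadline exceeded"}
 ], "droppedTargets": []}}
""")

if __name__ == "__main__":
    for url, health, error in targets_on_port(SAMPLE_RESPONSE, 8080):
        print(url, health, error)
```

An empty result for port 8080 would point at the scrape configuration (no job discovers the Traefik pods), while a target listed with health "down" would point at connectivity between Prometheus and the pod.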