promtool test rules fails for Alertmanager rules: yaml: unmarshal errors: line 1: field groups not found in type main.unitTestFile

Please help: I get an error message when testing the Alertmanager rules below.

 promtool check rules /etc/prometheus/alert.rules.yml
 Checking /etc/prometheus/alert.rules.yml
 SUCCESS: 3 rules found

 promtool test rules /etc/prometheus/alert.rules.yml
 Unit Testing:  /etc/prometheus/alert.rules.yml
 FAILED:
 yaml: unmarshal errors:
 line 1: field groups not found in type main.unitTestFile

My alert.rules configuration is as follows:

      cat /etc/prometheus/alert.rules.yml
      groups:
      - alert: MemoryFree10%
        expr: node_exporter:node_memory_free:memory_used_percents >= 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} hight memory usage"
          description: "{{ $labels.instance }} has more than 90% of its memory used."
      - alert: DiskSpace10%Free
        expr: node_exporter:node_filesystem_free:fs_used_percents >= 90
        labels:
          severity: moderate
        annotations:
          summary: "Instance {{ $labels.instance }} is low on disk space"
          description: "{{ $labels.instance }} has only {{ $value }}% free."
      - alert: ExporterDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Exporter down (instance {{ $labels.instance }})"
          description: "Prometheus exporter down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

Is something missing or incorrect in our alert rules file?

Please help.

Thanks

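Your configuration is missing the rules key. Each entry under groups needs a name and a rules list that holds the alerts: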

    groups:
    - name: alert.rules
      rules:
      - alert: HighRequestLatency
      .....
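
Applied to the first alert from the question, the corrected layout might look like the sketch below. The group name alert.rules is taken from the snippet above and is arbitrary. The rule body is copied from the question, with the "hight" typo in the summary fixed.

```yaml
# Sketch of /etc/prometheus/alert.rules.yml with the missing name/rules level added.
groups:
  - name: alert.rules        # arbitrary group name, taken from the snippet above
    rules:
      - alert: MemoryFree10%
        expr: node_exporter:node_memory_free:memory_used_percents >= 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} high memory usage"
          description: "{{ $labels.instance }} has more than 90% of its memory used."
      # The remaining alerts (DiskSpace10%Free, ExporterDown) go here,
      # at the same indentation as the alert above.
```

Note that this only fixes the rules file itself. Passing it to promtool test rules will still produce the "field groups not found" error, because that command expects a test file.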

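You are running the unit test against the alert rules file itself. You should first write a test file and then run the unit test on that test file with

promtool test rules test.yml

Here is an example:

alerts.yml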

# This is the rules file.

groups:
- name: example
  rules:

  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
        severity: page
    annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  - alert: AnotherInstanceDown
    expr: up == 0
    for: 10m
    labels:
        severity: page
    annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

test.yml

# This is the main input for unit testing.
# Only this file is passed as command line argument.

rule_files:
    - alerts.yml

evaluation_interval: 1m

tests:
    # Test 1.
    - interval: 1m
      # Series data.
      input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
          - series: 'up{job="node_exporter", instance="localhost:9100"}'
            values: '1+0x6 0 0 0 0 0 0 0 0' # 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
          - series: 'go_goroutines{job="prometheus", instance="localhost:9090"}'
            values: '10+10x2 30+20x5' # 10 20 30 30 50 70 90 110 130
          - series: 'go_goroutines{job="node_exporter", instance="localhost:9100"}'
            values: '10+10x7 10+30x4' # 10 20 30 40 50 60 70 80 10 40 70 100 130

      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 10m
            alertname: InstanceDown
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                      instance: localhost:9090
                      job: prometheus
                  exp_annotations:
                      summary: "Instance localhost:9090 down"
                      description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
      # Unit tests for promql expressions.
      promql_expr_test:
          # Unit test 1.
          - expr: go_goroutines > 5
            eval_time: 4m
            exp_samples:
                # Sample 1.
                - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
                  value: 50
                # Sample 2.
                - labels: 'go_goroutines{job="node_exporter",instance="localhost:9100"}'
                  value: 50

Then you can run

promtool test rules test.yml

and you will get a result like this:

Unit Testing:  test.yml
  SUCCESS
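
For the ExporterDown rule from the question, a test file could look roughly like the sketch below. The file name alert.rules.test.yml, the target labels and the sample values are illustrative assumptions. It also assumes that the rules file has already been fixed to use the groups / name / rules layout and that ExporterDown carries only the summary annotation. Any further annotation on the rule (such as description) would have to be listed in rendered form under exp_annotations as well, otherwise the test fails.

```yaml
# alert.rules.test.yml (hypothetical name): pass this file, not the rules file, to promtool.
rule_files:
  - /etc/prometheus/alert.rules.yml   # the corrected rules file from the question

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Illustrative target: up for two samples, then down.
      - series: 'up{job="node_exporter", instance="localhost:9100"}'
        values: '1 1 0 0 0 0 0 0 0 0 0 0'

    alert_rule_test:
      # Down for only 1m at this point, so the alert is pending and nothing fires.
      - eval_time: 3m
        alertname: ExporterDown
      # Down for 8m, past the 5m "for" clause, so the alert is expected to fire.
      - eval_time: 10m
        alertname: ExporterDown
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: localhost:9100
              job: node_exporter
            exp_annotations:
              summary: "Exporter down (instance localhost:9100)"
```

It would then be run with promtool test rules alert.rules.test.yml.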

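To check the syntax of the main configuration file with promtool, you must use "./promtool check config prometheus.yml". This prometheus.yml is the parent file; it references the Prometheus rules file prometheus_rules.yml.

To check the syntax of the rules file itself, you must therefore use "./promtool check rules prometheus_rules.yml".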
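
The link between the parent file and the rules file is the rule_files key in prometheus.yml. A minimal fragment, assuming both files sit in the same directory (the interval value is only illustrative):

```yaml
# prometheus.yml (fragment): the parent configuration that loads the rules file.
global:
  evaluation_interval: 1m      # how often rules are evaluated; illustrative value

rule_files:
  - prometheus_rules.yml       # a relative path is resolved against this file's directory
```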

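Hi, thank you for your reply. But I have tested with the rules key and still get the error:

    groups:
    - name: alerting.rules
      rules:
      - alert: ExporterDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Exporter down (instance {{ $labels.instance }})"

    promtool test rules /etc/prometheus/alert.rules.yml
    Unit Testing:  /etc/prometheus/alert.rules.yml
    FAILED:
    yaml: unmarshal errors:
    line 1: field groups not found in type main.unitTestFile

I have added the complete yml file and ran promtool, and everything succeeded. Are you sure your yaml is correct and properly formatted?

    groups:
    - name: rules
      rules:
      - alert: MemoryFree10%
        expr: node_exporter:node_memory_free:memory_used_percents >= 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} high memory usage"
          description: "{{ $labels.instance }} has more than 90% of its memory used."

Ehmm, strange, I always get the error: FAILED: yaml: unmarshal errors: line 1: field groups not found in type main.unitTestFile

Can you put it in a gist?

Sorry @MichaelDoubez, what do you mean by gist?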