promtool test rules fails for Alertmanager rules: yaml: unmarshal errors: line 1: field groups not found in type main.unitTestFile
Please help: I get the error message below when testing my Alertmanager rules.
promtool check rules /etc/prometheus/alert.rules.yml
Checking /etc/prometheus/alert.rules.yml
SUCCESS: 3 rules found
promtool test rules /etc/prometheus/alert.rules.yml
Unit Testing: /etc/prometheus/alert.rules.yml
FAILED:
yaml: unmarshal errors:
line 1: field groups not found in type main.unitTestFile
My alert.rules configuration is as follows:
cat /etc/prometheus/alert.rules.yml
groups:
- alert: MemoryFree10%
  expr: node_exporter:node_memory_free:memory_used_percents >= 90
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Instance {{ $labels.instance }} hight memory usage"
    description: "{{ $labels.instance }} has more than 90% of its memory used."
- alert: DiskSpace10%Free
  expr: node_exporter:node_filesystem_free:fs_used_percents >= 90
  labels:
    severity: moderate
  annotations:
    summary: "Instance {{ $labels.instance }} is low on disk space"
    description: "{{ $labels.instance }} has only {{ $value }}% free."
- alert: ExporterDown
  expr: up == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Exporter down (instance {{ $labels.instance }})"
    description: "Prometheus exporter down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
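Is something missing or incorrect in our alert rules file?
Please help.
Thanks.
Your configuration is missing the rules key (and the group name):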
groups:
- name: alert.rules
  rules:
  - alert: HighRequestLatency
    .....
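You are running the unit tests on the alert rules file itself. You should first write a test file and then run the unit tests on that test file with
promtool test rules test.yml
Here is an example:
alerts.yml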
# This is the rules file.
groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  - alert: AnotherInstanceDown
    expr: up == 0
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
test.yml
# This is the main input for unit testing.
# Only this file is passed as command line argument.
rule_files:
  - alerts.yml
evaluation_interval: 1m
tests:
  # Test 1.
  - interval: 1m
    # Series data.
    input_series:
      - series: 'up{job="prometheus", instance="localhost:9090"}'
        values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
      - series: 'up{job="node_exporter", instance="localhost:9100"}'
        values: '1+0x6 0 0 0 0 0 0 0 0' # 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
      - series: 'go_goroutines{job="prometheus", instance="localhost:9090"}'
        values: '10+10x2 30+20x5' # 10 20 30 30 50 70 90 110 130
      - series: 'go_goroutines{job="node_exporter", instance="localhost:9100"}'
        values: '10+10x7 10+30x4' # 10 20 30 40 50 60 70 80 10 40 70 100 130
    # Unit test for alerting rules.
    alert_rule_test:
      # Unit test 1.
      - eval_time: 10m
        alertname: InstanceDown
        exp_alerts:
          # Alert 1.
          - exp_labels:
              severity: page
              instance: localhost:9090
              job: prometheus
            exp_annotations:
              summary: "Instance localhost:9090 down"
              description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
    # Unit tests for promql expressions.
    promql_expr_test:
      # Unit test 1.
      - expr: go_goroutines > 5
        eval_time: 4m
        exp_samples:
          # Sample 1.
          - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
            value: 50
          # Sample 2.
          - labels: 'go_goroutines{job="node_exporter",instance="localhost:9100"}'
            value: 50
Then you can run promtool test rules test.yml, and you will get a result like this:
Unit Testing: test.yml
SUCCESS
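The error in the question comes from handing a rules file (top-level groups key) to the subcommand that expects a unit test file (top-level rule_files and tests keys). As a rough sketch of that distinction, the helper below guesses which promtool subcommand fits a file by looking at its top-level keys. The function name suggest_promtool_cmd is hypothetical and not part of promtool, and the grep check is a simplification that assumes the keys start in column one.

```shell
#!/bin/sh
# Hypothetical helper (not part of promtool): guess from the top-level keys
# which promtool subcommand matches a given YAML file.
suggest_promtool_cmd() {
  file=$1
  if grep -q '^groups:' "$file"; then
    # Rule files start with a top-level "groups" key.
    echo "promtool check rules $file"
  elif grep -q '^tests:' "$file"; then
    # Unit test files carry a top-level "tests" key (next to "rule_files").
    echo "promtool test rules $file"
  else
    # Neither key found: report on stderr and signal failure.
    echo "unknown file type: $file" >&2
    return 1
  fi
}
```

Called on the /etc/prometheus/alert.rules.yml from the question, this would suggest promtool check rules, because that file begins with groups: and is therefore not a unit test file.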
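When checking the syntax of the configuration file with promtool, you must use "./promtool check config prometheus.yml". This prometheus.yml is the parent file, which references the Prometheus rules file prometheus_rules.yml.
So when checking the syntax of the rules file with promtool, you must use "./promtool check rules prometheus_rules.yml".
Hello, thanks for your reply. But I have already tested with the rules key and still get the error: groups: - name: alerting.rules rules: - alert: ExporterDown expr: up == 0 for: 5m labels: severity: warning annotations: summary: "Exporter down (instance {{ $labels.instance }})" promtool test rules /etc/prometheus/alert.rules.yml Unit Testing: /etc/prometheus/alert.rules.yml FAILED: yaml: unmarshal errors: line 1: field groups not found in type main.unitTestFile
I added the full yml file and ran promtool, and everything succeeded. Are you sure your YAML is correct and properly formatted? groups: - name: rules rules: - alert: MemoryFree10% expr: node_exporter:node_memory_free:memory_used_percents >= 90 for: 5m labels: severity: critical annotations: summary: "Instance {{ $labels.instance }} high memory usage" description: "{{ $labels.instance }} has more than 90% of its memory used."
Ehmm, strange, I always get the error FAILED: yaml: unmarshal errors: line 1: field groups not found in type main.unitTestFile
Can you put it in a gist?
Sorry @MichaelDoubez, what do you mean by gist?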