How to snooze a Prometheus alert for a specific time

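I'm having some trouble with Prometheus memory alerts. When I back up GitLab, memory usage rises to 95%. I want to pause the memory alert for a specific period.

For example, if I run the backup at 2 AM, I need to snooze the Prometheus memory alert while it runs. Is that possible?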

No, scheduled silences are not possible.

Some workarounds for your case:

1) You could change the Prometheus alerting rule and increase its "for" clause, so the backup has more time to run before the alert fires.

2) You could use the Alertmanager REST API to create a silence when the backup starts and delete it when the backup ends.
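For the first workaround, here is a sketch of the memory rule with a longer `for` duration. The `1h` value is an assumption: it must be longer than the time the backup keeps memory usage high, and it also delays genuine out-of-memory alerts by the same amount.

```yaml
- alert: OutOfMemory
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  # The condition must hold continuously for this long before the alert fires.
  # Assumption: the backup finishes in well under an hour.
  for: 1h
```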

See more information on this topic.
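For the second workaround, the Alertmanager v2 API creates a silence with `POST /api/v2/silences` and a JSON body. The sketch below shows the structure of that body in equivalent YAML form; the timestamps, `createdBy`, and `comment` values are placeholders, and the matcher assumes the alert is named `OutOfMemory`.

```yaml
# Body for POST /api/v2/silences (sent as JSON; shown here as YAML).
matchers:
  - name: alertname
    value: OutOfMemory
    isRegex: false
startsAt: "2024-01-01T02:00:00Z"   # placeholder: when the backup starts
endsAt: "2024-01-01T03:00:00Z"     # placeholder: latest expected end of the backup
createdBy: backup-script           # placeholder
comment: Silence memory alerts during the nightly GitLab backup
```

The response contains a `silenceID`, which the backup script can pass to `DELETE /api/v2/silence/{silenceID}` when it finishes. Setting `endsAt` to a safe upper bound means the silence still expires on its own if the script fails before deleting it.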

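As Marcelo said, there is no way to schedule a silence. However, if the backup runs in a fixed time window (say, every night from 2 AM to 3 AM UTC), you can include that window in the alert expression: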

- alert: OutOfMemory
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 and on() absent(hour() >= 2 < 3)

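If many rules need to be muted, an alternative is to define a separate Prometheus alert, BackupHours, that fires during the backup window, and then use an Alertmanager inhibition rule to mute the target alerts during that period: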

inhibit_rules:
- source_match:
    alertname: BackupHours
  target_match:
    # any other alert selector can go here
    alertname: OutOfMemory
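The inhibition rule above needs a source alert named `BackupHours` that is firing during the backup window. A minimal sketch follows, assuming a 02:00 to 03:00 UTC window; the group name `backup-window`, the `notification: none` label, and the receiver names `default` and `do_nothing` are illustrative choices, not required names. Because `BackupHours` is itself an alert, it also needs a route to a receiver that sends nothing.

```yaml
# Prometheus rules file: fires between 02:00 and 02:59 UTC.
groups:
  - name: backup-window
    rules:
      - alert: BackupHours
        expr: hour() >= 2 < 3
        labels:
          notification: none
        annotations:
          description: 'Fires during backup hours to inhibit other alerts'
---
# alertmanager.yml: send BackupHours to a receiver with no notification configs.
route:
  receiver: default
  routes:
    - matchers:
        - notification = "none"
      receiver: do_nothing
receivers:
  - name: default        # placeholder for your existing receiver and its *_configs
  - name: do_nothing     # no *_configs, so nothing is sent
```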

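Note that this only works in UTC, because hour() evaluates timestamps in UTC. Handling local time with daylight saving time (DST) requires more boilerplate (for example, recording rules).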
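One way to move the window off UTC is a recording rule that computes the local hour, which the window alert then compares instead of `hour()`. The sketch below assumes a fixed UTC+1 offset (3600 seconds) and a hypothetical series name `local_hour`. It does not switch automatically at DST transitions, so a DST-aware version needs extra logic or a manual offset change twice a year.

```yaml
groups:
  - name: local-time
    rules:
      # Hour of day in a fixed UTC+1 zone (assumption: adjust 3600 for your zone).
      - record: local_hour
        expr: hour(vector(time() + 3600))
      # Backup window expressed in local time. Rules in a group run in order,
      # so this sees the local_hour value recorded just above.
      - alert: BackupHours
        expr: local_hour >= 2 < 3
        labels:
          notification: none
```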

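Note that if you are monitoring the backup process, you may already have a metric indicating that a backup is in progress. If so, you can use that metric to inhibit the other alerts, and you don't need to maintain a schedule.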
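A sketch of that approach, assuming a hypothetical gauge `backup_in_progress` that the backup job sets to 1 while it runs (for example, through node_exporter's textfile collector) and that carries the same `instance` label as the node metrics:

```yaml
# Prometheus rule: fires only while a backup is actually running.
- alert: BackupInProgress
  expr: backup_in_progress == 1
  labels:
    notification: none
---
# alertmanager.yml: mute memory alerts on the host that is running the backup.
inhibit_rules:
  - source_matchers:
      - alertname = "BackupInProgress"
    target_matchers:
      - alertname = "OutOfMemory"
    # Inhibit only targets whose instance label matches the source alert's.
    equal: ['instance']
```

As with a schedule-based source alert, `BackupInProgress` should be routed to a receiver with no notification configs so that it does not notify on its own.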

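You can also compare the condition against the metric's own history, so that the alert fires only when the current value is more than twice its value at the same time on each of the previous two days. A backup that runs at the same time every day shows up in that history too, so it does not trigger the alert: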

      - alert: CPULoadAlert
        # Condition for alerting
        expr: >-
          node_load5 / node_load5 offset 1d > 2 and
          node_load5 / node_load5 offset 2d > 2 and
          node_load5 > 1
        for: 5m
        # Annotations - free-form information attached to the alert (not used for routing)
        annotations:
          summary: 'Instance {{ $labels.instance }} has unusually high CPU load'
          description: '{{ $labels.instance }} of job {{ $labels.job }} has a CPU load spike over 2x compared to the previous 2 days.'
        # Labels - additional labels to be attached to the alert
        labels:
          severity: 'warning'
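The same history comparison can be applied to the memory alert. A sketch follows, assuming the backup runs at about the same time every day: it fires only if available memory is below 10% now but was not below 10% at the same time the previous day. The alert name `OutOfMemoryUnusual` and the `for` duration are illustrative.

```yaml
      - alert: OutOfMemoryUnusual
        # Fires when memory is low now, unless it was also low 24 hours earlier
        # (series are matched on their full label sets, e.g. instance and job).
        expr: >-
          node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
          unless
          node_memory_MemAvailable_bytes offset 1d / node_memory_MemTotal_bytes offset 1d * 100 < 10
        for: 5m
        labels:
          severity: 'warning'
```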
