Prometheus PromQL: How to add values when no data is returned?


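I have a data model in which some metrics are labelled by tenant (client), environment and deployment name. I want to create a per-deployment summary, where the summary is based on the number of alerts for each deployment.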

I can get the deployments in the `dev`, `uat` and `prod` environments with the following query:

group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1

# returns the following by way of example:
{deployment="default",environment="dev",tenant="tenant1"}   0
{deployment="default",environment="prod",tenant="tenant3"}  0
{deployment="default",environment="prod",tenant="tenant2"}  0
{deployment="default",environment="uat",tenant="tenant1"}   0
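So we can see that tenant1 has 2 deployments across 2 different environments, while the other 2 tenants have only one each.
`group by` returns a value of 1, so we subtract 1 to get 0 for each deployment. Now I want to add the number of alerts that apply to each deployment to this value.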

To get the alerts, I query:

ALERTS{severity="warning"}
# returns something like this when there is an alert, the details in the alert will vary, but will always have the `tenant`, `environment` and `deployment` labels
ALERTS{alertname="HostSystemdServiceCrashed",alertstate="firing",instance="example",job="node",deployment="default",environment="dev",tenant="tenant1",name="example.service",severity="warning",state="failed",type="oneshot"} 1

# however, when there are no alerts, I get "no data" returned
I can't work out how to add the alerts to the deployments while keeping the deployments that have no alerts:

(group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) -1)  + on(tenant, environment, deployment) (ALERTS{severity="warning"})

# returns only data for the deployment for which there is an alert
{deployment="default",environment="dev",tenant="tenant1"} 1

# if there are no alerts, I get no data returned at all
The output I want is:

{deployment="default",environment="dev",tenant="tenant1"} 1
{deployment="default",environment="uat",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
How can I achieve this?
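For a single severity, one common pattern is worth sketching here (not tested against this exact setup). The `+ on(...)` form loses deployments because arithmetic operators with one-to-one matching only return series that have a match on both sides. Also, if a deployment ever has more than one warning alert, the unaggregated `ALERTS` side would contain several series per match group, which Prometheus rejects with a duplicate-series matching error. Aggregating the alerts first and then falling back to the zero baseline with `or` avoids both problems:

```promql
# Sketch: aggregate alerts per deployment, add them to the zero baseline,
# then use `or` to fall back to the baseline for deployments with no alerts.
# `+` binds more tightly than `or`, so this parses as (alerts + baseline) or baseline.
  sum by (tenant, environment, deployment) (ALERTS{severity="warning"})
+ on (tenant, environment, deployment)
  (group by (tenant, environment, deployment) (up{environment=~"dev|uat|prod"}) - 1)
or
  (group by (tenant, environment, deployment) (up{environment=~"dev|uat|prod"}) - 1)
```

With the example data, `tenant1|dev|default` comes from the left of `or` with value 1, and the other three deployments come from the baseline with value 0.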

Note:

If I use `or` together with `sum`, the result depends on the order of the arguments to `or`:

(group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) -1)  or sum by (tenant, environment, deployment) (ALERTS{severity="warning"} )

# returns this, note the value in `tenant1|dev|default`
{deployment="default",environment="dev",tenant="tenant1"} 0
{deployment="default",environment="uat",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
If I reverse the order of the arguments to `or`, I get what I want:

{deployment="default",environment="dev",tenant="tenant1"} 1
{deployment="default",environment="uat",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
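The order matters because `or` keeps every series from its left-hand side and adds only those right-hand series whose label sets have no match on the left. With the baseline on the left, its 0 for `tenant1|dev|default` shadows the alert count; with the alert sum on the left, the count wins. The reversed query presumably looks something like this (a reconstruction, since the exact query is not shown):

```promql
# Alert counts on the left take precedence over the zero baseline
# for any deployment whose label set appears on both sides.
  sum by (tenant, environment, deployment) (ALERTS{severity="warning"})
or
  (group by (tenant, environment, deployment) (up{environment=~"dev|uat|prod"}) - 1)
```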
But I'm stuck as soon as I want to apply weights to alerts of different severity levels, for example (pseudocode):


This gives the same single-value series, or no data at all if there are no alerts.
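One possible way around this, sketched under the assumption that a warning should count 1 and a critical alert 2 (illustrative weights), is to pad each severity's sum with the zero baseline *before* combining them. Each operand then has exactly one series per deployment, so the one-to-one `+` no longer drops deployments that lack alerts of one severity:

```promql
# Warnings per deployment, defaulting to 0 via the baseline
  (
      sum by (tenant, environment, deployment) (ALERTS{severity="warning"})
    or
      (group by (tenant, environment, deployment) (up{environment=~"dev|uat|prod"}) - 1)
  )
+ on (tenant, environment, deployment)
  # Critical alerts per deployment, also defaulting to 0, weighted by an assumed factor of 2
  2 * (
      sum by (tenant, environment, deployment) (ALERTS{severity="critical"})
    or
      (group by (tenant, environment, deployment) (up{environment=~"dev|uat|prod"}) - 1)
  )
```

One caveat: a deployment that has alerts but no `up` series in the selected environments only appears if it shows up on both sides of the `+`.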

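I'm sure there is a proper way to do this, but in the end I used `label_replace` to add an arbitrary key-value label to each subquery whose value I wanted to add to the original value, and then combined the subqueries with `or`. Because each subquery carries a different temporary label, this merges the series without any of them overwriting another. I was then able to run a final `sum` over the resulting series to reduce them to a single result per deployment, dropping the temporary labels in the process: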

sum(
  (group by(tenant, environment, deployment) (up{environment=~"dev|uat|prod"} ) -1) 

  or label_replace((sum (ALERTS{severity="warning"} ) by (tenant, environment, deployment)), "severity", "warning", "", "") or 

  2 * label_replace((sum (ALERTS{severity="critical"} )  by (tenant, environment, deployment)), "severity", "critical", "", "")
) by (tenant, environment, deployment)
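A possibly simpler variant of the same idea, sketched here rather than tested against this setup: the raw `ALERTS` series already carry labels such as `alertname` and `severity` that the baseline lacks, so `or` keeps all of them without a temporary label, and the outer `sum by` collapses everything per deployment. The `alertstate="firing"` filter is an added assumption that excludes pending alerts; drop it if pending alerts should count.

```promql
sum by (tenant, environment, deployment) (
    # Zero baseline: one series per deployment
    (group by (tenant, environment, deployment) (up{environment=~"dev|uat|prod"}) - 1)
  or
    # Raw alert series (value 1 each); their extra labels keep them distinct from the baseline
    ALERTS{severity="warning", alertstate="firing"}
  or
    # Critical alerts weighted by 2; the severity label keeps them distinct from warnings
    2 * ALERTS{severity="critical", alertstate="firing"}
)
```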