Prometheus: how to take the quantile of a Prometheus rate

I am looking at this

I don't understand why

histogram_quantile(0.9, 
    rate(prometheus_http_request_duration_seconds_bucket[5m])
)
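does not give the quantile of a rate, in units of observe events per second, but instead gives the quantile of the request duration, in units of seconds per observe event.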

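rate(prometheus_http_request_duration_seconds_bucket[5m]) should give the average number of observe events per second in a specific bucket over 5 minutes.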

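I would imagine that taking the quantile of that would then give you a quantile of rates.

I must be misunderstanding something.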

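The rate function here is used to specify the time window for the quantile calculation. The query translates to: over the last 5 minutes, what is the maximum HTTP response time experienced by 90% of my users?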

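The histogram_quantile function interpolates the quantile value by assuming a linear distribution within each bucket, where the le label gives the bucket's upper bound on the observed duration. Each bucket is a counter that measures the number of observations since the process started. rate provides the link by computing the average number of observations per second in each bucket, and from those per-bucket rates the response time over the whole time window can be interpolated.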
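The interpolation described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual Prometheus source: bucket_quantile is a hypothetical helper name, it assumes 0 < q <= 1, at least two buckets, a lowest bucket that starts at 0, and cumulative counts whose last upper bound is +Inf.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs whose last
    upper bound is +Inf. The counts may be raw increases or per-second
    rates; only their ratios matter.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window

    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank falls into the +Inf bucket, so there is no finite upper
        # bound to interpolate towards; fall back to the highest finite one.
        return buckets[-2][0]

    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


example = [(5, 20), (10, 40), (15, 60), (20, 80), (math.inf, 80)]
print(bucket_quantile(0.9, example))
```

The return value is always expressed in the unit of the le bounds (seconds here), which is why the query yields a duration rather than a rate.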

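You are right that this is not a 100% accurate measurement, because of the averaging, but the function is already making a lot of assumptions, and the choice of buckets has already introduced bias.

I suppose you could use irate to compute an instantaneous quantile, but it would most likely be noisier.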

This is the Prometheus histogram quantile code

For example,

assume the cumulative bucket counters currently read:
[50][100][150][200][200] with corresponding upper bounds 5s, 10s, 15s, 20s, +Inf,
and that over the last 5 minutes they increased by 20, 40, 60, 80 and 80 respectively.

Then rate(xx[5m]) returns buckets like this:
[20/(5*60)][40/(5*60)][60/(5*60)][80/(5*60)][80/(5*60)]

histogram_quantile delegates the returned buckets to another function, bucketQuantile.
It uses roughly the following logic to compute the percentile:

1) Get the total rank of the percentile.
For the 90th percentile this is 0.9 * total count = 0.9 * (80/(5*60)).
2) Compute the value of the 90th percentile.
The last upper bound before the rank position is 15 s;
the upper bound of the bucket containing the rank position is 20 s;
the count in the bucket that the 90th percentile falls into is (80/(5*60)) - (60/(5*60));
the rank of the 90th percentile within that single bucket is (0.9 * 80/(5*60)) - (60/(5*60));
finally, the value of the 90th percentile is: 15 s + (rank within bucket / bucket count) * (20 s - 15 s)
= 15 + 5 * ( ((0.9 * 80/(5*60)) - (60/(5*60))) / ((80/(5*60)) - (60/(5*60))) )
= 15 + 5 * ( (0.9*80 - 60) / (80 - 60) ) = 15 + 5 * (12/20) = 15 + 5 * 0.6 = 18 s
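That's it. You can see that the denominator 5*60 has no effect on the result of the calculation. So the rate function is only there to specify the 5-minute time window.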
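The cancellation of the 5*60 denominator can also be checked numerically. The following is a minimal sketch using the bucket increases from the example (20, 40, 60, 80, 80); quantile_from_cumulative is a hypothetical helper written for this illustration, not Prometheus code, and it assumes 0 < q <= 1 and a lowest bucket that starts at 0.

```python
import math

WINDOW_SECONDS = 5 * 60
UPPER_BOUNDS = [5.0, 10.0, 15.0, 20.0, math.inf]
# Cumulative increase of each bucket counter over the 5-minute window.
INCREASES = [20.0, 40.0, 60.0, 80.0, 80.0]


def quantile_from_cumulative(q, upper_bounds, counts):
    """Linearly interpolate the q-quantile from cumulative bucket counts."""
    rank = q * counts[-1]
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, c in enumerate(counts) if c >= rank)
    if idx == len(counts) - 1:
        # Rank lies in the +Inf bucket: return the highest finite bound.
        return upper_bounds[-2]
    lower = upper_bounds[idx - 1] if idx > 0 else 0.0
    below = counts[idx - 1] if idx > 0 else 0.0
    return lower + (upper_bounds[idx] - lower) * (rank - below) / (counts[idx] - below)


# What rate(...[5m]) would report per bucket: increase divided by the window.
rates = [c / WINDOW_SECONDS for c in INCREASES]

p90_from_increases = quantile_from_cumulative(0.9, UPPER_BOUNDS, INCREASES)
p90_from_rates = quantile_from_cumulative(0.9, UPPER_BOUNDS, rates)
print(p90_from_increases, p90_from_rates)
```

Both values come out at about 18 seconds (up to floating-point rounding), since 15 + (20 - 15) * 12/20 = 18. Dividing every bucket by the same window length rescales the rank and the bucket counts by the same factor, so the ratio used for interpolation is unchanged.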
