Matplotlib percentile distribution plot


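Does anyone know how to change the X-axis scale and ticks to show a percentile distribution like the plot below? This image is from MATLAB, but I would like to generate it with Python (via Matplotlib or Seaborn).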


Following the pointer from @PaulH, I am now much closer. This code

import matplotlib
matplotlib.use('Agg')

import numpy as np
import matplotlib.pyplot as plt
import probscale
import seaborn as sns

clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}
sns.set(style='ticks', context='notebook', palette="muted", rc=clear_bkgd)

fig, ax = plt.subplots(figsize=(8, 4))

x = [30, 60, 80, 90, 95, 97, 98, 98.5, 98.9, 99.1, 99.2, 99.3, 99.4]
y = np.arange(0, 12.1, 1)

ax.set_xlim(40, 99.5)
ax.set_xscale('prob')

ax.plot(x, y)
sns.despine(fig=fig)
produces the following plot (note the redistributed X axis).

I find this much more useful than a standard linear scale:

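I contacted the author of the original plot, and they gave me some pointers. It is actually a log-scale plot with the x axis reversed, the values plotted as 100 - val, and the x-axis ticks labeled manually. The code below recreates the original image using the same sample data as the other plot.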

import matplotlib
matplotlib.use('Agg')

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}
sns.set(style='ticks', context='notebook', palette="muted", rc=clear_bkgd)

x = [30, 60, 80, 90, 95, 97, 98, 98.5, 98.9, 99.1, 99.2, 99.3, 99.4]
y = np.arange(0, 12.1, 1)

# Number of intervals to display.
# Later calculations add 2 to this number to pad it to align with the reversed axis
num_intervals = 3
x_values = 1.0 - 1.0/10**np.arange(0,num_intervals+2)

# Start with hard-coded lengths for 0,90,99
# Rest of array generated to display correct number of decimal places as precision increases
lengths = [1,2,2] + [int(v)+1 for v in list(np.arange(3,num_intervals+2))]

# Build the label string by trimming on the calculated lengths and appending %
labels = [str(100*v)[0:l] + "%" for v,l in zip(x_values, lengths)]


fig, ax = plt.subplots(figsize=(8, 4))

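ax.set_xscale('log')
ax.invert_xaxis()

ax.plot([100.0 - v for v in x], y)

# Pin each tick at 100 - percentile so it lines up with its label;
# pairing positions with labels means no manual reversal is needed
ax.set_xticks(100.0 * (1.0 - x_values))
ax.set_xticklabels(labels)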

ax.grid(True, linewidth=0.5, zorder=5)
ax.grid(True, which='minor', linewidth=0.5, linestyle=':')

sns.despine(fig=fig)

plt.savefig("test.png", dpi=300, format='png')
This is the resulting plot:
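
As an alternative to plotting 100 - val on a reversed log axis, Matplotlib's built-in 'function' scale can be given a forward/inverse pair so that the percentiles themselves stay on the x axis. This is a different technique from the one the plot's author described, shown only as a minimal sketch: the helper names forward and inverse, the clipping constant, the tick choice, and the output file name are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")

import numpy as np
import matplotlib.pyplot as plt


def forward(p):
    # Map a percentile in percent to its "number of nines":
    # 0 -> 0, 90 -> 1, 99 -> 2, 99.9 -> 3.
    # The clip keeps the logarithm finite if a value reaches 100 (assumed guard).
    p = np.asarray(p, dtype=float)
    return -np.log10(np.clip(100.0 - p, 1e-12, None) / 100.0)


def inverse(t):
    # Inverse of forward(): number of nines back to a percentile in percent.
    t = np.asarray(t, dtype=float)
    return 100.0 * (1.0 - 10.0 ** (-t))


# Same sample data as the other plots
x = [30, 60, 80, 90, 95, 97, 98, 98.5, 98.9, 99.1, 99.2, 99.3, 99.4]
y = np.arange(0, 12.1, 1)

fig, ax = plt.subplots(figsize=(8, 4))
ax.set_xscale("function", functions=(forward, inverse))
ax.plot(x, y)

# Ticks are given directly as percentiles; no reversed axis is involved
ticks = [0, 90, 99, 99.9]
ax.set_xticks(ticks)
ax.set_xticklabels([f"{t:g}%" for t in ticks])
ax.set_xlim(0, 99.9)

fig.savefig("percentile_funcscale.png", dpi=150)  # hypothetical file name
```

Because the data stay in percentile units, axis limits and tick positions can be read and set as percentiles, at the cost of writing the transform pair by hand.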

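These kinds of plots are popular in the low-latency community for plotting latency distributions. When dealing with latency, most of the interesting information tends to sit in the higher percentiles, so a logarithmic view works better. I first saw these plots in … and ….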

The referenced chart was generated by the following code:

n = ceil(log10(length(values)));          
p = 1 - 1./10.^(0:0.01:n);
percentiles = prctile(values, p * 100);
semilogx(1./(1-p), percentiles);
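
For readers working in Python, the MATLAB snippet above can be approximated with NumPy. This is a sketch rather than an exact port: the function name percentile_curve and its step parameter are made up for illustration, and np.percentile with its default interpolation is assumed to be close enough to MATLAB's prctile, which interpolates slightly differently.

```python
import math

import numpy as np


def percentile_curve(values, step=0.01):
    """Return (x, latencies, n) for a log-scaled percentile plot.

    x equals 1 / (1 - p), so x = 10 is the 90th percentile,
    x = 100 the 99th, and so on.
    """
    values = np.asarray(values, dtype=float)
    # Number of decades needed to cover the sample count
    n = math.ceil(math.log10(len(values)))
    # Equivalent of MATLAB's 0:0.01:n (end point included)
    exponents = np.arange(0.0, n + step / 2, step)
    p = 1.0 - 1.0 / 10.0 ** exponents
    latencies = np.percentile(values, p * 100.0)
    # 10**exponents is 1 / (1 - p) without the extra rounding error
    x = 10.0 ** exponents
    return x, latencies, n
```

The returned x and latencies can then be passed to ax.semilogx(x, latencies) on a Matplotlib axis to mirror the semilogx call.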
The x axis was labeled with the code below:

labels = cell(n+1, 1);
for i = 1:n+1
  labels{i} = getPercentileLabel(i-1);
end
set(gca, 'XTick', 10.^(0:n));
set(gca, 'XTickLabel', labels);

% {'0%' '90%' '99%' '99.9%' '99.99%' '99.999%' '99.9999%'}
function label = getPercentileLabel(i)
    switch(i)
        case 0
            label = '0%';
        case 1
            label = '90%';
        case 2
            label = '99%';
        otherwise
            label = '99.';
            for k = 1:i-2
                label = [label '9'];
            end
            label = [label '%'];
    end
end
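
The labeling logic above translates to Python fairly directly. The sketch below uses hypothetical names (percentile_label, percentile_ticks) and returns tick positions together with their labels so the two cannot drift out of step.

```python
def percentile_label(i):
    """Label for the tick at 10**i: 0%, 90%, 99%, 99.9%, 99.99%, ..."""
    if i == 0:
        return "0%"
    if i == 1:
        return "90%"
    if i == 2:
        return "99%"
    # From the third decade on, append one more 9 per decade
    return "99." + "9" * (i - 2) + "%"


def percentile_ticks(n):
    """Tick positions 10**0 .. 10**n and their matching labels."""
    positions = [10 ** k for k in range(n + 1)]
    labels = [percentile_label(k) for k in range(n + 1)]
    return positions, labels
```

On a log-scaled Matplotlib axis the result can be applied with ax.set_xticks(positions) followed by ax.set_xticklabels(labels).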

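The following Python code reads a csv file containing a list of recorded latency values, records them (in microseconds) in an HdrHistogram, and saves the HdrHistogram to a file, which is then used for the latency distribution plot. The values were described as milliseconds, but the code multiplies them by USECS_PER_SEC, so it treats them as seconds.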

import pandas as pd
from hdrh.histogram import HdrHistogram
from hdrh.dump import dump
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import sys
import argparse

# Parse the command line arguments.

parser = argparse.ArgumentParser()
parser.add_argument('csv_file')
parser.add_argument('hgrm_file')
parser.add_argument('png_file')
args = parser.parse_args()

csv_file = args.csv_file
hgrm_file = args.hgrm_file
png_file = args.png_file

# Read the csv file into a Pandas data frame and generate an hgrm file.

csv_df = pd.read_csv(csv_file, index_col=False)

USECS_PER_SEC=1000000
MIN_LATENCY_USECS = 1
MAX_LATENCY_USECS = 24 * 60 * 60 * USECS_PER_SEC # 24 hours
# MAX_LATENCY_USECS = int(csv_df['response-time'].max()) * USECS_PER_SEC # 1 hour
LATENCY_SIGNIFICANT_DIGITS = 5
histogram = HdrHistogram(MIN_LATENCY_USECS, MAX_LATENCY_USECS, LATENCY_SIGNIFICANT_DIGITS)
for latency_sec in csv_df['response-time'].tolist():
    histogram.record_value(latency_sec*USECS_PER_SEC)
    # histogram.record_corrected_value(latency_sec*USECS_PER_SEC, 10)
TICKS_PER_HALF_DISTANCE=5
# Close the file before it is read back below
with open(hgrm_file, 'wb') as hgrm_out:
    histogram.output_percentile_distribution(hgrm_out, USECS_PER_SEC, TICKS_PER_HALF_DISTANCE)

# Read the generated hgrm file into a Pandas data frame.

hgrm_df = pd.read_csv(hgrm_file, comment='#', skip_blank_lines=True, sep=r"\s+", engine='python', header=0, names=['Latency', 'Percentile'], usecols=[0, 3])

# Plot the latency distribution using Seaborn and save it as a png file.

sns.set_theme()
sns.set_style("dark")
sns.set_context("paper")
sns.set_color_codes("pastel")

fig, ax = plt.subplots(1,1,figsize=(20,15))
fig.suptitle('Latency Results')

sns.lineplot(x='Percentile', y='Latency', data=hgrm_df, ax=ax)
ax.set_title('Latency Distribution')
ax.set_xlabel('Percentile (%)')
ax.set_ylabel('Latency (seconds)')
ax.set_xscale('log')
ax.set_xticks([1, 10, 100, 1000, 10000, 100000, 1000000, 10000000])
ax.set_xticklabels(['0', '90', '99', '99.9', '99.99', '99.999', '99.9999', '99.99999'])

fig.tight_layout()
fig.savefig(png_file)

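Have you written any code or made an attempt yourself? If so, please post it here.

I don't understand why this question was closed as too broad. It lacks a good problem description, but the problem itself is obvious from the figure. If there is a way to produce such a plot, it surely takes only a couple of lines of code, so answers would not be too long, nor would many possible answers be expected. @Chris Osterwood, please provide the MATLAB commands that generate this kind of plot, along with a clear problem description in text form rather than just a picture. You can post them as a comment, so that experienced users can merge them into the question.

I think you want to use my library:

@PaulH - Thank you very much! I edited my question using mpl-probscale, and it is much closer to what I want.

Florian - Thanks for posting the MATLAB code; I am sure it will be useful to others in the future. I agree that for data with a "high tail" distribution, this scale is easier to understand.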