Python: Matplotlib - probability plot for several datasets


python numpy matplotlib


I have the following datasets (distributions):

set1 = [1,2,3,4,5]
set2 = [3,4,5,6,7]
set3 = [1,3,4,5,8]

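How can I make a scatter plot from these datasets with probability on the y-axis (i.e., the percentile of each value within its set's distribution, 0%-100%) and the dataset names on the x-axis? In JMP this is called a "quantile plot".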
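One literal reading of the question is: for each value in a set, where does it sit on a 0%-100% scale within that set? A minimal numpy sketch, assuming the plotting position 100·i/(n − 1) (smallest value at 0%, largest at 100%). The helper name `percentile_ranks` is hypothetical, and JMP or other tools may use a different plotting-position convention.

```python
import numpy as np

def percentile_ranks(data):
    """Return the sorted values and their positions on a 0-100 scale.

    Uses 100 * i / (n - 1) for i = 0..n-1 (an assumed convention), so the
    minimum maps to 0% and the maximum to 100%. Tied values get distinct
    positions because each sorted sample keeps its own index.
    """
    values = np.sort(np.asarray(data, dtype=float))
    n = values.size
    if n < 2:
        raise ValueError("need at least two values")
    positions = 100.0 * np.arange(n) / (n - 1)
    return values, positions

sets = {
    "set1": [1, 2, 3, 4, 5],
    "set2": [3, 4, 5, 6, 7],
    "set3": [1, 3, 4, 5, 8],
}

for name, data in sets.items():
    values, pct = percentile_ranks(data)
    print(name, list(zip(values.tolist(), pct.tolist())))
```

The resulting (value, percent) pairs are what a quantile plot puts on its two axes; plotting them per set is then a single `ax.plot(values, pct, 'o')` call.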

Something like the attached image:

Please advise. Thanks!

[Edit]

My data is in CSV format.

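Using the JMP analysis tool, I was able to plot the probability distribution (a Q-Q plot / normal quantile plot, as shown in the image below):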
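For reference, the data behind a normal quantile (Q-Q) plot like JMP's can be computed in Python with `scipy.stats.probplot`, which returns the theoretical normal quantiles (`osm`), the ordered data (`osr`), and a least-squares fit line. This is a minimal sketch using `set3` from the question; SciPy's plotting positions may differ slightly from the ones JMP uses, and passing `plot=ax` with a matplotlib Axes would draw the plot directly.

```python
import numpy as np
import scipy.stats

data = np.array([1, 3, 4, 5, 8], dtype=float)  # set3 from the question

# osm: theoretical normal quantiles; osr: the data sorted ascending.
# slope, intercept, r describe the least-squares line through (osm, osr).
(osm, osr), (slope, intercept, r) = scipy.stats.probplot(data, dist="norm")

for q, v in zip(osm, osr):
    print(f"normal quantile {q:+.3f} -> value {v:g}")
print(f"fit: value ~ {slope:.3f} * quantile + {intercept:.3f} (r = {r:.3f})")
```

Points close to the fitted line suggest the data is roughly normal, which is how the JMP plot is usually read.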

I believe Joe Kington has almost solved my problem, but I'd like to know how to turn the raw CSV data into arrays of probabilities or percentages.
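A minimal sketch of the CSV step, assuming (hypothetically) that the first row holds the dataset names and each column holds one dataset, with blank cells allowed when columns have different lengths. The helper `read_columns` is a made-up name, and `io.StringIO` stands in for `open("data.csv", newline="")`, whose file name is also an assumption.

```python
import csv
import io

import numpy as np

def read_columns(fileobj):
    """Read a CSV whose header row names the datasets and whose columns hold
    the values. Blank cells (ragged columns) are skipped."""
    reader = csv.reader(fileobj)
    names = next(reader)
    columns = {name: [] for name in names}
    for row in reader:
        for name, cell in zip(names, row):
            if cell.strip():
                columns[name].append(float(cell))
    return {name: np.array(vals) for name, vals in columns.items()}

# Stand-in for open("data.csv", newline=""); layout is an assumption.
raw = io.StringIO("set1,set2,set3\n1,3,1\n2,4,3\n3,5,4\n4,6,5\n5,7,8\n")
datasets = read_columns(raw)

levels = np.arange(0, 101, 25)  # percentile levels to report (0, 25, ..., 100)
for name, data in datasets.items():
    # np.percentile interpolates linearly between samples by default.
    print(name, np.percentile(data, levels))
```

Each printed array is the value at the requested percentile levels for that dataset, which is the "array of percentages" side of the plot.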

I'm doing this to automate some statistical analysis in Python instead of relying on JMP for the plotting.

I'm not entirely clear on what you want, so I'll take a guess here.

Do you want the "probability/percentile" values to be a cumulative histogram?
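A cumulative histogram is a running total of bin counts. Here is a numpy-only sketch (the bin count of 4 is an arbitrary choice). Dividing by the sample size turns the counts into a percentage of the dataset, and that step is what links a cumulative histogram to percentiles.

```python
import numpy as np

data = np.array([1, 3, 4, 5, 8], dtype=float)  # set3 from the question

counts, edges = np.histogram(data, bins=4)
cumulative = np.cumsum(counts)              # samples in this bin or any earlier one
fraction = 100.0 * cumulative / data.size   # the same running total as a percentage

for upper, c, f in zip(edges[1:], cumulative, fraction):
    print(f"up to {upper:g}: {c} samples ({f:.0f}%)")
```

Note that `np.histogram` bins are half-open except the last one, so a value exactly on an interior edge is counted in the next bin.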

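So for a single dataset you'd have something like this? (Plotted with markers, as in your image above, rather than the more traditional step plot...)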

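If that's roughly what you want for one dataset, there are several ways to put multiple such plots on one figure. The simplest is just to use subplots.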

Here, we'll generate some datasets and plot each one on its own subplot with a different marker:

import itertools
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt

# Generate some data... (Using a list to hold it so that the datasets don't 
# have to be the same length...)
numdatasets = 4
stds = np.random.randint(1, 10, size=numdatasets)
means = np.random.randint(-5, 5, size=numdatasets)
values = [std * np.random.randn(100) + mean for std, mean in zip(stds, means)]

# Set up several subplots
fig, axes = plt.subplots(nrows=1, ncols=numdatasets, figsize=(12,6))

# Set up some colors and markers to cycle through...
colors = itertools.cycle(['b', 'g', 'r', 'c', 'm', 'y', 'k'])
markers = itertools.cycle(['o', '^', 's', r'$\Phi$', 'h'])

# Now let's actually plot our data...
for ax, data, color, marker in zip(axes, values, colors, markers):
    counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20)
    x = np.arange(counts.size) * dx + start
    ax.plot(x, counts, color=color, marker=marker, 
            markersize=10, linestyle='none')

# Next we'll set the various labels...
axes[0].set_ylabel('Cumulative Frequency')
labels = ['This', 'That', 'The Other', 'And Another']
for ax, label in zip(axes, labels):
    ax.set_xlabel(label)

plt.show()

If we want it to look like one continuous plot, we can squeeze the subplots together and turn off some of the borders before calling plt.show().

Hope that helps, at any rate!

Edit: If you want percentile values instead of a cumulative histogram (I really shouldn't have used 100 as the sample size!), that's easy to do.

Just replace the plotting loop with one that uses numpy.percentile, instead of normalizing by hand.

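Comments:

It would be easier to help if you described exactly how you want to transform your datasets into what gets plotted.

Nice! By the way, have you considered submitting some of these to the gallery? Half the time, the fastest way I find to do something in matplotlib is to browse the gallery for something similar.

@Joe: Is cumulative frequency the same as percentile? I need to check. You've almost solved my problem; I'm tweaking things here and there to handle my data table.

@siva - No, they're not. I shouldn't have used 100 as the sample size! That makes it really misleading! (Sorry about that!) Converting cumulative frequency values to percentiles is fairly simple, though: you just need to normalize by the number of samples in the dataset.

@Joe: Your n=100 example was very useful; I learned some matplotlib basics from it. Thanks. Also, how would you normalize the dataset? Could you show me? Do I have to find each percentile between 0 and 100 one by one and plot it against the min-max range of the data?

@siva - See the edit at the bottom. Hopefully that's a bit clearer!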
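On the normalization question in the comments: a sketch that divides the cumulative counts from `scipy.stats.cumfreq` by the sample size so the y-values end at 100%. With n = 100 the raw counts and the percentages are numerically identical, which is what made the original example misleading, so this sketch uses n = 250. The random data, seed, and bin count are arbitrary choices, and placing each count at its bin's upper edge is this sketch's own convention.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=250)  # n != 100 on purpose

res = scipy.stats.cumfreq(data, numbins=20)
# cumcount[i] counts every sample up to the end of bin i, so pair it with
# that bin's upper edge.
upper_edges = res.lowerlimit + res.binsize * np.arange(1, res.cumcount.size + 1)
percent = 100.0 * res.cumcount / data.size  # normalize counts by sample size

print(upper_edges[-1], res.cumcount[-1], percent[-1])
```

Plotting `percent` against `upper_edges` gives the same shape as the cumulative-frequency plot, but on a 0-100 scale regardless of how many samples each dataset has.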

import itertools
import numpy as np
import matplotlib.pyplot as plt

# Generate some data... (Using a list to hold it so that the datasets don't
# have to be the same length...)
numdatasets = 4
stds = np.random.randint(1, 10, size=numdatasets)
means = np.random.randint(-5, 5, size=numdatasets)
values = [std * np.random.randn(100) + mean for std, mean in zip(stds, means)]

# Set up several subplots
fig, axes = plt.subplots(nrows=1, ncols=numdatasets, figsize=(12,6))

# Set up some colors and markers to cycle through...
colors = itertools.cycle(['b', 'g', 'r', 'c', 'm', 'y', 'k'])
markers = itertools.cycle(['o', '^', 's', r'$\Phi$', 'h'])

# Plot percentiles instead of cumulative frequencies. np.percentile takes
# care of the normalization by sample size.
plot_percentiles = range(0, 110, 10)
for ax, data, color, marker in zip(axes, values, colors, markers):
    x = np.percentile(data, plot_percentiles)
    ax.plot(x, plot_percentiles, color=color, marker=marker,
            markersize=10, linestyle='none')

# Next we'll set the various labels...
axes[0].set_ylabel('Percentile')
labels = ['This', 'That', 'The Other', 'And Another']
for ax, label in zip(axes, labels):
    ax.set_xlabel(label)

# Because we want this to look like a continuous plot, we need to hide the
# boundaries (a.k.a. "spines") and yticks on most of the subplots
for ax in axes[1:]:
    ax.spines['left'].set_color('none')
    ax.spines['right'].set_color('none')
    ax.yaxis.set_ticks([])
axes[0].spines['right'].set_color('none')

# To reduce clutter, let's leave off the first and last x-ticks.
for ax in axes:
    xticks = ax.get_xticks()
    ax.set_xticks(xticks[1:-1])

# Now, we'll "scrunch" all of the subplots together, so that they look like one
fig.subplots_adjust(wspace=0)

# Show the figure only after all of the styling above has been applied
plt.show()