Python: histograms of grouped data

python pandas matplotlib


I'm still fairly new to Python, and I'm finding this really hard to figure out.

I have code like this:

df = p.read_csv("files/athena-query-1.txt", ";")
ax = df.hist(column="distance", range=[0.0, 0.5], bins=100, by="gate_id")

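I just want to see the distance distribution for each gate on a separate chart. If there are 400 gate_id values, I want to see 400 distribution plots.

It tells me that ax is a collection of AxesSubplot objects. When I try to plot it, I only get one unreadable plot. My guess is that it is trying to put everything into a single chart (a single figure?).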

Edit:

I reproduced a small example of what I think you mean:

import pandas as pd
import scipy.stats

# Build a DataFrame with 100 normally distributed 'distance' values and
# spread the 'gate_id' values 1, 2, 3, 4 evenly across the rows.
df = pd.DataFrame({'distance': scipy.stats.norm.rvs(size=100), 'gate_id': 25 * [1, 2, 3, 4]})

df.hist(column='distance', range=[0.0, 0.5], bins=100, by='gate_id')

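This produces a single figure with 4 subplots, one for each 'gate_id' value.

However, if I try 400 groups as you mentioned, the figure does not even show up, probably because it is not large enough to hold 400 subplots. That is why I recommend the first solution given below.

Original:

If you want 400 separate distribution plots, why not create 400 figures with matplotlib?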

from matplotlib import pyplot as plt

for i in range(400):
    fig, ax = plt.subplots()
    ax.plot(<dataframe['x']>,<dataframe['y']>)
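To connect the loop above to the grouped data from the question, here is a minimal sketch that draws one histogram per gate_id. It assumes the column names from the question ('distance', 'gate_id'); the helper name save_histogram_per_group, the synthetic data, and the choice to save PNG files instead of showing windows are illustrative assumptions, not part of the original answer.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_histogram_per_group(df, out_dir, by="gate_id", column="distance"):
    """Write one histogram PNG per group and return the list of file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for key, group in df.groupby(by):
        fig, ax = plt.subplots()  # a fresh figure for every group
        ax.hist(group[column], bins=100, range=(0.0, 0.5))
        ax.set_title(f"{by} = {key}")
        ax.set_xlabel(column)
        path = out_dir / f"{by}_{key}.png"
        fig.savefig(path)
        plt.close(fig)  # release the figure; matters with hundreds of groups
        paths.append(path)
    return paths


# Synthetic stand-in for the real CSV (hypothetical data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "distance": rng.uniform(0.0, 0.5, size=100),
    "gate_id": 25 * [1, 2, 3, 4],
})
saved = save_histogram_per_group(demo, tempfile.mkdtemp())
```

Closing each figure after saving keeps memory use flat, which is likely to matter once the number of groups reaches the hundreds.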


Or you could try drawing one huge figure with many subplots, for example:

fig, ( (ax1, ax2, ax3, ...<fill up here>..., ax10), (ax11, ..., ax20), ..., (ax91, ..., ax100)) = plt.subplots(nrows=10, ncols=10)

ax1.bar(<dataframe['x']>,<dataframe['y']>)
...
ax100.bar(<dataframe['x']>,<dataframe['y']>)


This example covers only 100 subplots; I'm not sure whether 400 would be too many.
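Writing out ax1 through ax100 by hand is tedious, so a grid like this is usually built by iterating over the array that plt.subplots returns. The sketch below is one way to do that under stated assumptions: the helper name plot_group_grid, the per-panel figure size, and the synthetic 12-group data are hypothetical, while the column names come from the question.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_group_grid(df, by="gate_id", column="distance", ncols=10):
    """Draw one histogram per group on a single figure laid out as a grid."""
    groups = list(df.groupby(by))
    nrows = math.ceil(len(groups) / ncols)
    # Scale the figure with the grid so each panel keeps a readable size.
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols,
                             figsize=(2 * ncols, 1.5 * nrows),
                             squeeze=False)
    flat = axes.ravel()
    for ax, (key, group) in zip(flat, groups):
        ax.hist(group[column], bins=100, range=(0.0, 0.5))
        ax.set_title(str(key), fontsize=8)
    # Hide the unused panels when the groups do not fill the grid.
    for ax in flat[len(groups):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Synthetic stand-in for the real data: 12 groups of 10 rows each (hypothetical).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "distance": rng.uniform(0.0, 0.5, size=120),
    "gate_id": np.repeat(np.arange(12), 10),
})
fig, axes = plot_group_grid(demo, ncols=5)
```

For 400 groups the same function would produce a 40 by 10 grid with the default ncols; whether that is still readable depends on the figure size and output resolution.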


In your first code example, how does that fit together with my df.hist result? I ask because I feel I am doing the right thing with pandas, but I don't know what to do with its result. In your example, it looks like I should build a dictionary by hand?

You are right, it implies a dataframe built from a dictionary of the form {'group_id': array[distance]}. And yes, df.hist generates the plots directly from pandas, whereas my first code example creates the subplots manually from the data. I have no doubt that this can also be done nicely with df.hist in some way.
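As a possible way to stay with df.hist, the sketch below passes the layout and figsize parameters of DataFrame.hist so the grouped panels get enough room, and then shows what the returned object is: an array of Axes that all belong to one figure. The synthetic data and the 2 by 2 layout are assumptions chosen for illustration; for 400 groups the layout and figure size would need to be scaled up accordingly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV (hypothetical data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "distance": rng.uniform(0.0, 0.5, size=100),
    "gate_id": 25 * [1, 2, 3, 4],
})

# One panel per gate_id, arranged on an explicit grid with an explicit size.
axes = df.hist(column="distance", by="gate_id", bins=100,
               range=(0.0, 0.5), layout=(2, 2), figsize=(8, 6))

# The result is an array of Axes; flatten it to loop over the panels.
flat_axes = np.ravel(axes)
fig = flat_axes[0].figure  # every panel lives on this single figure
for ax in flat_axes:
    ax.set_xlabel("distance")
```

This matches the observation in the question: the call returns a collection of subplots rather than separate charts, so all groups end up on one figure unless each group is plotted on its own.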