Python histogram bin alignment

python pandas dataframe

I have a dataframe that looks like this:

train_data_10users = pd.DataFrame({'target':['A','A','B', 'B', 'C'], 'day_of_week':[4,2,4,4,1]})

 target  day_of_week
0   A            4
1   A            2
2   B            4
3   B            4
4   C            1

I want to build, for each target, a histogram of the counts per day of the week, i.e.:

"A" should have:
0,1,3,5,6:0
2,4:1
"B" should have
0,1,2,3,5,6:0
4:2
"C" should have 1:1, the rest:0

Here is a pivot table showing the actual data I would like to see on the histograms (note the fillna):
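The asker's exact pivot call is not shown, so the following is only a sketch of how a table of that shape might be built. It assumes a groupby/size/unstack approach and the name `counts`, neither of which comes from the original post; `reindex` is added so that days absent from the data still appear as columns.

```python
import pandas as pd

train_data_10users = pd.DataFrame(
    {"target": ["A", "A", "B", "B", "C"], "day_of_week": [4, 2, 4, 4, 1]}
)

# Count rows per (target, day_of_week); combinations that never occur become NaN.
counts = (
    train_data_10users.groupby(["target", "day_of_week"])
    .size()
    .unstack()
)

# Force columns 0..6 so that absent days are present, then turn NaN into 0.
counts = counts.reindex(columns=range(7)).fillna(0).astype(int)
print(counts)
```

Each row of `counts` then holds the seven per-day values that the corresponding histogram is expected to show.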

Adding the proper xticks, even though some days may be missing from a group, is done as follows:

from matplotlib import pyplot as plt
import pandas as pd

# color_dic is assumed to be a dict mapping each target to a color, defined elsewhere
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(16, 10))
for idx, (user, sub_df) in enumerate(
        train_data_10users[["target", "day_of_week"]].groupby('target')):
    ax = axes[idx // 4, idx % 4]
    sub_df.hist(ax=ax, label=user, color=color_dic.get(user), bins=7)
    ax.set_xticks(range(7))
    ax.legend()

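But the values are not properly aligned/centered, and their position drifts a little. I assume this depends on which days are present or missing for each target: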

Upd. Following the accepted answer, this is how it looks:

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(16, 10), sharey=True)
...
sub_df.hist(ax=ax, label=user, color=color_dic.get(user), bins=range(8))
ax.set_xticks(np.arange(7) + 0.5)
ax.set_xticklabels(range(7))

Try:

import numpy as np

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(16, 10))
for idx, (user, sub_df) in enumerate(
        train_data_10users[["target", "day_of_week"]].groupby('target')):
    ax = axes[idx // 4, idx % 4]

    # note: bins is forced to range(7)
    sub_df.hist(ax=ax, label=user, bins=range(7))

    # offset the xticks
    ax.set_xticks(np.arange(7) + .5)

    # name the labels accordingly
    ax.set_xticklabels(range(7))

Output with bins=range(7):

What is train_data_10users? What is axes?

It's my dataframe and the subplot axes. For anyone trying to recreate the dataframe:

train_data_10users = pd.DataFrame({'target':['A','A','B','B','C'], 'day_of_week':[4,2,4,4,1]})

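So what do you expect for the days with zero/NaN counts, which are dropped from the hist? That is not really a histogram. Please see my edited answer and check whether it fits your needs. Note that I also changed bins to range(7). This differs slightly from bins=7: bins=7 divides the data's min-max range into 7 equal bins, whereas range(7) explicitly sets the bin edges to (0,1,2,3,4,5,6).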
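To make the difference concrete, here is a small sketch using numpy.histogram, whose binning rules matplotlib's hist (and therefore pandas' hist) follows. The sample values are target A's days from the question; the variable names are illustrative only.

```python
import numpy as np

days_a = np.array([4, 2])  # day_of_week values for target "A"

# An integer bin count splits the data's own [min, max] range into equal bins,
# so the edges depend on which days happen to be present.
counts_auto, edges_auto = np.histogram(days_a, bins=7)
print(edges_auto[0], edges_auto[-1])  # 2.0 4.0

# Explicit integer edges are the same for every target, whatever days occur.
# Edges 0..7 give seven unit-wide bins, one per day 0..6.
counts_fixed, edges_fixed = np.histogram(days_a, bins=np.arange(8))
print(counts_fixed)  # [0 0 1 0 1 0 0]
```

With data-dependent edges, each subplot gets a different x-range, which is why fixed ticks at 0..6 do not line up with the bars.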

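Yes, I noticed that and deleted my previous comment. However, there is still a small problem: the hist is now shifted to the right, and the first day has no value. Let me post the picture as Upd.

Change the offset to "+0.5".

I tried, but it has the opposite effect: with +0.5 the last day's value is not shown.

Found the problem, please change to bins=range(8). Also, consider passing sharey=True to subplots.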
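Putting the comments together: range(7) supplies seven edges and therefore only six bins, while range(8) gives one unit-wide bin per day. The following is a self-contained sketch of the combined fix. It uses one subplot per target instead of the original 3x4 grid, calls `ax.hist` directly rather than `sub_df.hist`, and the names `bin_edges` and `counts_by_target` are illustrative additions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

train_data_10users = pd.DataFrame(
    {"target": ["A", "A", "B", "B", "C"], "day_of_week": [4, 2, 4, 4, 1]}
)

n_targets = train_data_10users["target"].nunique()
fig, axes = plt.subplots(nrows=1, ncols=n_targets, figsize=(12, 3),
                         sharey=True, squeeze=False)

bin_edges = np.arange(8)  # edges 0..7 -> seven bins, one per day 0..6
counts_by_target = {}
for ax, (user, sub_df) in zip(axes.ravel(),
                              train_data_10users.groupby("target")):
    counts, _, _ = ax.hist(sub_df["day_of_week"], bins=bin_edges, label=user)
    counts_by_target[user] = counts.astype(int).tolist()
    ax.set_xticks(np.arange(7) + 0.5)  # put each tick at the center of its bin
    ax.set_xticklabels(range(7))       # label the bin centers 0..6
    ax.legend()
```

Because every subplot shares the same edges and the same y-axis, the bars for a given day land in the same place on each panel.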
