Python: group by date and plot the distribution per category


python pandas matplotlib


I'm trying to plot data binned by specific date ranges.

For example, suppose I have the following DataFrame:

import numpy as np
import pandas as pd

dates = pd.date_range(start='2013-06-01', periods=50, freq='D')
df = pd.DataFrame(np.random.normal(10, 3, 50), columns=['x'], index=dates)
df[:3]
            x
2013-06-01  9.819422
2013-06-02  3.659629
2013-06-03  14.862231

I want to group the dates into 3-week intervals and plot the data. This gives me the means I'm looking for:

df.resample('3W').mean()

            x
2013-06-02  11.424715
2013-06-23  9.443888
2013-07-14  8.572851
2013-08-04  9.873879

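But I'd like to keep all of the data, so that I can use a boxplot in seaborn, or plot with matplotlib and include standard errors. I'm completely stuck on how to do this without explicitly defining the ranges (which isn't feasible for the actual DataFrame I'm working with). It seems there must be a fairly simple way to do this in pandas, so that the output would look something like: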

            x           week
2013-06-01  9.819422    1
2013-06-02  3.659629    1
2013-06-03  14.862231   1

where week is a categorical variable representing the binned data. Any ideas would be much appreciated.

Perhaps you can use TimeGrouper (spelled pd.Grouper(freq=...) in current pandas):

df.groupby(pd.Grouper(freq='3W')).describe()
               x                                                                          
           count       mean       std       min       25%        50%        75%        max
2013-06-02     2  10.864835  3.794379  8.181803  9.523319  10.864835  12.206350  13.547866
2013-06-23    21   9.888556  3.452331  3.503944  7.838625   9.739525  12.403285  16.031644
2013-07-14    21  10.475142  2.687320  6.605619  8.399518  11.209683  11.818895  16.265771
2013-08-04     6   9.471931  3.196345  5.492205  8.122607   8.502217  10.901065  14.638198

>>> g = df.groupby(pd.Grouper(freq='3W')).boxplot()
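Since the question also asks about plotting means with standard errors in matplotlib, here is a minimal sketch of that, assuming the same kind of random data as in the question. The seed, the variable name stats, and the choice of plotting against integer positions with date tick labels are all illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
dates = pd.date_range(start="2013-06-01", periods=50, freq="D")
df = pd.DataFrame({"x": rng.normal(10, 3, 50)}, index=dates)

# Per-bin mean, standard error of the mean, and row count.
stats = df.groupby(pd.Grouper(freq="3W"))["x"].agg(["mean", "sem", "count"])

# Plot against integer positions and label the ticks with the bin dates.
positions = np.arange(len(stats))
fig, ax = plt.subplots()
ax.errorbar(positions, stats["mean"], yerr=stats["sem"], fmt="o-", capsize=4)
ax.set_xticks(positions)
ax.set_xticklabels(stats.index.strftime("%Y-%m-%d"))
ax.set_xlabel("3-week bin label")
ax.set_ylabel("mean of x")
```

The first and last bins hold only a few rows here, so their error bars are not comparable to those of the full 21-day bins.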

To add each period's label date (as a string) to the original data:

df = pd.DataFrame(np.random.normal(10, 3, 50), columns=['x'], index=dates)
tg = df.groupby(pd.Grouper(freq='3W', closed='left'))
df['period'] = None
period_col = df.columns.get_loc('period')
for p, idx in tg.indices.items():  # idx holds the integer row positions of each bin
    df.iloc[idx, period_col] = p.strftime('%Y-%m-%d')

>>> df.head()
                    x      period
2013-06-01   7.972202  2013-06-16
2013-06-02  12.184312  2013-06-16
2013-06-03   6.884374  2013-06-16
2013-06-04   8.414091  2013-06-16
2013-06-05  12.368407  2013-06-16
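A related sketch: if the goal is to keep every row but attach a per-bin value to it, groupby(...).transform broadcasts a bin-level statistic back onto the original index without a loop. The column names bin_mean and deviation and the fixed seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
dates = pd.date_range(start="2013-06-01", periods=50, freq="D")
df = pd.DataFrame({"x": rng.normal(10, 3, 50)}, index=dates)

# Group the x column into 3-week bins (weeks anchored on Sunday by default).
grouped = df.groupby(pd.Grouper(freq="3W"))["x"]

# transform returns one value per original row, so nothing is dropped.
df["bin_mean"] = grouped.transform("mean")
df["deviation"] = df["x"] - df["bin_mean"]
```

Each row now carries the mean of its own bin, which can be used for centering or for annotating a plot.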

Here's how I would do it:

for idx, (label, _) in enumerate(df.groupby(pd.Grouper(freq="3W-SAT"))):  # your first day is a Saturday
    df.loc[label, "week"] = idx + 1

# propagate the week number forward
df["week"] = df["week"].ffill()

# drop the extra date added by the grouper, since the number of dates is not a multiple of 3 weeks
df.dropna(inplace=True)
df.tail()

                    x  week
2013-07-16  15.717111     3
2013-07-17   9.815201     3
2013-07-18   9.426426     3
2013-07-19  12.725350     3
2013-07-20  16.100748     3


# just use seaborn as usual
import seaborn as sns
sns.boxplot(data=df, x="week", y="x")  # plot it

I don't know whether there is a better way to use TimeGrouper directly with seaborn.
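One loop-free alternative, offered only as a sketch: when the bins should simply be consecutive 21-day blocks counted from the first timestamp (an assumption; this is not calendar-anchored like the grouper above), the bin number can be computed arithmetically from the index. The names days_elapsed and counts are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
dates = pd.date_range(start="2013-06-01", periods=50, freq="D")
df = pd.DataFrame({"x": rng.normal(10, 3, 50)}, index=dates)

# Whole days elapsed since the earliest timestamp in the index.
days_elapsed = np.asarray((df.index - df.index.min()).days)

# Integer-divide into 21-day blocks and number them from 1.
df["week"] = days_elapsed // 21 + 1

# Rows per bin, to check the binning.
counts = df["week"].value_counts().sort_index()
```

For this data the result matches the week column shown above (2013-06-01 through 2013-06-21 in bin 1, and so on), and df can be passed to sns.boxplot(data=df, x="week", y="x") unchanged.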

HTH

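This works for the most part, but I'd really like to be able to generate the DataFrame I gave as an example, because I'd also like to compute other statistics.

Try tg = df.groupby(pd.Grouper(freq='3W')) and type tg. followed by Tab to see the available methods. Note .get_group as well as all the other available statistics.

Thanks, that's a great suggestion. When I run this on the full dataset I get an error. I think it's because there are duplicate time indices, but I'm not positive and need to investigate further.

@johnchase, try reducing the dataset and sharing it so we can reproduce the error (or make another fake file that reproduces it).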
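The suggestion to explore the grouped object can be sketched as follows, assuming the same kind of random data as in the question. The particular list of statistics and the names stats, first_label, and first_bin are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
dates = pd.date_range(start="2013-06-01", periods=50, freq="D")
df = pd.DataFrame({"x": rng.normal(10, 3, 50)}, index=dates)

tg = df.groupby(pd.Grouper(freq="3W"))

# Several per-bin statistics in a single call.
stats = tg["x"].agg(["count", "mean", "median", "std", "sem"])

# Retrieve the raw rows of one bin by its label.
first_label = stats.index[0]
first_bin = tg.get_group(first_label)
```

Here first_bin holds the untouched rows of the first bin, so any statistic not offered by agg can still be computed on it directly.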