Python: reindex a DataFrame by a new date range


I have a DataFrame containing a number of observations:

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-20   yellow     3

I would like to reindex the DataFrame and standardize the dates, so that every colour has a row for every date:

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-22   red        NaN
2014-10-20   yellow     3
2014-10-21   yellow     NaN
2014-10-22   yellow     NaN

I thought of sorting the DataFrame by colour and date, and then tried to reindex it:

index = pd.date_range('20/10/2014', '22/10/2014')
# df.sort() was removed in pandas 0.20; sort_values() is the current equivalent
test_df = df.sort_values(['colour', 'date'], ascending=[True, True])
ts = test_df.reindex(index)
ts

But this returns a new DataFrame that has the right index but contains only NaN values:

date         colour     orders
2014-10-20   NaN        NaN
2014-10-21   NaN        NaN
2014-10-22   NaN        NaN

Starting from your example DataFrame:

In [51]: df
Out[51]:
        date  colour  orders
0 2014-10-20     red       7
1 2014-10-21     red      10
2 2014-10-20  yellow       3
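
To follow along, the example frame can be rebuilt as shown here. This is only a minimal sketch, assuming the date column holds real datetimes. It also suggests why the attempt in the question returned nothing but NaN: the frame still carries its default integer index (0, 1, 2), so none of the requested date labels match an existing row.

```python
import pandas as pd

# Rebuild the example frame; the date column is assumed to be datetime64.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

# The attempt from the question, written with sort_values().
index = pd.date_range("2014-10-20", "2014-10-22")
ts = df.sort_values(["colour", "date"]).reindex(index)

# reindex() matches on index labels, and the labels here are still 0, 1, 2,
# so no requested date is found and every cell comes back missing.
all_missing = bool(ts.isna().all().all())
print(ts)
print("all values missing:", all_missing)
```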

If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a MultiIndex):
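
The snippet for this step seems to be missing from the text. A minimal sketch of what it presumably looks like, assuming the standard `DataFrame.set_index` call with a list of column names (the variable name `indexed` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

# Move both columns into the index; the result has a two-level MultiIndex
# (date, colour) and a single remaining data column, orders.
indexed = df.set_index(["date", "colour"])
print(indexed)
```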

After constructing the desired index, you can now reindex this DataFrame:

In [54]: index = pd.date_range('20/10/2014', '22/10/2014')

In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])

In [56]: df.reindex(multi_index)
Out[56]:
                   orders
2014-10-20 red          7
           yellow       3
2014-10-21 red         10
           yellow     NaN
2014-10-22 red        NaN
           yellow     NaN
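
The transcript above depends on earlier session state, so here is a self-contained version of the same reindex step. It is a sketch against current pandas rather than the exact session shown; the `names=` argument is an addition so that the index levels keep their labels.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
}).set_index(["date", "colour"])

# Every (date, colour) combination that should appear in the result.
dates = pd.date_range("2014-10-20", "2014-10-22", freq="D")
full_index = pd.MultiIndex.from_product(
    [dates, ["red", "yellow"]], names=["date", "colour"]
)

# Combinations absent from df are added with NaN in the orders column.
result = df.reindex(full_index)
print(result)
```

Because missing combinations are filled with NaN, the orders column is converted to floating point in current pandas versions.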

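To get the same output as in your example, the index should be sorted by the second level (level=1, as it is 0-based):

In [60]: df2 = df.reindex(multi_index)

In [64]: df2.sortlevel(level=1)
Out[64]:
                   orders
2014-10-20 red          7
2014-10-21 red         10
2014-10-22 red        NaN
2014-10-20 yellow       3
2014-10-21 yellow     NaN
2014-10-22 yellow     NaN

(sortlevel has since been removed from pandas; in current versions the equivalent call is df2.sort_index(level=1).)

A possible way to generate the multi-index automatically would be (with your original frame):

pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'), df['colour'].unique()])

Another way would be to use resample for each group of colours:

In [77]: df = df.set_index('date')

In [78]: df.groupby('colour').resample('D')

This is simpler, but it does not give you the full range of dates for each colour, only the range of dates that is available for that colour group. In current pandas versions, resample returns a deferred object, so it has to be followed by an aggregation or conversion such as .sum() or .asfreq().

What is index in your example?

Hi Joris, I'm new to pandas. I think the initial DataFrame actually has no index at all. I sorted it but did not set any index.

But I mean, you use a variable called index in the line ts = test_df.reindex(index). What is that exactly?

Sorry, I have edited the initial question; that line of code was missing. Ideally I would have pandas find the start and end dates automatically, i.e. the smallest and largest dates in the DataFrame. I just saw that the command test_df.resample('D') is meant to do this, but I think I should index test_df by 'date' beforehand, and I'm struggling with that.

Say I have thousands of products (and, to be fair, not just one column but a category, a different subcategory and so on): how would I change this part of the code, multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])?

See my "A possible way to generate the multi-index automatically ..." for doing this when there are a large number of values in the colour column.

@Gianluca Did this solve your problem, or is there still an issue?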
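
Pulling together the points raised in the comments (start and end dates found automatically, many key values, possibly more than one key column), the approach could be sketched as below. The helper name `reindex_full_range` and its parameters are hypothetical, not part of pandas. It assumes each (date, key...) combination occurs at most once, since reindexing needs a unique index, and it takes the product of each key column's unique values independently, which may create category/subcategory pairs that never occur in the data.

```python
import pandas as pd

def reindex_full_range(df, date_col, key_cols, freq="D"):
    """Hypothetical helper: one row per (date, key...) combination."""
    key_cols = list(key_cols)
    # Date range taken from the data itself: smallest to largest date.
    dates = pd.date_range(df[date_col].min(), df[date_col].max(), freq=freq)
    # Cartesian product of the dates and each key column's unique values.
    levels = [dates] + [df[col].unique() for col in key_cols]
    full_index = pd.MultiIndex.from_product(levels, names=[date_col] + key_cols)
    out = df.set_index([date_col] + key_cols).reindex(full_index)
    # Sort by the key columns first, then by date, as in the desired output.
    return out.sort_index(level=key_cols + [date_col]).reset_index()

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

filled = reindex_full_range(df, "date", ["colour"])
print(filled)
```

Note that with the example data the largest date is 2014-10-21, so the result stops there. To extend it to 2014-10-22 as in the question, the end date would have to be passed to `pd.date_range` explicitly.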