Python: How to reindex a DataFrame by a date range without dropping values
Tags: python, python-2.7, pandas
Background: I downloaded the following DataFrame using pyodbc, with dates ranging from 1999 to 2015:
CEISales.head(10)
Out[194]:
Order_DateC RegionC SalesC
0 2014-01-30 Domestic 3530.00
1 2011-10-11 Domestic 136.00
2 1999-01-13 Domestic 30.00
3 1999-01-13 Domestic 55615.00
4 1999-01-13 Domestic 440.00
5 1999-01-13 Domestic 94.00
6 1999-01-05 Domestic 612.00
7 1999-01-14 Domestic 1067.00
8 1999-01-14 Domestic 26345.05
9 1999-01-15 Domestic 161858.72
Then I filtered the data to all dates later than 2010-01-01 and sorted it by date in ascending order:
CEIFilter = CEISales[CEISales['Order_DateC'] > '2010-01-01']
CEITest = CEIFilter.sort_values('Order_DateC')
CEITest.head(5)
Out[199]:
Order_DateC RegionC SalesC
18156 2010-01-04 Foreign 450.0
18155 2010-01-04 Domestic 1990.4
18154 2010-01-04 Domestic 37477.0
18152 2010-01-04 Domestic 0.0
18153 2010-01-04 Domestic 783.0
Then I used the pandas date_range function to create a date index with values between 2010-01-01 and today:
date_index = pd.date_range(start='2010-01-01', end='2015-12-23' , freq='d')
and reindexed the DataFrame:
CEIFinal= CEITest.reindex(date_index)
My problem is that when I reindex the DataFrame, all of the data is dropped:
CEIFinal.head(5)
Out[206]:
Order_DateC RegionC SalesC
2010-01-01 NaT NaN NaN
2010-01-02 NaT NaN NaN
2010-01-03 NaT NaN NaN
2010-01-04 NaT NaN NaN
2010-01-05 NaT NaN NaN
From the original filtered DataFrame, you can see that transactions exist on 2010-01-04:
CEITest[CEITest['Order_DateC'] == '2010-01-04']
Out[210]:
Order_DateC RegionC SalesC
18156 2010-01-04 Foreign 450.0
18155 2010-01-04 Domestic 1990.4
18154 2010-01-04 Domestic 37477.0
18152 2010-01-04 Domestic 0.0
18153 2010-01-04 Domestic 783.0
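Question
How can I reindex this DataFrame with this date range and keep all of the original values? I am trying to create a common index across multiple DataFrames from different databases so that I can add them to one aggregate DataFrame. Any help is much appreciated. Thank you.
You are reindexing with a DatetimeIndex when the DataFrame's index is not a DatetimeIndex: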
Order_DateC RegionC SalesC
18156 2010-01-04 Foreign 450.0
18155 2010-01-04 Domestic 1990.4
18154 2010-01-04 Domestic 37477.0
18152 2010-01-04 Domestic 0.0
18153 2010-01-04 Domestic 783.0
Hence the NaNs and NaTs.
Perhaps you want to make Order_DateC the index:
df = df.set_index("Order_DateC")
and then go from there.
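The effect described in this answer can be reproduced on a small scale. The sketch below uses made-up rows modeled on the question's data (the frame `df` and the one-week range are assumptions for illustration) and compares reindexing against the default integer index with reindexing after `set_index`.

```python
import pandas as pd

# Hypothetical sample data modeled on the rows shown in the question.
df = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-06"]),
    "RegionC": ["Foreign", "Domestic"],
    "SalesC": [450.0, 783.0],
})
date_index = pd.date_range(start="2010-01-01", end="2010-01-07", freq="D")

# The default index holds integers (0, 1), so no label matches a date
# and every row of the result is empty.
mismatched = df.reindex(date_index)

# With the dates as the index, the labels line up and the values are kept.
aligned = df.set_index("Order_DateC").reindex(date_index)

print(mismatched["SalesC"].notnull().sum())
print(aligned["SalesC"].notnull().sum())
```

reindex matches on index labels only, so a date column that is not the index plays no part in the alignment.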
If you reindex, you will lose rows with duplicate dates. I think you need to set the index from the Order_DateC column before reindexing:
CEITest = CEITest.set_index('Order_DateC')
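Duplicate dates deserve some care here: recent pandas versions generally refuse to reindex an index that contains duplicate labels and raise a ValueError instead. One possible workaround, sketched below on made-up rows, is to collapse the data to one row per date before reindexing. Summing SalesC per day is an assumption about what the aggregate should contain, not something the question specifies.

```python
import pandas as pd

# Hypothetical sample: two orders share the date 2010-01-04.
orders = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-06"]),
    "RegionC": ["Foreign", "Domestic", "Domestic"],
    "SalesC": [450.0, 1990.4, 783.0],
})
date_index = pd.date_range(start="2010-01-01", end="2010-01-07", freq="D")

# Collapse to one row per date first, so the index labels are unique.
daily_sales = orders.groupby("Order_DateC")["SalesC"].sum()

# Every date in the range now gets a row; dates without orders are NaN.
daily_full = daily_sales.reindex(date_index)

print(daily_full)
```

If missing days should count as zero sales rather than NaN, `reindex` also accepts a `fill_value` argument.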
Finally, you can check for non-null values with notnull.
Putting it all together:
print CEISales
Order_DateC RegionC SalesC
0 2014-01-30 Domestic 3530.00
1 2011-10-11 Domestic 136.00
2 1999-01-13 Domestic 30.00
3 1999-01-13 Domestic 55615.00
4 1999-01-13 Domestic 440.00
5 1999-01-13 Domestic 94.00
6 1999-01-05 Domestic 612.00
7 1999-01-14 Domestic 1067.00
8 1999-01-14 Domestic 26345.05
9 1999-01-15 Domestic 161858.72
CEIFilter = CEISales[CEISales['Order_DateC'] > '2010-01-01']
CEITest = CEIFilter.sort_values('Order_DateC')
print CEITest
Order_DateC RegionC SalesC
1 2011-10-11 Domestic 136
0 2014-01-30 Domestic 3530
#set index to datetimeindex
CEITest = CEITest.set_index('Order_DateC')
print CEITest
RegionC SalesC
Order_DateC
2011-10-11 Domestic 136
2014-01-30 Domestic 3530
date_index = pd.date_range(start='2010-01-01', end='2015-12-23', freq='d')
CEIFinal = CEITest.reindex(date_index)
There can be many NaT and NaN values, so check the data:
print CEIFinal[CEIFinal.notnull().any(axis=1)]
RegionC SalesC
2011-10-11 Domestic 136
2014-01-30 Domestic 3530
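As a small illustration of the filter above, the sketch below builds a made-up three-day frame (the name `frame` and its values are assumptions) and compares the `notnull().any(axis=1)` mask with `dropna(how="all")`, which should select the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical reindexed frame: only one of the three dates has data.
frame = pd.DataFrame(
    {"RegionC": [np.nan, "Domestic", np.nan], "SalesC": [np.nan, 136.0, np.nan]},
    index=pd.date_range("2011-10-10", periods=3, freq="D"),
)

# Keep rows in which at least one column holds a value.
with_data = frame[frame.notnull().any(axis=1)]

# dropna(how="all") drops rows in which every column is missing,
# which selects the same rows.
same_rows = frame.dropna(how="all")

print(with_data.equals(same_rows))
```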
Finally, you can set the index name and then reset the index, so that the old index becomes a column with that name:
CEIFinal.index.name = 'CEIFinal'
CEIFinal = CEIFinal.reset_index()
print CEIFinal.head()
CEIFinal RegionC SalesC
0 2010-01-01 NaN NaN
1 2010-01-02 NaN NaN
2 2010-01-03 NaN NaN
3 2010-01-04 NaN NaN
4 2010-01-05 NaN NaN
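The two steps above can also be chained. This is a minimal sketch on a made-up two-row frame, using `rename_axis` in place of assigning to `index.name`; the result should be the same either way.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex and a single value column.
frame = pd.DataFrame(
    {"SalesC": [136.0, 3530.0]},
    index=pd.to_datetime(["2011-10-11", "2014-01-30"]),
)

# Name the index, then move it into an ordinary column of that name.
flat = frame.rename_axis("CEIFinal").reset_index()

print(flat.columns.tolist())  # ['CEIFinal', 'SalesC']
```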
Would I resample date_index or CEITest first? Could you give me an example of how to resample these DataFrames? Thanks for your help, Andy.
@Andrew Not date_index. Once you have a DatetimeIndex, you can do df.resample("d", how="sum") or similar. Look up how to resample separately. It is similar to groupby; you can also do things like df.groupby([pd.TimeGrouper("d"), "RegionC"]).sum().
Thanks for your reply, jezrael. I tried your code, and print CEIFinal.head() returned an empty DataFrame.
Hmm, maybe you can check both indexes: print CEITest.index and print CEIFinal.index (before resetting the index). For this sample it returns:
DatetimeIndex(['2011-10-11', '2014-01-30'], dtype='datetime64[ns]', name=u'Order_DateC', freq=None)
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08', '2010-01-09', '2010-01-10', ... '2015-12-14', '2015-12-15', '2015-12-16', '2015-12-17', '2015-12-18', '2015-12-19', '2015-12-20', '2015-12-21', '2015-12-22', '2015-12-23'], dtype='datetime64[ns]', length=2183, freq='D')
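For readers on current pandas, here is a hedged sketch of the resampling and grouping idea from the comments. It assumes a recent pandas version, where the `how=` argument and `pd.TimeGrouper` are no longer available and the method-chaining form and `pd.Grouper` are used instead. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample with the order date as a DatetimeIndex.
orders = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-06"]),
    "RegionC": ["Foreign", "Domestic", "Domestic"],
    "SalesC": [450.0, 1990.0, 783.0],
}).set_index("Order_DateC")

# Daily totals between the first and last order date;
# a day without orders gets a sum of 0.
daily = orders["SalesC"].resample("D").sum()

# Daily totals split by region.
by_region = orders.groupby([pd.Grouper(freq="D"), "RegionC"])["SalesC"].sum()

print(daily)
print(by_region)
```

Unlike reindex, resample copes with duplicate dates because it aggregates them, which may make it a better fit when several orders fall on the same day.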