Python: how to reindex a DataFrame by date range without dropping values (Python, Python 2.7, Pandas) - Fatal编程技术网



Background:

I downloaded the following DataFrame with pyodbc; it covers dates from 1999 to 2015:

CEISales.head(10)
Out[194]: 
   Order_DateC   RegionC     SalesC
0  2014-01-30  Domestic    3530.00
1  2011-10-11  Domestic     136.00
2  1999-01-13  Domestic      30.00
3  1999-01-13  Domestic   55615.00
4  1999-01-13  Domestic     440.00
5  1999-01-13  Domestic      94.00
6  1999-01-05  Domestic     612.00
7  1999-01-14  Domestic    1067.00
8  1999-01-14  Domestic   26345.05
9  1999-01-15  Domestic  161858.72
Then I filtered the data to dates after 2010-01-01 and sorted it by date in ascending order:

CEIFilter = CEISales[CEISales['Order_DateC'] > '2010-01-01']

CEITest = CEIFilter.sort_values('Order_DateC')

CEITest.head(5)
Out[199]: 
      Order_DateC   RegionC   SalesC
18156  2010-01-04   Foreign    450.0
18155  2010-01-04  Domestic   1990.4
18154  2010-01-04  Domestic  37477.0
18152  2010-01-04  Domestic      0.0
18153  2010-01-04  Domestic    783.0
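The filter-and-sort step can be tried on its own with a few hypothetical rows copied from the sample output. This sketch assumes Python 3 and a current pandas (the post itself is Python 2.7), and it assumes Order_DateC is a datetime64 column; pd.to_datetime is used here to guarantee that.

```python
import pandas as pd

# Hypothetical sample shaped like CEISales; pd.to_datetime makes the
# date column datetime64 instead of plain strings.
CEISales = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2014-01-30", "2011-10-11", "1999-01-13"]),
    "RegionC": ["Domestic", "Domestic", "Domestic"],
    "SalesC": [3530.00, 136.00, 30.00],
})

# Keep only orders placed after 2010-01-01.
CEIFilter = CEISales[CEISales["Order_DateC"] > pd.Timestamp("2010-01-01")]

# DataFrame.sort no longer exists in current pandas; sort_values replaces it.
CEITest = CEIFilter.sort_values("Order_DateC")
print(CEITest)
```

Note that the original integer labels (1, 0) travel with the rows; sorting does not turn the dates into the index.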
Then I used the pandas date_range function to create a date index covering every day between 2010-01-01 and today:

date_index = pd.date_range(start='2010-01-01', end='2015-12-23' , freq='d')
and reindexed the DataFrame:

CEIFinal= CEITest.reindex(date_index)
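For reference, the date_range call used here produces one timestamp per calendar day with both endpoints included. A minimal sketch, assuming Python 3 and a current pandas:

```python
import pandas as pd

# Daily frequency, start and end dates both included.
date_index = pd.date_range(start="2010-01-01", end="2015-12-23", freq="D")

print(len(date_index))                   # 2183
print(date_index[0], date_index[-1])
```

The 2183 entries are the whole target calendar; reindex will produce exactly one row per entry.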
My problem is that when I reindex the DataFrame, all of the data is dropped:

CEIFinal.head(5)
Out[206]: 
            Order_DateC RegionC  SalesC
2010-01-01         NaT     NaN     NaN
2010-01-02         NaT     NaN     NaN
2010-01-03         NaT     NaN     NaN
2010-01-04         NaT     NaN     NaN
2010-01-05         NaT     NaN     NaN
The original filtered DataFrame shows that there are transactions on 2010-01-04:

CEITest[CEITest['Order_DateC'] == '2010-01-04']
Out[210]: 
      Order_DateC   RegionC   SalesC
18156  2010-01-04   Foreign    450.0
18155  2010-01-04  Domestic   1990.4
18154  2010-01-04  Domestic  37477.0
18152  2010-01-04  Domestic      0.0
18153  2010-01-04  Domestic    783.0
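Question:
How can I reindex this DataFrame with this date range and keep all of the original values? I am trying to create a common index across several DataFrames that come from different databases, so that I can add them into one aggregate DataFrame. Thanks very much for your help.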

You are reindexing by a DatetimeIndex when your index is not a DatetimeIndex:

      Order_DateC   RegionC   SalesC
18156  2010-01-04   Foreign    450.0
18155  2010-01-04  Domestic   1990.4
18154  2010-01-04  Domestic  37477.0
18152  2010-01-04  Domestic      0.0
18153  2010-01-04  Domestic    783.0
Hence the NaNs and NaTs.
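A minimal reproduction of this, with hypothetical data and assuming a current pandas: the frame still carries its integer labels, and reindex matches on labels, so none of the requested dates are found.

```python
import pandas as pd

# Hypothetical frame that still has integer labels, like CEITest in the question.
CEITest = pd.DataFrame(
    {"Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-04"]),
     "SalesC": [450.0, 1990.4]},
    index=[18156, 18155],
)

date_index = pd.date_range("2010-01-01", "2010-01-07", freq="D")

# reindex looks up each timestamp among the labels 18156 and 18155,
# finds none, and fills the row with NaT (dates) or NaN (numbers).
CEIFinal = CEITest.reindex(date_index)
print(CEIFinal)
print(CEIFinal.isna().all().all())  # True
```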

Perhaps you want to make Order_DateC the index:

df = df.set_index("Order_DateC")
and then reindex.
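Reindexing directly will not keep rows whose dates are duplicated: pandas raises a ValueError when the index being reindexed contains duplicate labels, so those rows have to be aggregated (for example, summed per day) first.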
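A sketch of the duplicate-date problem and one way around it, assuming a current pandas and hypothetical rows. Summing SalesC per day is an illustrative choice, not the only option; it drops RegionC, which could instead be kept by grouping on date and region together.

```python
import pandas as pd

# Hypothetical rows: two orders share 2010-01-04, as in the question.
CEITest = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-06"]),
    "RegionC": ["Foreign", "Domestic", "Domestic"],
    "SalesC": [450.0, 1990.4, 783.0],
}).set_index("Order_DateC")

date_index = pd.date_range("2010-01-01", "2010-01-07", freq="D")

# Reindexing an index that contains duplicate labels is rejected.
try:
    CEITest.reindex(date_index)
    raised = False
except ValueError:
    raised = True

# Collapse to one row per date first; the index is then unique and
# reindex can stretch it over the full calendar (empty days become 0).
daily = CEITest.groupby(level=0)["SalesC"].sum()
CEIFinal = daily.reindex(date_index, fill_value=0)
print(CEIFinal)
```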

I think you need to set the index from the column Order_DateC before reindexing:

CEITest = CEITest.set_index('Order_DateC')
Finally, you can check which values are present with notnull.

All together:

print CEISales
  Order_DateC   RegionC     SalesC
0  2014-01-30  Domestic    3530.00
1  2011-10-11  Domestic     136.00
2  1999-01-13  Domestic      30.00
3  1999-01-13  Domestic   55615.00
4  1999-01-13  Domestic     440.00
5  1999-01-13  Domestic      94.00
6  1999-01-05  Domestic     612.00
7  1999-01-14  Domestic    1067.00
8  1999-01-14  Domestic   26345.05
9  1999-01-15  Domestic  161858.72

CEIFilter = CEISales[CEISales['Order_DateC'] > '2010-01-01']
CEITest = CEIFilter.sort_values('Order_DateC')
print CEITest
  Order_DateC   RegionC  SalesC
1  2011-10-11  Domestic     136
0  2014-01-30  Domestic    3530

#set index to datetimeindex
CEITest = CEITest.set_index('Order_DateC')
print CEITest
              RegionC  SalesC
Order_DateC                  
2011-10-11   Domestic     136
2014-01-30   Domestic    3530

date_index = pd.date_range(start='2010-01-01', end='2015-12-23', freq='d')
CEIFinal = CEITest.reindex(date_index)

The result can contain many NaN rows, so check which rows actually hold data:

print CEIFinal[CEIFinal.notnull().any(axis=1)]
             RegionC  SalesC
2011-10-11  Domestic     136
2014-01-30  Domestic    3530
Finally, you can give the index a name and turn the index into a column with that name:

CEIFinal.index.name = 'CEIFinal'
CEIFinal = CEIFinal.reset_index()
print CEIFinal.head()
   CEIFinal RegionC  SalesC
0 2010-01-01     NaN     NaN
1 2010-01-02     NaN     NaN
2 2010-01-03     NaN     NaN
3 2010-01-04     NaN     NaN
4 2010-01-05     NaN     NaN
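The same steps as a self-contained sketch for Python 3 and a current pandas, using the two post-2010 rows from the sample. It uses rename_axis instead of assigning index.name, and names the column Order_DateC rather than CEIFinal; both are illustrative choices.

```python
import pandas as pd

# The two post-2010 rows from the sample data.
CEITest = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2011-10-11", "2014-01-30"]),
    "RegionC": ["Domestic", "Domestic"],
    "SalesC": [136, 3530],
}).set_index("Order_DateC")  # the dates become a DatetimeIndex

date_index = pd.date_range(start="2010-01-01", end="2015-12-23", freq="D")
CEIFinal = CEITest.reindex(date_index)

# Rows where at least one column is not null.
has_data = CEIFinal[CEIFinal.notnull().any(axis=1)]
print(has_data)

# Name the index, then move it back into an ordinary column.
CEIFinal = CEIFinal.rename_axis("Order_DateC").reset_index()
print(CEIFinal.head())
```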

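Would I resample the date index or CEITest first? Could you give me an example of how to resample these DataFrames? Thanks for your help, Andy.
@Andrew Not the date_index. Once you have a DatetimeIndex you can do df.resample("d", how="sum") or similar (current pandas has dropped the how= argument; the equivalent is df.resample("D").sum()). Look at how resampling works on its own: it is similar to groupby, and you can also do things like df.groupby([pd.TimeGrouper("d"), "RegionC"]).sum() (pd.TimeGrouper has since been replaced by pd.Grouper(freq="d")).
Thanks for your reply, jezrael. I tried your code, and printing CEIFinal.head() returned an empty DataFrame.
Hmm, maybe you can check both indexes: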
print CEITest.index
print CEIFinal.index
(before resetting the index)
It returns this for the sample:
DatetimeIndex(['2011-10-11', '2014-01-30'], dtype='datetime64[ns]', name=u'Order_DateC', freq=None)
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
               '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
               '2010-01-09', '2010-01-10',
               ...
               '2015-12-14', '2015-12-15', '2015-12-16', '2015-12-17',
               '2015-12-18', '2015-12-19', '2015-12-20', '2015-12-21',
               '2015-12-22', '2015-12-23'],
              dtype='datetime64[ns]', length=2183, freq='D')
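A sketch of the resample approach from the comments, applied to the stated goal of putting frames from several sources on one common date index and adding them. It assumes a current pandas, where resample("D").sum() and pd.Grouper(freq="D") replace the older how="sum" and pd.TimeGrouper forms; the two source frames and the helper daily_totals are hypothetical.

```python
import pandas as pd

# Two hypothetical sources with repeated and overlapping order dates.
sales_a = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-06"]),
    "RegionC": ["Foreign", "Domestic", "Domestic"],
    "SalesC": [450.0, 1990.4, 783.0],
}).set_index("Order_DateC")
sales_b = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-02", "2010-01-06"]),
    "RegionC": ["Domestic", "Domestic"],
    "SalesC": [100.0, 17.0],
}).set_index("Order_DateC")

date_index = pd.date_range("2010-01-01", "2010-01-07", freq="D")

def daily_totals(df):
    # Hypothetical helper: one total per day, then stretched onto the
    # shared calendar so every source has exactly the same index.
    daily = df["SalesC"].resample("D").sum()
    return daily.reindex(date_index, fill_value=0)

# Identical indexes, so the addition lines up day by day.
combined = daily_totals(sales_a) + daily_totals(sales_b)
print(combined)

# Daily totals per region, grouping on the DatetimeIndex and a column.
by_region = sales_a.groupby([pd.Grouper(freq="D"), "RegionC"])["SalesC"].sum()
print(by_region)
```

Filling missing days with 0 suits sums of sales; leaving them as NaN would be the safer default for quantities where a missing day is not the same as zero.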