Python: When I try to drop a row from a pandas DataFrame with a DatetimeIndex, it shifts the index

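I have a DataFrame with a DatetimeIndex. When I try to drop a row by its index value, the number of rows correctly becomes N-1, but the times in the index get shifted. In fact, a block of rows is chopped off the beginning, and a block of rows with NaN values is added to the end. The size of this "block" appears to be my timezone offset in hours times the number of rows per hour. Here is a reproducible example: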

Python 2.7.8 |Anaconda 2.1.0 (x86_64)| (default, Aug 21 2014, 15:21:46) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
In[2]: import pandas
In[3]: from pytz import timezone
In[4]: from pandas import Timestamp

In[5]: print pandas.__version__
0.14.0

In[6]: dti = pandas.DatetimeIndex(start='2014-11-09 00:00:00', freq='15T',periods=2976, tz=timezone('US/Pacific'))

In[7]: df = pandas.DataFrame({'data':range(2976)},index=dti)

In[8]: df.head(5)
Out[8]: 
                           data
2014-11-09 00:00:00-08:00     0
2014-11-09 00:15:00-08:00     1
2014-11-09 00:30:00-08:00     2
2014-11-09 00:45:00-08:00     3
2014-11-09 01:00:00-08:00     4

In[9]: df.drop(Timestamp('2014-11-28 11:30:00-08:00')).head(5)
Out[9]: 
                           data
2014-11-09 08:00:00-08:00    32
2014-11-09 08:15:00-08:00    33
2014-11-09 08:30:00-08:00    34
2014-11-09 08:45:00-08:00    35
2014-11-09 09:00:00-08:00    36

In[10]: df.drop(Timestamp('2014-11-28 11:30:00-08:00')).tail(5)
Out[10]: 
                           data
2014-12-10 06:45:00-08:00   NaN
2014-12-10 07:00:00-08:00   NaN
2014-12-10 07:15:00-08:00   NaN
2014-12-10 07:30:00-08:00   NaN
2014-12-10 07:45:00-08:00   NaN

In[11]: df.index
Out[11]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-11-09 00:00:00-08:00, ..., 2014-12-09 23:45:00-08:00]
Length: 2976, Freq: 15T, Timezone: US/Pacific

In[12]: df.drop(Timestamp('2014-11-28 11:30:00-08:00')).index 
Out[12]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-11-09 08:00:00-08:00, ..., 2014-12-10 07:45:00-08:00]
Length: 2975, Freq: None, Timezone: US/Pacific
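
For comparison, this is what the same drop should produce: one row fewer and every other timestamp untouched. A minimal check, assuming a recent pandas release rather than the 0.14.0 in the transcript above; 'America/Los_Angeles' is the canonical IANA name for US/Pacific, and '15min' is an equivalent spelling of the '15T' alias:

```python
import pandas as pd

TZ = "America/Los_Angeles"  # canonical IANA name for US/Pacific

# Rebuild the question's frame: 2976 fifteen-minute bars starting 2014-11-09.
idx = pd.date_range("2014-11-09 00:00:00", periods=2976, freq="15min", tz=TZ)
df = pd.DataFrame({"data": range(2976)}, index=idx)

label = pd.Timestamp("2014-11-28 11:30:00", tz=TZ)
dropped = df.drop(label)

# What drop() should return: the original index minus exactly that one label.
expected_index = df.index.delete(df.index.get_loc(label))

print(len(dropped))                          # 2975
print(dropped.index.equals(expected_index))  # True
print(dropped["data"].isna().any())          # False
```

If the first row is still 2014-11-09 00:00:00-08:00 with data 0 and no NaN appears, the drop did not shift anything.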

You should mention that you are using 0.17.0.

In [13]: import psycopg2

In [14]: df = DataFrame(np.arange(10),index=pd.date_range('20130101 09:00:00',periods=10,tz=psycopg2.tz.FixedOffsetTimezone(offset=-480, name=None),freq='H'),columns=['value'])

In [15]: df
Out[15]: 
                           value
2013-01-01 09:00:00-08:00      0
2013-01-01 10:00:00-08:00      1
2013-01-01 11:00:00-08:00      2
2013-01-01 12:00:00-08:00      3
2013-01-01 13:00:00-08:00      4
2013-01-01 14:00:00-08:00      5
2013-01-01 15:00:00-08:00      6
2013-01-01 16:00:00-08:00      7
2013-01-01 17:00:00-08:00      8
2013-01-01 18:00:00-08:00      9

In [16]: df.index
Out[16]: 
DatetimeIndex(['2013-01-01 09:00:00-08:00', '2013-01-01 10:00:00-08:00', '2013-01-01 11:00:00-08:00', '2013-01-01 12:00:00-08:00', '2013-01-01 13:00:00-08:00', '2013-01-01 14:00:00-08:00',
               '2013-01-01 15:00:00-08:00', '2013-01-01 16:00:00-08:00', '2013-01-01 17:00:00-08:00', '2013-01-01 18:00:00-08:00'],
              dtype='datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=-480, name=None)]', freq='H')

In [17]: df.drop(Timestamp('2013-01-01 16:00:00',tz=psycopg2.tz.FixedOffsetTimezone(offset=-480, name=None)))
Out[17]: 
                           value
2013-01-01 09:00:00-08:00      0
2013-01-01 10:00:00-08:00      1
2013-01-01 11:00:00-08:00      2
2013-01-01 12:00:00-08:00      3
2013-01-01 13:00:00-08:00      4
2013-01-01 14:00:00-08:00      5
2013-01-01 15:00:00-08:00      6
2013-01-01 17:00:00-08:00      8
2013-01-01 18:00:00-08:00      9
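
If psycopg2 is not at hand, the same experiment can be sketched with the standard library's datetime.timezone. psycopg2's FixedOffsetTimezone takes its offset in minutes, so offset=-480 is a fixed UTC-08:00 with no DST rules. This substitution is an assumption for illustration, not what was run above, and '60min' stands in for freq='H':

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd

# Fixed UTC-08:00, analogous to FixedOffsetTimezone(offset=-480) (minutes).
UTC_MINUS_8 = timezone(timedelta(hours=-8))

idx = pd.date_range("2013-01-01 09:00:00", periods=10, freq="60min", tz=UTC_MINUS_8)
df = pd.DataFrame(np.arange(10), index=idx, columns=["value"])

# The label carries the same timezone as the index, so exactly one row goes.
out = df.drop(pd.Timestamp("2013-01-01 16:00:00", tz=UTC_MINUS_8))
print(out["value"].tolist())  # [0, 1, 2, 3, 4, 5, 6, 8, 9]
```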

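So you need to specify the timezone on the element you want to drop exactly as it is in the index; otherwise the label is not in the index and you get an error: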

In [18]: df.drop(Timestamp('2013-01-01 16:00:00'))
ValueError: labels [Timestamp('2013-01-01 16:00:00')] not contained in axis
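
One way to avoid this mismatch is to coerce the label into the index's timezone before dropping. The helper drop_timestamp below is hypothetical (not a pandas API) and assumes a tz-aware index. Naive labels are read as wall-clock time in the index's zone, which could be ambiguous around DST changes in a zone like US/Pacific; the fixed UTC-08:00 zone from datetime.timezone stands in for psycopg2's here:

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd


def drop_timestamp(df, ts):
    """Hypothetical helper: drop one row by timestamp from a tz-aware index.

    A naive label is localized to the index's timezone; an aware label is
    converted to it, so it matches the index entry for the same instant.
    """
    ts = pd.Timestamp(ts)
    tz = df.index.tz
    if tz is not None:
        ts = ts.tz_localize(tz) if ts.tzinfo is None else ts.tz_convert(tz)
    return df.drop(ts)


utc_minus_8 = timezone(timedelta(hours=-8))
idx = pd.date_range("2013-01-01 09:00:00", periods=10, freq="60min", tz=utc_minus_8)
df = pd.DataFrame(np.arange(10), index=idx, columns=["value"])

naive = drop_timestamp(df, "2013-01-01 16:00:00")           # read as 16:00-08:00
from_utc = drop_timestamp(df, "2013-01-02 00:00:00+00:00")  # same instant in UTC
print(naive["value"].tolist())  # [0, 1, 2, 3, 4, 5, 6, 8, 9]
print(naive.equals(from_utc))   # True
```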

So please provide a reproducible example.

In addition, you may want to use read_sql_table rather than read_sql_query (it reads timezone columns correctly).
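
The answer does not show the SQL side. As an alternative sketch (a swapped-in technique, not the read_sql_table route suggested above), read_sql_query can parse an offset-bearing text column straight to UTC through its parse_dates argument, whose per-column dict is forwarded to pandas.to_datetime. The in-memory SQLite table readings and its column ts are hypothetical:

```python
import sqlite3

import pandas as pd

# Hypothetical table: timestamps stored as text with an explicit UTC offset.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (ts TEXT, value INTEGER)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2013-01-01 09:00:00-08:00", 0), ("2013-01-01 10:00:00-08:00", 1)],
)

# parse_dates={"ts": {...}} passes the inner dict to pandas.to_datetime,
# so utc=True yields a tz-aware UTC column instead of plain strings.
df = pd.read_sql_query(
    "SELECT ts, value FROM readings", con, parse_dates={"ts": {"utc": True}}
)
con.close()

df = df.set_index("ts")
print(df.index.tz)  # UTC
print(df.index[0])  # 2013-01-01 17:00:00+00:00
```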

Alternatively, you may just want to convert to a "more useful" timezone (e.g. UTC or something like US/…).
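
A sketch of the UTC route, assuming the fixed UTC-08:00 frame from the answer (rebuilt with datetime.timezone in place of psycopg2). After tz_convert('UTC') the labels change but the instants do not, so the label to drop has to be written in UTC:

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01 09:00:00", periods=10, freq="60min",
                    tz=timezone(timedelta(hours=-8)))
df = pd.DataFrame(np.arange(10), index=idx, columns=["value"])

# Same instants, now labelled in UTC: 09:00-08:00 becomes 17:00+00:00.
df_utc = df.tz_convert("UTC")

# 16:00-08:00 is 00:00 UTC on the following day.
out = df_utc.drop(pd.Timestamp("2013-01-02 00:00:00", tz="UTC"))
print(out["value"].tolist())  # [0, 1, 2, 3, 4, 5, 6, 8, 9]
```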

Or simply drop the tz, localizing in place (I think this is what you want).

I just noticed that the shift equals the UTC/timezone offset... but the timezone offset does not seem to be undone on the index.
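
That observation can be checked with a little arithmetic: 8 hours of UTC offset at 4 rows per hour is 32 rows, matching the value 32 at the head of Out[9]. The sketch below, assuming a recent pandas, only reproduces the numbers in Out[9], Out[10] and Out[12] by pushing the expected labels 8 hours later and realigning the data; it illustrates the symptom and is not pandas 0.14.0's actual code path:

```python
from datetime import timedelta

import pandas as pd

TZ = "America/Los_Angeles"  # canonical IANA name for US/Pacific
idx = pd.date_range("2014-11-09 00:00:00", periods=2976, freq="15min", tz=TZ)
df = pd.DataFrame({"data": range(2976)}, index=idx)

# Rows shifted = |UTC offset| / bar width = 8 h / 15 min.
rows_shifted = timedelta(hours=8) // timedelta(minutes=15)
print(rows_shifted)  # 32

# Labels that should survive dropping 2014-11-28 11:30.
label = pd.Timestamp("2014-11-28 11:30:00", tz=TZ)
expected = df.index.delete(df.index.get_loc(label))

# Simulated symptom: the same labels moved 8 hours later, then realigned.
buggy = df.reindex(expected + pd.Timedelta(hours=8))
print(buggy["data"].head(5).tolist())   # [32.0, 33.0, 34.0, 35.0, 36.0]
print(int(buggy["data"].isna().sum()))  # 32
```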

In [21]: df.index.tz_convert('UTC') 
Out[21]: 
DatetimeIndex(['2013-01-01 17:00:00+00:00', '2013-01-01 18:00:00+00:00', '2013-01-01 19:00:00+00:00', '2013-01-01 20:00:00+00:00', '2013-01-01 21:00:00+00:00', '2013-01-01 22:00:00+00:00',
               '2013-01-01 23:00:00+00:00', '2013-01-02 00:00:00+00:00', '2013-01-02 01:00:00+00:00', '2013-01-02 02:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [27]: df.index.tz_localize(None)
Out[27]: 
DatetimeIndex(['2013-01-01 09:00:00', '2013-01-01 10:00:00', '2013-01-01 11:00:00', '2013-01-01 12:00:00', '2013-01-01 13:00:00', '2013-01-01 14:00:00', '2013-01-01 15:00:00', '2013-01-01 16:00:00',
               '2013-01-01 17:00:00', '2013-01-01 18:00:00'],
              dtype='datetime64[ns]', freq='H')
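
A sketch of the "drop the tz" route applied to the whole frame, assuming the same fixed UTC-08:00 data (rebuilt with datetime.timezone instead of psycopg2). DataFrame.tz_localize(None) keeps the local wall-clock times, after which a plain naive Timestamp like the one in In [18] matches:

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01 09:00:00", periods=10, freq="60min",
                    tz=timezone(timedelta(hours=-8)))
df = pd.DataFrame(np.arange(10), index=idx, columns=["value"])

# Remove the timezone but keep the local wall times (09:00 stays 09:00).
naive_df = df.tz_localize(None)

# A naive label now matches, unlike In [18] against the tz-aware index.
out = naive_df.drop(pd.Timestamp("2013-01-01 16:00:00"))
print(naive_df.index.tz)      # None
print(out["value"].tolist())  # [0, 1, 2, 3, 4, 5, 6, 8, 9]
```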