Python: change the timezone of a datetime column in pandas and add it as a hierarchical index


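I have data with UTC timestamps. I want to convert the timezone of these timestamps to 'US/Pacific' and add the result to the DataFrame as a hierarchical index. I have been able to convert the timestamps as an Index, but the timezone is lost when I try to add it back to the DataFrame, either as a column or as an index.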

>>> import pandas as pd
>>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})
>>> dat.dtypes
#datetime    object
#label       object
#value        int64
#dtype: object
Now, if I try to convert the Series directly, I get an error:

>>> times = pd.to_datetime(dat['datetime'])
>>> times.tz_localize('UTC')
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize
#    raise Exception('Cannot tz-localize non-time series')
#Exception: Cannot tz-localize non-time series
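The conversion does work if I go through an Index, as shown here. But once the result is added back to the DataFrame, you'll notice the index comes back in UTC rather than in the converted Pacific timezone: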

>>> times_index = pd.Index(times)
>>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')
>>> times_index_pacific
#<class 'pandas.tseries.index.DatetimeIndex'>
#[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]
#Length: 6, Freq: None, Timezone: US/Pacific

How can I change the timezone and add it as an index to a DataFrame?

If you set it as the index, it is automatically converted to an Index:

In [11]: dat.index = pd.to_datetime(dat.pop('datetime'), utc=True)

In [12]: dat
Out[12]:
                    label  value
datetime
2011-07-19 07:00:00     a      0
2011-07-19 08:00:00     a      1
2011-07-19 09:00:00     a      2
2011-07-19 07:00:00     b      3
2011-07-19 08:00:00     b      4
2011-07-19 09:00:00     b      5
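Then do the tz_localize (in recent pandas versions, utc=True already yields a UTC-aware index, so tz_localize('UTC') raises there and tz_convert alone is enough):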

In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')

In [13]: dat
Out[13]:
                          label  value
datetime
2011-07-19 00:00:00-07:00     a      0
2011-07-19 01:00:00-07:00     a      1
2011-07-19 02:00:00-07:00     a      2
2011-07-19 00:00:00-07:00     b      3
2011-07-19 01:00:00-07:00     b      4
2011-07-19 02:00:00-07:00     b      5
Then you can append the label column to the index:
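A minimal sketch of this step, assuming a recent pandas release (where, unlike the behavior reported in this thread, the datetime level keeps its timezone). The names `when` and `result` are illustrative and not from the original answer:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                 '2011-07-19 09:00:00'] * 2,
    'value': range(6),
})

# Parse the strings as UTC and convert them to Pacific time.
when = pd.to_datetime(dat.pop('datetime'), utc=True).dt.tz_convert('US/Pacific')
dat.index = pd.DatetimeIndex(when, name='datetime')

# Append 'label' as a second index level, then put it first.
result = dat.set_index('label', append=True).swaplevel(0, 1)
print(result.index.get_level_values('datetime').tz)
```

On current releases the datetime level of `result` should still report US/Pacific after the second level is appended.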

Hmm, the timezone is lost at this point, which is definitely a bug!

A hacky workaround is to convert the (datetime) level directly, once it is already part of a MultiIndex:
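In recent pandas versions the levels of a MultiIndex cannot be assigned in place, so a hedged sketch of the same idea uses `MultiIndex.set_levels`, which returns a new index. The sample data and the names `times`, `index`, and `pacific` are assumptions made for illustration:

```python
import pandas as pd

# Naive timestamps that are known to be UTC (illustrative data).
times = pd.to_datetime(['2011-07-19 07:00:00',
                        '2011-07-19 08:00:00',
                        '2011-07-19 09:00:00'])
index = pd.MultiIndex.from_product([['a', 'b'], times],
                                   names=['label', 'datetime'])

# Convert only the datetime level, then build a new MultiIndex from it.
pacific = index.levels[1].tz_localize('UTC').tz_convert('US/Pacific')
index = index.set_levels(pacific, level='datetime')
print(index.get_level_values('datetime'))
```

Because `set_levels` returns a new object, the result has to be assigned back (for example to `dat.index`) to take effect.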


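That solution no longer seems to work, because the levels of a hierarchical index appear to be immutable (a FrozenList cannot be modified).

Starting with a single index and appending to it does not work either.

Creating a lambda function that casts to Timestamp and applying it to each member of the Series returned by to_datetime() does not work either.

Is there a way to create a timezone-aware Series and then insert it into a DataFrame or make it an index?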

joined_event_df = joined_event_df.set_index(['pandasTime'])
# the index has a single level here, so the level number is 0
joined_event_df.index = joined_event_df.index.get_level_values(0).tz_localize('UTC').tz_convert('US/Central')
# we have tz-awareness above this line
joined_event_df = joined_event_df.set_index('sequence', append=True)
# we lose tz-awareness in the index as soon as we add another index
joined_event_df = joined_event_df.swaplevel(0, 1)
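On the question of building a timezone-aware Series first: a minimal sketch, assuming a recent pandas release where the `.dt` accessor is available on datetime Series. The frame `df` and its values are made up for illustration; only the column names `pandasTime` and `sequence` come from the snippet above:

```python
import pandas as pd

df = pd.DataFrame({
    'pandasTime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00'],
    'sequence': [1, 2],
    'value': [10, 20],
})

# Build a timezone-aware Series through the .dt accessor and keep it as a column.
df['pandasTime'] = (
    pd.to_datetime(df['pandasTime'])
    .dt.tz_localize('UTC')
    .dt.tz_convert('US/Central')
)

# Index on both columns at once; the datetime level carries the timezone along.
df = df.set_index(['sequence', 'pandasTime'])
print(df.index.get_level_values('pandasTime').tz)
```

Converting the column before calling `set_index` avoids touching the index levels afterwards.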

Another workaround, which works in pandas 0.13.1 and gets around the problem that a FrozenList cannot be assigned to:

index.levels = pandas.core.base.FrozenList([
    index.levels[0].tz_localize('UTC').tz_convert(tz),
    index.levels[1].tz_localize('UTC').tz_convert(tz)
])

I struggled with this problem too; MultiIndex loses the tz in many other situations as well.

This has since been fixed. For example, you can now call:

dataframe.tz_localize('UTC', level=0)

For the given example, though, you have to call it twice (i.e., once for each level).
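A short sketch of the level-wise calls, assuming a pandas version in which `DataFrame.tz_localize` and `DataFrame.tz_convert` accept a `level` argument. The sample data and the names `times` and `index` are illustrative:

```python
import pandas as pd

times = pd.to_datetime(['2011-07-19 07:00:00',
                        '2011-07-19 08:00:00',
                        '2011-07-19 09:00:00'])
index = pd.MultiIndex.from_product([times, ['a', 'b']],
                                   names=['datetime', 'label'])
dataframe = pd.DataFrame({'value': range(6)}, index=index)

# Mark level 0 as UTC, then convert that same level to Pacific time.
dataframe = (
    dataframe
    .tz_localize('UTC', level=0)
    .tz_convert('US/Pacific', level=0)
)
print(dataframe.index.get_level_values(0))
```

Both methods return a new DataFrame, so the result is assigned back; a second datetime level would need its own pair of calls with the corresponding `level`.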

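I think this is a bug...

Yes, it is strange behavior (timezones are evil). Maybe worth creating an issue!

I ran into two problems: 1) I can't call tz_localize or tz_convert on a MultiIndex; 2) when accessing the hour field from the single index, I still get the array [7,8,9,7,8,9] when I want the Pacific values (i.e. [0,1,2,0,1,2]).

Sorry, this is definitely a bug (thanks for finding it)! I've added a workaround (i.e. convert the datetime level once it is part of the MultiIndex)...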