Python: Changing the number of index entries in a DataFrame?

python datetime pandas calendar dataframe


I am trying to change the output of the following code:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame, Panel, bdate_range, DatetimeIndex, date_range
from pandas.tseries.holiday import get_calendar
from datetime import datetime, timedelta
import pytz as pytz
from pytz import timezone

start =  datetime(2013, 1, 1)

hr1 = np.loadtxt("Spot_2013_Hour1.txt")

index = date_range(start, end = '2013-12-31', freq='B')
Allhrs = Series(index)
Allhrs = DataFrame({'hr1': hr1})
df = Allhrs
indexed_df = df.set_index(index)
print indexed_df

Error:

  File "<ipython-input-61-c7890d8ccb07>", line 17, in <module>
    indexed_df = df.set_index(index)

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2390, in set_index
    frame.index = index

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1849, in __setattr__
    object.__setattr__(self, name, value)

  File "properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:38491)

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 400, in _set_axis
    self._data.set_axis(axis, labels)

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 1965, in set_axis
    'new values have %d elements' % (old_len, new_len))

ValueError: Length mismatch: Expected axis has 365 elements, new values have 261 elements


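The problem:

I have a time series loaded from a txt file. It consists of 365 elements, one for each day of 2013. I need to keep this txt file as it is because I have to analyze every day.

In addition, I need to analyze specific days in 2013. So I would like to change how the data is read: I only want to see business days. It would also be nice to be able to view or print specific dates.

Any help?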

First, create a DataFrame (or Series) that contains every day of the year:

index = pd.date_range(start='2013-1-1', end='2013-12-31', freq='D')
df = pd.DataFrame(hr1, index=index)

Next, use df.asfreq('B') to downsample df to business days:

import numpy as np
import pandas as pd

# hr1 = np.loadtxt("Spot_2013_Hour1.txt")
hr1 = np.random.random(365)  # random stand-in for the 365 values in the file
index = pd.date_range(start='2013-1-1', end='2013-12-31', freq='D')
df = pd.DataFrame(hr1, index=index)

# Keep only the rows that fall on business days (Monday to Friday)
indexed_df = df.asfreq('B')
print(indexed_df)
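To make the cause of the original ValueError concrete, here is a minimal sketch that compares the two index lengths. The integer values are placeholders standing in for the contents of Spot_2013_Hour1.txt, which is not available here.

```python
import pandas as pd

# All calendar days vs. business days (Monday to Friday) in 2013.
all_days = pd.date_range(start="2013-01-01", end="2013-12-31", freq="D")
business_days = pd.date_range(start="2013-01-01", end="2013-12-31", freq="B")

print(len(all_days))       # one label per row of the loaded data
print(len(business_days))  # fewer labels than rows, hence the length mismatch

# Placeholder values standing in for the contents of Spot_2013_Hour1.txt.
df = pd.DataFrame({"hr1": range(365)}, index=all_days)

# asfreq('B') keeps only the rows whose label is a business day.
indexed_df = df.asfreq("B")
print(indexed_df.index.equals(business_days))
```

Labeling the data with the daily index first and downsampling afterwards keeps each value attached to its correct date, which set_index with a shorter index cannot do.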

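To set the frequency to business days while excluding certain dates, you can use a custom business-day frequency built from a list of holidays, which gives custom_df.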
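The code for this step does not appear in the text, so the following is only a sketch of one way to do it, assuming pandas' CustomBusinessDay offset. The date '2013-10-03' is taken from the text below; '2013-12-25' is a hypothetical second holiday chosen only so that two weekdays are excluded.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Random placeholder data standing in for Spot_2013_Hour1.txt.
index = pd.date_range(start="2013-01-01", end="2013-12-31", freq="D")
df = pd.DataFrame(np.random.random(365), index=index)

# Assumed holiday list: '2013-10-03' is named in the text; '2013-12-25' is
# a hypothetical second entry used only for illustration.
holidays = ["2013-10-03", "2013-12-25"]

# Business days (Monday to Friday) minus the listed holidays.
custom_df = df.asfreq(CustomBusinessDay(holidays=holidays))
print(len(custom_df))
```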

As a result, custom_df has fewer rows than indexed_df:

In [12]: len(custom_df)
Out[12]: 259

In [13]: len(indexed_df)
Out[13]: 261

It is missing the "holidays", such as '2013-10-03':

In [18]: '2013-10-03' in indexed_df.index
Out[18]: True

In [19]: '2013-10-03' in custom_df.index
Out[19]: False
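Since the question also asks about viewing specific dates, here is a minimal sketch of label-based selection on a DatetimeIndex. The column name hr1 and the random values are placeholders, not the real data.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2013-01-01", end="2013-12-31", freq="D")
df = pd.DataFrame({"hr1": np.random.random(365)}, index=index)

# A single day, looked up by its exact timestamp: returns that row.
one_day = df.loc[pd.Timestamp("2013-10-03")]

# A whole month via partial string indexing.
march = df.loc["2013-03"]

# A range of dates; both endpoints are included.
first_week_oct = df.loc["2013-10-01":"2013-10-07"]

print(one_day)
print(len(march), len(first_week_oct))
```

The same selections work on indexed_df or custom_df, although looking up a date that was dropped from the index raises a KeyError.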

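It is also useful to know the operations you can use to sub-select rows. For example, you can remove specific dates from the index:

# Index subtraction no longer performs a set difference in current pandas;
# .difference() expresses the same operation explicitly.
idx = indexed_df.index.difference(pd.DatetimeIndex(holidays))
custom_df2 = df.reindex(idx)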

As a result, custom_df2 equals custom_df:

In [35]: custom_df2.equals(custom_df)
Out[35]: True

Note, however, that the indexes are slightly different:

In [36]: custom_df.index
Out[36]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-12-31]
Length: 259, Freq: C, Timezone: None

In [37]: custom_df2.index
Out[37]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-12-31]
Length: 259, Freq: None, Timezone: None


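custom_df has Freq: C, while custom_df2 has Freq: None. The freq attribute is used by some methods, such as snap and to_period. But these methods also let you pass the desired frequency as an argument, so in practice I have not found this difference to matter much.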
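As a small illustration of that last point, the sketch below builds an index with no stored frequency and passes the frequency to to_period explicitly. The holiday dates are the same assumed pair as before, with '2013-12-25' being hypothetical.

```python
import pandas as pd

# Business days of 2013 minus two dates; the second date is hypothetical.
holidays = pd.DatetimeIndex(["2013-10-03", "2013-12-25"])
business_days = pd.date_range(start="2013-01-01", end="2013-12-31", freq="B")
idx = business_days.difference(holidays)

# The gaps left by the removed dates mean no regular frequency is stored.
print(idx.freq)

# Passing the frequency explicitly works even though idx.freq is None.
periods = idx.to_period("D")
print(periods[:3])
```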
