Python pandas: representing holidays in time series data

python pandas

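Update 2: a better question

Does anyone know how to create a DataFrame column that marks calendar holiday dates as 1 and non-holiday dates as 0?

To make up some time series data: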

import pandas as pd 
import numpy as np 
from numpy.random import randint
from pandas.tseries.holiday import USFederalHolidayCalendar

np.random.seed(10)  # added for reproducibility


rows,cols = 8760,2
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='H') 
df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)

cal = USFederalHolidayCalendar()
cal.rules


df.index = pd.to_datetime(df.index)

first = str(df.first('1D').index.date[0])
last = str(df.last('1D').index.date[0])

pd.get_dummies(cal.holidays(start=first, end=last, return_name=True))
hols = pd.get_dummies(cal.holidays(start=first, end=last, return_name=True))

print(hols)

This prints:

            Christmas  Columbus Day  July 4th  Labor Day  \
2019-01-01          0             0         0          0   
2019-01-21          0             0         0          0   
2019-02-18          0             0         0          0   
2019-05-27          0             0         0          0   
2019-07-04          0             0         1          0   
2019-09-02          0             0         0          1   
2019-10-14          0             1         0          0   
2019-11-11          0             0         0          0   
2019-11-28          0             0         0          0   
2019-12-25          1             0         0          0

However, if I try to combine all of these into a single column with NumPy's np.where, I get an error:

df['holiday'] = np.where((hols['Christmas']==1) or (hols['Columbus Day']==1) or (hols['July 4th']==1) or (hols['Labor Day']==1) or (hols['Martin Luther King Jr. Day']==1) or (hols['Memorial Day']==1) or (hols['New Years Day']==1) or (hols['Presidents Day']==1) or (hols['Thanksgiving']==1) or (hols['Veterans Day']==1), 0, 1)

Error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Does anyone have any suggestions?
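On the error itself: Python's or needs a single True/False from each operand, and a whole Series cannot supply one, which is what the ValueError reports. The element-wise operator is |, and a row-wise any() covers every column at once. The sketch below uses a small made-up stand-in for hols (three dates, three columns), not the real calendar output. Note that even with | the original assignment would not line up, because hols has one row per holiday while df has one row per hour.

```python
import pandas as pd

# Hypothetical stand-in for the one-hot holiday table from the question
hols = pd.DataFrame(
    {"Christmas": [0, 0, 1], "July 4th": [0, 1, 0], "Labor Day": [0, 0, 0]},
    index=pd.to_datetime(["2019-01-02", "2019-07-04", "2019-12-25"]),
)

# `or` asks each Series for one truth value, which pandas refuses to give
try:
    (hols["Christmas"] == 1) or (hols["July 4th"] == 1)
    raised = False
except ValueError:
    raised = True

# Element-wise OR between two boolean Series
either = (hols["Christmas"] == 1) | (hols["July 4th"] == 1)

# Row-wise "is any column equal to 1", converted to 0/1
any_holiday = hols.eq(1).any(axis=1).astype(int)

print(raised)
print(either.tolist())
print(any_holiday.tolist())
```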

One option:

df.index = pd.to_datetime(df.index)
first = str(df.first('1D').index.date[0])
last = str(df.last('1D').index.date[0])
cal = USFederalHolidayCalendar()
# Convert the holidays to a DataFrame
hol = cal.holidays(start=first, end=last).rename('Holiday').to_frame()
# Set the value to 1 for holidays
hol['Holiday'] = 1
# Join onto df, matching each timestamp by its calendar day
df = df.join(
    hol,
    on=df.index.to_period('D').astype('datetime64[ns]')
)
# Fill NA with 0 and convert to int
df['Holiday'] = df['Holiday'].fillna(0).astype(int)
print(df.head(25).to_string())

Output of df.head(25).to_string():

                     Temperature     Value  Holiday
2019-01-01 00:00:00     0.771321  0.020752        1
2019-01-01 01:00:00     0.633648  0.748804        1
2019-01-01 02:00:00     0.498507  0.224797        1
2019-01-01 03:00:00     0.198063  0.760531        1
2019-01-01 04:00:00     0.169111  0.088340        1
2019-01-01 05:00:00     0.685360  0.953393        1
2019-01-01 06:00:00     0.003948  0.512192        1
2019-01-01 07:00:00     0.812621  0.612526        1
2019-01-01 08:00:00     0.721755  0.291876        1
2019-01-01 09:00:00     0.917774  0.714576        1
2019-01-01 10:00:00     0.542544  0.142170        1
2019-01-01 11:00:00     0.373341  0.674134        1
2019-01-01 12:00:00     0.441833  0.434014        1
2019-01-01 13:00:00     0.617767  0.513138        1
2019-01-01 14:00:00     0.650397  0.601039        1
2019-01-01 15:00:00     0.805223  0.521647        1
2019-01-01 16:00:00     0.908649  0.319236        1
2019-01-01 17:00:00     0.090459  0.300700        1
2019-01-01 18:00:00     0.113984  0.828681        1
2019-01-01 19:00:00     0.046896  0.626287        1
2019-01-01 20:00:00     0.547586  0.819287        1
2019-01-01 21:00:00     0.198948  0.856850        1
2019-01-01 22:00:00     0.351653  0.754648        1
2019-01-01 23:00:00     0.295962  0.883936        1
2019-01-02 00:00:00     0.325512  0.165016        0

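What do you want your single column to look like? Also, the code you provided doesn't run as-is; can you update it to an MRE?

What is an MRE? What does that mean?

I updated the post.

Updated the question. Does it look better now?

Looks good. Thanks a lot...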

hol = cal.holidays(start=first, end=last)

# np where with isin
df['Holiday'] = np.where(
    df.index.to_period('D').astype('datetime64[ns]').isin(hol),
    1, 0
)
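For anyone who wants to run the isin approach on its own, here is a self-contained sketch. Two things in it are my substitutions rather than part of the answers above: index.normalize() replaces the to_period('D').astype('datetime64[ns]') round trip for truncating timestamps to midnight, and the hourly frequency is passed as a pd.Timedelta so that it does not depend on a frequency alias string.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

np.random.seed(10)  # same seed as the question, for reproducibility

# One year of hourly data, as in the question
rows = 8760
tidx = pd.date_range("2019-01-01", periods=rows, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    np.random.rand(rows, 2), columns=["Temperature", "Value"], index=tidx
)

# Holiday dates (midnight timestamps) covering the span of the data
hol = USFederalHolidayCalendar().holidays(
    start=df.index.min(), end=df.index.max()
)

# Truncate each timestamp to midnight, then test membership in the holiday dates
df["Holiday"] = df.index.normalize().isin(hol).astype(int)

# Sanity check: every holiday should flag all 24 of its hourly rows
print(df["Holiday"].sum(), 24 * len(hol))
```

The result should be the same 0/1 column as the join-based version, without building an intermediate holiday DataFrame.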