Python: create a new "long weekend" column from existing "public holiday" and "weekend" columns
I have a pandas DataFrame with weekday, date, public holiday, and weekend columns:
weekday Date Public_Holiday? Weekend?
5 2015-01-10 no yes
0 2015-01-12 no no
1 2015-01-13 no no
2 2015-01-14 no no
3 2015-01-15 no no
4 2015-01-16 no no
5 2015-01-17 no yes
6 2015-01-18 no yes
0 2015-01-19 no no
1 2015-01-20 no no
2 2015-01-21 no no
3 2015-01-22 no no
4 2015-01-23 yes no
5 2015-01-24 no yes
6 2015-01-25 no yes
1 2015-01-27 no no
2 2015-01-28 no no
3 2015-01-29 no no
4 2015-01-30 no no
5 2015-01-31 no yes
0 2015-02-02 no no
1 2015-02-03 no no
2 2015-02-04 no no
3 2015-02-05 no no
4 2015-02-06 no no
5 2015-02-07 no yes
6 2015-02-08 no yes
0 2015-02-09 yes no
1 2015-02-10 no no
2 2015-02-11 no no
I need to add an extra column containing a long-weekend flag. The output should look like this:
long_weekend weekday Date Public_Holiday? Weekend?
0 5 2015-01-10 no yes
0 0 2015-01-12 no no
0 1 2015-01-13 no no
0 2 2015-01-14 no no
0 3 2015-01-15 no no
0 4 2015-01-16 no no
0 5 2015-01-17 no yes
0 6 2015-01-18 no yes
0 0 2015-01-19 no no
0 1 2015-01-20 no no
0 2 2015-01-21 no no
0 3 2015-01-22 no no
1 4 2015-01-23 yes no
1 5 2015-01-24 no yes
1 6 2015-01-25 no yes
0 1 2015-01-27 no no
0 2 2015-01-28 no no
0 3 2015-01-29 no no
0 4 2015-01-30 no no
0 5 2015-01-31 no yes
0 0 2015-02-02 no no
0 1 2015-02-03 no no
0 2 2015-02-04 no no
0 3 2015-02-05 no no
0 4 2015-02-06 no no
1 5 2015-02-07 no yes
1 6 2015-02-08 no yes
1 0 2015-02-09 yes no
0 1 2015-02-10 no no
0 2 2015-02-11 no no
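A normal weekend does not count as a long weekend. The whole run of days is treated as a long weekend only when the Friday or Monday (or, in some cases, the Thursday or Tuesday) is a public holiday.
Here is what I tried: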
df['long_weekend'] = np.where((df['Public_Holiday?'] == 'yes') | (df['Weekend?'] == 'yes'), 1, 0)
df['weekday'] = df['Predicted_Date'].dt.dayofweek
df['long_weekend'] = np.where(((df['long_weekend'] == 1) & (df['weekday'] == 4)) | ((df['long_weekend'] == 1) & (df['weekday'] == 0)), 'yes', 'no')
This gives me the following output, which flags even normal weekends as 1:
long_weekend weekday Date Public_Holiday? Weekend?
1 5 2015-01-10 no yes
0 0 2015-01-12 no no
0 1 2015-01-13 no no
0 2 2015-01-14 no no
0 3 2015-01-15 no no
0 4 2015-01-16 no no
1 5 2015-01-17 no yes
1 6 2015-01-18 no yes
0 0 2015-01-19 no no
0 1 2015-01-20 no no
0 2 2015-01-21 no no
0 3 2015-01-22 no no
1 4 2015-01-23 yes no
1 5 2015-01-24 no yes
1 6 2015-01-25 no yes
0 1 2015-01-27 no no
0 2 2015-01-28 no no
0 3 2015-01-29 no no
0 4 2015-01-30 no no
1 5 2015-01-31 no yes
0 0 2015-02-02 no no
0 1 2015-02-03 no no
0 2 2015-02-04 no no
0 3 2015-02-05 no no
0 4 2015-02-06 no no
1 5 2015-02-07 no yes
1 6 2015-02-08 no yes
1 0 2015-02-09 yes no
0 1 2015-02-10 no no
0 2 2015-02-11 no no
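How can I make this work? Any help would be great. Thanks in advance.

The idea is to use shift and cumsum to label runs of consecutive days off, then use map with value_counts to get the length of each run, and keep only the runs longer than 2 rows: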
long = (df['Public_Holiday?'] == 'yes') | (df['Weekend?'] == 'yes')
s = long.ne(long.shift()).cumsum()
df['long_weekend'] = np.where((s.map(s.value_counts()) > 2) & long, 1, 0)
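To make the three steps easier to follow, here is a minimal sketch that prints the intermediate values on a small frame. The six-row sample and the names day_off, group_id, and group_size are assumptions made for illustration. The logic is the same as in the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical six-row sample: a normal weekend, a working Monday,
# then a Friday public holiday followed by a weekend.
df = pd.DataFrame({
    "Predicted_Date": pd.to_datetime([
        "2015-01-17", "2015-01-18", "2015-01-19",
        "2015-01-23", "2015-01-24", "2015-01-25",
    ]),
    "Public_Holiday?": ["no", "no", "no", "yes", "no", "no"],
    "Weekend?": ["yes", "yes", "no", "no", "yes", "yes"],
})

# Step 1: True for every day off (weekend or public holiday).
day_off = (df["Public_Holiday?"] == "yes") | (df["Weekend?"] == "yes")

# Step 2: the flag changes between a row and the previous one exactly
# where a new run starts; the cumulative sum turns those changes into run ids.
group_id = day_off.ne(day_off.shift()).cumsum()

# Step 3: replace each run id by the number of rows in that run.
group_size = group_id.map(group_id.value_counts())

# A long weekend is a run of days off that spans more than 2 rows.
df["long_weekend"] = np.where((group_size > 2) & day_off, 1, 0)

print(df.assign(day_off=day_off, group_id=group_id, group_size=group_size))
```

The normal weekend forms a run of length 2 and stays 0, while the Friday holiday extends the following weekend to a run of length 3, so those three rows get 1.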
The output you posted for df['long_weekend'] uses 1 and 0, but in your code you assign yes and no to df['long_weekend']. Can you confirm which one you want?
print (df)
weekday Predicted_Date Public_Holiday? Weekend? long_weekend
0 5 2015-01-10 no yes 0
1 0 2015-01-12 no no 0
2 1 2015-01-13 no no 0
3 2 2015-01-14 no no 0
4 3 2015-01-15 no no 0
5 4 2015-01-16 no no 0
6 5 2015-01-17 no yes 0
7 6 2015-01-18 no yes 0
8 0 2015-01-19 no no 0
9 1 2015-01-20 no no 0
10 2 2015-01-21 no no 0
11 3 2015-01-22 no no 0
12 4 2015-01-23 yes no 1
13 5 2015-01-24 no yes 1
14 6 2015-01-25 no yes 1
15 1 2015-01-27 no no 0
16 2 2015-01-28 no no 0
17 3 2015-01-29 no no 0
18 4 2015-01-30 no no 0
19 5 2015-01-31 no yes 0
20 0 2015-02-02 no no 0
21 1 2015-02-03 no no 0
22 2 2015-02-04 no no 0
23 3 2015-02-05 no no 0
24 4 2015-02-06 no no 0
25 5 2015-02-07 no yes 1
26 6 2015-02-08 no yes 1
27 0 2015-02-09 yes no 1
28 1 2015-02-10 no no 0
29 2 2015-02-11 no no 0
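One caveat: the approach above groups adjacent rows, not adjacent calendar days, and the sample data skips some dates (for example 2015-01-26 and 2015-02-01). If days off can end up next to each other only because the rows between them are missing, a possible variant also starts a new run whenever the date does not advance by exactly one day. The sketch below assumes the frame is sorted by Predicted_Date. The sample rows and the names gap, new_run, run_id, and run_size are hypothetical.

```python
import pandas as pd

# Hypothetical sample: the weekdays between 2015-01-25 and 2015-01-31 are missing,
# so three weekend rows sit next to each other without being consecutive days.
df = pd.DataFrame({
    "Predicted_Date": pd.to_datetime([
        "2015-01-24", "2015-01-25", "2015-01-31",
        "2015-02-06", "2015-02-07", "2015-02-08", "2015-02-09",
    ]),
    "Public_Holiday?": ["no", "no", "no", "no", "no", "no", "yes"],
    "Weekend?": ["yes", "yes", "yes", "no", "yes", "yes", "no"],
})

day_off = (df["Public_Holiday?"] == "yes") | (df["Weekend?"] == "yes")

# Row-based grouping only, as in the answer above, for comparison.
rows_id = day_off.ne(day_off.shift()).cumsum()
rows_only = ((rows_id.map(rows_id.value_counts()) > 2) & day_off).astype(int)

# Calendar-aware grouping: a run also breaks when the date jumps by more than a day.
gap = df["Predicted_Date"].diff().ne(pd.Timedelta(days=1))
new_run = day_off.ne(day_off.shift()) | gap
run_id = new_run.cumsum()
run_size = run_id.map(run_id.value_counts())
df["long_weekend"] = ((run_size > 2) & day_off).astype(int)

print(df.assign(rows_only=rows_only))
```

On this sample the row-based version flags the first three rows as a long weekend, while the calendar-aware version flags only 2015-02-07 through 2015-02-09.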