Mean of consecutive days in a Python DataFrame


I have a pandas DataFrame df as:

Date         Val    WD
1/3/2019     2.65   Thursday
1/4/2019     2.51   Friday
1/5/2019     2.95   Saturday
1/6/2019     3.39   Sunday
1/7/2019     3.39   Monday
1/12/2019    2.23   Saturday
1/13/2019    2.50   Sunday
1/14/2019    3.62   Monday
1/15/2019    3.81   Tuesday
1/16/2019    3.75   Wednesday
1/17/2019    3.69   Thursday
1/18/2019    3.47   Friday
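To experiment with the answers below, the sample frame can be rebuilt as follows. This is a sketch: it assumes the dates are month/day/year strings, as the table suggests, and parses them to datetime64 because the answers rely on the .dt accessor.

```python
import pandas as pd

# Sample data from the question; dates are month/day/year strings
df = pd.DataFrame({
    'Date': ['1/3/2019', '1/4/2019', '1/5/2019', '1/6/2019', '1/7/2019', '1/12/2019',
             '1/13/2019', '1/14/2019', '1/15/2019', '1/16/2019', '1/17/2019', '1/18/2019'],
    'Val': [2.65, 2.51, 2.95, 3.39, 3.39, 2.23, 2.50, 3.62, 3.81, 3.75, 3.69, 3.47],
    'WD': ['Thursday', 'Friday', 'Saturday', 'Sunday', 'Monday', 'Saturday',
           'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
})
# Parse explicitly so the .dt accessor is available
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# The WD column agrees with the weekday pandas derives from Date
assert (df['Date'].dt.day_name() == df['WD']).all()
print(df)
```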
From this, I need to get the following df2:

Date         Val    WD
1/3/2019     2.65   Thursday
1/4/2019     2.51   Friday
1/5/2019     3.24   Saturday
1/6/2019     3.24   Sunday
1/7/2019     3.24   Monday
1/12/2019    2.78   Saturday
1/13/2019    2.78   Sunday
1/14/2019    2.78   Monday
1/15/2019    3.81   Tuesday
1/16/2019    3.75   Wednesday
1/17/2019    3.69   Thursday
1/18/2019    3.47   Friday
where the values in df2 are replaced by the mean of each run of consecutive Sat, Sun and Mon values,

i.e. in df the dates 1/5/2019, 1/6/2019 and 1/7/2019 have the values 2.95, 3.39 and 3.39, whose mean is 3.24, so in df2 I replace the values for 1/5/2019, 1/6/2019 and 1/7/2019 with 3.24.
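The arithmetic behind both replaced runs can be checked directly; the second weekend (1/12/2019 to 1/14/2019) works the same way, which is where 2.78 in df2 comes from.

```python
from statistics import mean

first_weekend = [2.95, 3.39, 3.39]   # 1/5/2019, 1/6/2019, 1/7/2019
second_weekend = [2.23, 2.50, 3.62]  # 1/12/2019, 1/13/2019, 1/14/2019

# Exact means are 3.2433... and 2.7833...; df2 shows them rounded to 2 decimals
print(round(mean(first_weekend), 2))
print(round(mean(second_weekend), 2))
```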


The trick is finding the consecutive Saturday, Sunday and Monday rows. I'm not sure how to approach this.

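This logic creates a Series that assigns a unique ID to each group of consecutive Sat/Sun/Mon rows in the DataFrame. It then checks that a group contains all 3 days (not just Sat/Sun or Sun/Mon) and transforms those values to their mean: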

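import pandas as pd

# Date must be datetime64 for the .dt accessor (skip if already converted)
df['Date'] = pd.to_datetime(df['Date'])

# Start a new group ID unless the row is a Sunday/Monday exactly one day after
# the previous row, so each Sat -> Sun -> Mon run shares one ID
s = (~(df.Date.dt.dayofweek.isin([0, 6])
       & (df.Date - df.Date.shift(1)).dt.days.eq(1))).cumsum()

# Keep only complete three-day runs, then replace their values with the run mean
to_trans = s[s.groupby(s).transform('size').eq(3)]
df.loc[to_trans.index, 'Val'] = df.loc[to_trans.index].groupby(to_trans).Val.transform('mean')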
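A self-contained run of the same logic on the question's data makes the intermediate group IDs visible. It is a sketch that rebuilds the sample frame inline and only splits the boolean condition into a named variable (continues_run, a name introduced here for readability).

```python
import pandas as pd

# Question's data, with Date parsed up front so .dt works
df = pd.DataFrame({
    'Date': pd.to_datetime(['1/3/2019', '1/4/2019', '1/5/2019', '1/6/2019', '1/7/2019',
                            '1/12/2019', '1/13/2019', '1/14/2019', '1/15/2019',
                            '1/16/2019', '1/17/2019', '1/18/2019'], format='%m/%d/%Y'),
    'Val': [2.65, 2.51, 2.95, 3.39, 3.39, 2.23, 2.50, 3.62, 3.81, 3.75, 3.69, 3.47],
})

# True where a row continues a run: it is Sun/Mon and exactly one day after the previous row
continues_run = (df.Date.dt.dayofweek.isin([0, 6])
                 & (df.Date - df.Date.shift(1)).dt.days.eq(1))
# Every row that does not continue a run starts a new ID
s = (~continues_run).cumsum()
print(s.tolist())

# Only IDs with exactly 3 rows (a full Sat/Sun/Mon run) get averaged
to_trans = s[s.groupby(s).transform('size').eq(3)]
df.loc[to_trans.index, 'Val'] = df.loc[to_trans.index].groupby(to_trans).Val.transform('mean')
print(df)
```

Here the two weekends get IDs 3 and 4, each with three rows, while every weekday row has an ID of its own and is left untouched.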
You can use CustomBusinessDay with pd.Grouper to create a group column:

# if you only want the mean when all three days are present
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

df['Date'] = pd.to_datetime(df['Date'])
# Tue-Sat are "business days", so Sunday and Monday fall into Saturday's bin
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')

df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
full = df.groupby('group_col')['Val'].transform('size').eq(3)
df.update(df[full].groupby('group_col')['Val'].transform('mean'))

    Date          Val          WD     group_col
0   2019-01-03  2.650000    Thursday    0
1   2019-01-04  2.510000    Friday      1
2   2019-01-05  3.243333    Saturday    2
3   2019-01-06  3.243333    Sunday      2
4   2019-01-07  3.243333    Monday      2
5   2019-01-12  2.783333    Saturday    7
6   2019-01-13  2.783333    Sunday      7
7   2019-01-14  2.783333    Monday      7
8   2019-01-15  3.810000    Tuesday     8
9   2019-01-16  3.750000    Wednesday   9
10  2019-01-17  3.690000    Thursday    10
11  2019-01-18  3.470000    Friday      11
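To see why that weekmask puts Sunday and Monday in the same group as Saturday, it can help to look at how the offset itself treats those days. The sketch below uses the offset's rollback() method, which leaves a business day unchanged and moves any other day back to the previous business day; it illustrates the offset rather than reproducing pd.Grouper's binning.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Tue..Sat count as business days; Sun and Mon do not
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')

dates = pd.Series(pd.to_datetime(['2019-01-04', '2019-01-05', '2019-01-06',
                                  '2019-01-07', '2019-01-08']))
# rollback() keeps business days as-is and moves Sun/Mon back to the previous Saturday
anchors = dates.map(days.rollback)

for d, a in zip(dates, anchors):
    print(d.day_name(), d.date(), '->', a.date())
```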
Alternatively, if you want the mean of any combination of Saturday, Sunday and Monday in the same week:

days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')

df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
df['Val'] = df.groupby('group_col')['Val'].transform('mean')
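The two variants only differ when a weekend is incomplete, which is also the case raised in the comments. The sketch below uses hypothetical data with the first Sunday missing. To keep the example small it swaps pd.Grouper for a key built with the offset's rollback() method, so treat it as an illustration of the two rules rather than the answer's exact code; group_key, any_combo and all_three are names introduced here.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical data: the first weekend has no Sunday row
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-04', '2019-01-05', '2019-01-07',
                            '2019-01-12', '2019-01-13', '2019-01-14']),
    'Val': [2.51, 2.95, 3.39, 2.23, 2.50, 3.62],
})

# Tue..Sat are business days, so rollback() maps Sun/Mon onto the preceding Saturday
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
group_key = df['Date'].map(days.rollback)

# Rule 1: average whatever Sat/Sun/Mon rows share a key (Tue..Fri rows each have their own)
any_combo = df.groupby(group_key)['Val'].transform('mean')

# Rule 2: only average complete Sat + Sun + Mon runs; keep other rows unchanged
size = df.groupby(group_key)['Val'].transform('size')
all_three = any_combo.where(size.eq(3), df['Val'])

print(df.assign(any_combo=any_combo, all_three=all_three))
```

With rule 1 the incomplete weekend becomes the Saturday/Monday mean (3.17); with rule 2 it keeps its original values, and only the complete second weekend is averaged.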

One way is to compute a week number, then use groupby to calculate the mean over the specific days and map it back to the original DataFrame:

import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])

# consider Monday to belong to previous week
# (dt.week is deprecated and later removed; isocalendar().week replaces it)
week = df['Date'].dt.isocalendar().week.astype(int)
weekday = df['Date'].dt.weekday
df['Week'] = np.where(weekday.eq(0), week - 1, week)

# take means of Sat, Sun, Mon, then map back
mask = weekday.isin([5, 6, 0])
week_val_map = df[mask].groupby('Week')['Val'].mean()
df.loc[mask, 'Val'] = df['Week'].map(week_val_map)

print(df)

         Date       Val         WD  Week
0  2019-01-03  2.650000   Thursday     1
1  2019-01-04  2.510000     Friday     1
2  2019-01-05  3.243333   Saturday     1
3  2019-01-06  3.243333     Sunday     1
4  2019-01-07  3.243333     Monday     1
5  2019-01-12  2.783333   Saturday     2
6  2019-01-13  2.783333     Sunday     2
7  2019-01-14  2.783333     Monday     2
8  2019-01-15  3.810000    Tuesday     3
9  2019-01-16  3.750000  Wednesday     3
10 2019-01-17  3.690000   Thursday     3
11 2019-01-18  3.470000     Friday     3
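One caveat with the week - 1 trick: ISO week numbers restart each year, and Monday 2018-12-31 is in ISO week 1 (of ISO year 2019), so it would get week 0 while the Saturday and Sunday before it are in week 52. A sketch that avoids this, under the assumption that the same Sat/Sun/Mon grouping is wanted: shift each Monday back one day first, then key on ISO year and week together (shifted and week_key are names introduced here, and the data is hypothetical).

```python
import pandas as pd

# Hypothetical dates around a year boundary: Fri 2018-12-28 .. Tue 2019-01-01
df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-12-28', '2018-12-29', '2018-12-30',
                            '2018-12-31', '2019-01-01']),
    'Val': [1.0, 2.0, 3.0, 4.0, 5.0],
})

weekday = df['Date'].dt.weekday
# Treat Monday as part of the previous week by moving it back one day
shifted = df['Date'] - pd.to_timedelta(weekday.eq(0).astype(int), unit='D')
iso = shifted.dt.isocalendar()
# Key on ISO year and week together so weeks from different years never collide
week_key = iso['year'].astype(str) + '-W' + iso['week'].astype(str)

mask = weekday.isin([5, 6, 0])  # Sat, Sun, Mon
week_val_map = df[mask].groupby(week_key[mask])['Val'].mean()
df['Week'] = week_key
df.loc[mask, 'Val'] = week_key[mask].map(week_val_map)
print(df)
```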

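Note that the 'WD' column is completely unnecessary, since you can get that information from Series.dt.dayofweek.

Even if there is no Saturday data, this will still take the Sunday/Monday mean. @Alolz You're absolutely right, thanks for pointing that out. It has been corrected.

What happens if you only have consecutive Saturday and Sunday rows (no Monday), or even just Saturday and Monday? In those cases, do you still want to take the mean, or leave the data unchanged?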