Python: How to do a grouped expanding window calculation in pandas

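Specifically, I want to compute, per group, an expanding mean of the difference between two dates in a series. So if I have something like this: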

Period    Group    dates
  1         A      2010-07-01
  2         A      2010-07-13
  3         A      2010-07-13
  4         A      2010-07-21
  1         B      2000-08-20
  2         B      2000-08-15
I would get:

Period    Group    cumulative average of differences
  1         A        0
  2         A        12/2
  3         A        12/3
  4         A        20/4
  1         B        0
  2         B       -5/2
Output:

0            00:00:00
1    6 days, 00:00:00
2    4 days, 00:00:00
3    5 days, 00:00:00
4            00:00:00
5   -2 days, 12:00:00
dtype: timedelta64[ns]
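
The output above is a timedelta Series in which each row holds the time elapsed since the first date of its group, divided by the row's position within that group. A minimal sketch of one way to obtain an equivalent result follows. It is not necessarily the code that produced the output shown, and the names `first_date`, `position` and `result` are illustrative assumptions.

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    "Period": [1, 2, 3, 4, 1, 2],
    "Group": ["A", "A", "A", "A", "B", "B"],
    "dates": pd.to_datetime([
        "2010-07-01", "2010-07-13", "2010-07-13", "2010-07-21",
        "2000-08-20", "2000-08-15",
    ]),
})

# First date of each group, broadcast back onto every row of that group
first_date = df.groupby("Group")["dates"].transform("first")

# 1-based position of each row within its group (cumcount starts at 0)
position = df.groupby("Group").cumcount() + 1

# Elapsed time since the group's first date, divided by the row position
result = (df["dates"] - first_date) / position
print(result)
```

Because `transform` and `cumcount` both return objects aligned with the original index, no merge step is needed afterwards.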

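I have an alternative that is slightly longer than the previously posted solution, but I think it makes it easier to see what happens inside the transformation function for the date column, and the output format is also cleaner: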

import numpy as np
import pandas as pd
from datetime import date

# Build data
prd = [1, 2, 3, 4, 1, 2]
grp = ['A', 'A', 'A', 'A', 'B', 'B']
yr =  [2010, 2010, 2010, 2010, 2000, 2000]
mth = [7, 7, 7, 7, 8, 8]
day = [1, 13, 13, 21, 20, 15]
dt = [date(y, m, d) for y, m, d in zip(yr, mth, day)]
# Create data frame
df = pd.DataFrame({'Period': prd, 'Group': grp, 'Dates': dt},
                  columns=['Period', 'Group', 'Dates'])

# Transformation function for the date column
def f(ser):
    v = ser.values
    # Get time difference in days
    delta = [float((ii-v[0]).days) for ii in v]
    # Get number of items to divide by
    dv = np.arange(len(delta))+1
    # Get cumulative average
    cumavg = [nm/dm for nm, dm in zip(delta, dv)]
    # Create output pandas Series object and return it
    out = pd.Series(cumavg, index=ser.index)
    return out

# Apply the transformation function to the Dates column.
# group_keys=False keeps the original row index so the merge below lines up.
dfappend = pd.DataFrame({'Cum_Avg': df.groupby("Group", group_keys=False).Dates.apply(f)})
# Delete the Dates column
del df['Dates']
# Merge to create the revised data frame
df = pd.merge(df, dfappend, left_index=True, right_index=True)
print(df)
The output is:

   Period Group  Cum_Avg
0       1     A      0.0
1       2     A      6.0
2       3     A      4.0
3       4     A      5.0
4       1     B      0.0
5       2     B     -2.5

Shouldn't the value for period 2 of group B be -5/2? Are you ultimately looking for the mean difference in days (as a float)?

FYI: to convert the final values to floating-point days, you can divide the result by
np.timedelta64(1, 'D')
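
To illustrate that conversion, here is a small sketch. The Series `avg` is a stand-in built by hand to match the timedelta output shown earlier, and both `avg` and `days` are names chosen for this example only.

```python
import numpy as np
import pandas as pd

# Stand-in for the timedelta result (values given in days)
avg = pd.Series(pd.to_timedelta([0.0, 6.0, 4.0, 5.0, 0.0, -2.5], unit="D"))

# Dividing by a one-day timedelta yields plain floats measured in days
days = avg / np.timedelta64(1, "D")
print(days)
```

The result has dtype float64, so it can be stored in an ordinary numeric column next to Period and Group.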