Python: sum of a separate column based on a range between values in other columns of a dataframe, after groupby

python pandas dataframe

I have a dataframe like the one below

id  Supply  days    days_180
1   30         0    180
1   100      183    363
1   80       250    430
2   5          0    180
2   5         10    190
3   5          0    180
3   30       100    280
3   30       150    330
3   30       200    380
3   30       280    460
3   50       310    490
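
To make the example reproducible, the table above can be rebuilt in code. This is a minimal sketch; it assumes that days_180 is simply days + 180, which matches every row shown, and it reuses the name df_d from the attempt in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df_d = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    "Supply": [30, 100, 80, 5, 5, 5, 30, 30, 30, 30, 50],
    "days":   [0, 183, 250, 0, 10, 0, 100, 150, 200, 280, 310],
})

# Assumption: the upper bound of each window is days + 180
df_d["days_180"] = df_d["days"] + 180

print(df_d)
```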

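For each row, I want to calculate the sum of 'Supply' over the rows whose 'days' value lies between that row's 'days' and 'days' + 180 (the 'days_180' column). This needs to be done per group, after groupby('id').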

The expected output is shown below

id  Supply  days    days_180    use
1   30         0        180     30
1   100      183        363     180
1   80       250        430     80
2   5          0        180     10
2   5         10        190     10
3   5          0        180     65
3   30       100        280     120
3   30       150        330     140
3   30       200        380     110
3   30       280        460     80
3   50       310        490     50

I have tried the code below, but it does not work as expected

df_d['use']=df_d.groupby('id').apply(lambda x: x.loc[x['days'].between(x['days'],x['days_180']),'supply'].sum())

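Use a list comprehension to loop over each days_180 value within each group, filter the matching rows, take the sum of Supply, and create a new column: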

def f(x):
    a = [x.loc[(x['days'] <= d) & (x['days_180'] >= d),'Supply'].sum() for d in x['days_180']]
    x['use'] = a
    return x

Or a solution with another lambda:

def f(x):
    x['use'] = x['days_180'].apply(lambda d: x.loc[(x['days'] <= d) & 
                                                   (x['days_180'] >= d), 'Supply'].sum())
    return x


# group_keys=False keeps the original row index in recent pandas versions
df_d = df_d.groupby('id', group_keys=False).apply(f)
print(df_d)
    id  Supply  days  days_180  use
0    1      30     0       180   30
1    1     100   183       363  180
2    1      80   250       430   80
3    2       5     0       180   10
4    2       5    10       190    5
5    3       5     0       180   65
6    3      30   100       280  120
7    3      30   150       330  140
8    3      30   200       380  110
9    3      30   280       460   80
10   3      50   310       490   50

You can also do this with numpy broadcasting and np.where:

df.groupby("id").apply(
    lambda g: g.assign(use=(np.where((g.days.values >= g.days.values[:, np.newaxis]) &
                                     (g.days.values ...

Thanks. I thought we could do this with a lambda function, but apparently not.

@moys - You're welcome! I also added a solution with lambda.
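
The numpy broadcasting approach mentioned above can be sketched as follows. This is one possible way to write it, not necessarily the answerer's exact code: the name in_window and the explicit loop over groups are assumptions made for illustration, and days_180 is again assumed to equal days + 180.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    "Supply": [30, 100, 80, 5, 5, 5, 30, 30, 30, 30, 50],
    "days":   [0, 183, 250, 0, 10, 0, 100, 150, 200, 280, 310],
})
df["days_180"] = df["days"] + 180  # assumption: window is days .. days + 180

use = pd.Series(0, index=df.index)
for _, g in df.groupby("id"):
    days = g["days"].to_numpy()
    upper = g["days_180"].to_numpy()
    supply = g["Supply"].to_numpy()
    # in_window[i, j] is True when row j's days lies in [days[i], upper[i]]
    in_window = (days >= days[:, np.newaxis]) & (days <= upper[:, np.newaxis])
    # Keep Supply where the mask holds, 0 elsewhere, then sum along each row
    use.loc[g.index] = np.where(in_window, supply, 0).sum(axis=1)

df["use"] = use
print(df)
```

The mask is a square matrix per group, so memory grows with the square of the group size; for very large groups the row-wise solutions above may be the safer choice.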