Python pandas: adding a new aggregation function


I have this DataFrame in pandas:

   day customer  amount
0    1    cust1     500
1    2    cust2     100
2    1    cust1      50
3    2    cust1     100
4    2    cust2     250
5    6    cust1      20

For convenience:

import pandas as pd

df = pd.DataFrame({'day': [1, 2, 1, 2, 2, 6],
                   'customer': ['cust1', 'cust2', 'cust1', 'cust1', 'cust2', 'cust1'],
                   'amount': [500, 100, 50, 100, 250, 20]})

I want to create a new column "amount2days" that sums each customer's amounts over the past two days, to get the following DataFrame:

   day customer  amount  amount2days  notes
0    1    cust1     500          500  (no past transactions)
1    2    cust2     100          100  (no past transactions)
2    1    cust1      50          550  (500 + 50, rows 0,2)
3    2    cust1     100          650  (500 + 50 + 100, rows 0,2,3)
4    2    cust2     250          350  (100 + 250, rows 1,4)
5    6    cust1      20           20  (day is 6, and cust1 has no day=5 rows)

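i.e., I want to execute the following (pseudo)code:

for each row. What is the most convenient way to do this?

The sum I want is taken over days, but the day does not necessarily increase with every row, as the example shows. I still want to sum the amounts over the past two days.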
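To make the intended result concrete, here is a minimal row-by-row sketch of the rule the expected table seems to follow: for each row, add up the same customer's amounts from rows at or before it whose day is the current day or the day before. The helper name amount_last_n_days and its n_days parameter are made up for illustration, and the loop is quadratic in the number of rows, so treat it as a reference to check faster solutions against rather than something to run on large data.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 1, 2, 2, 6],
                   'customer': ['cust1', 'cust2', 'cust1', 'cust1', 'cust2', 'cust1'],
                   'amount': [500, 100, 50, 100, 250, 20]})


def amount_last_n_days(frame, n_days=2):
    """Hypothetical helper: per-row sum of the same customer's amounts over
    the current day and the n_days - 1 days before it, counting only rows
    at or before the current row."""
    totals = []
    for pos in range(len(frame)):
        row = frame.iloc[pos]
        seen = frame.iloc[:pos + 1]  # rows up to and including the current one
        mask = ((seen['customer'] == row['customer'])
                & seen['day'].between(row['day'] - (n_days - 1), row['day']))
        totals.append(seen.loc[mask, 'amount'].sum())
    return pd.Series(totals, index=frame.index)


df['amount2days'] = amount_last_n_days(df)
print(df)
```

On the six-row example this gives 500, 100, 550, 650, 350, 20, which matches the table in the question.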

You can use pandas' rolling for moving-window operations (depending on your pandas version, reset_index, as in jezrael's answer, would be safer):

Use groupby and sum:

Note: the reset_index call below is a necessary addition to avoid misaligned data:

df['amount2days'] = (df.groupby('customer')['amount']
                       .rolling(2, min_periods=0)
                       .sum()
                       .reset_index(level=0, drop=True))
print (df)
   day customer  amount  amount2days
1    1    cust1     500        500.0
2    2    cust1     100        600.0
3    3    cust1     250        350.0
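One thing worth checking with this approach: rolling(2) is a window of two rows, not two days. As a quick sketch (the column name rolling2rows is made up for illustration), applying the same chain to the six-row frame from the question shows where a row window and a day window diverge:

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 1, 2, 2, 6],
                   'customer': ['cust1', 'cust2', 'cust1', 'cust1', 'cust2', 'cust1'],
                   'amount': [500, 100, 50, 100, 250, 20]})

# Window of the last two ROWS per customer, regardless of the 'day' values.
df['rolling2rows'] = (df.groupby('customer')['amount']
                        .rolling(2, min_periods=1)
                        .sum()
                        .reset_index(level=0, drop=True))
print(df)
```

Row 3 comes out as 150 (50 + 100) instead of 650, and row 5 as 120 (100 + 20) instead of 20, because the window counts rows and never looks at the day column.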

Why not use .to_numpy here? Because if the index is not the default one, the output would be assigned to the wrong rows. Check the following example:

df = pd.DataFrame({'day': {0: 1, 2: 2, 5: 3, 1: 1, 6: 2, 4: 3}, 'customer': {0: 'cust2', 2: 'cust2', 5: 'cust2', 1: 'cust1', 6: 'cust1', 4: 'cust1'}, 'amount': {0: 5000, 2: 1000, 5: 2500, 1: 500, 6: 100, 4: 250}})
print (df)
   day customer  amount
0    1    cust2    5000
2    2    cust2    1000
5    3    cust2    2500
1    1    cust1     500
6    2    cust1     100
4    3    cust1     250

EDIT: General solution:

def f(x):
    N = 1
    for i in pd.unique(x['day']):
        y = x[x['day'].between(i - N, i)]
        x.loc[y.index[-1], 'amountNdays'] = y['amount'].sum()
    
    return x

df = df.groupby('customer').apply(f)
df['amountNdays'] = df['amountNdays'].fillna(df['amount'])
print (df)
   day customer  amount  amountNdays
0    1    cust1     500        500.0
1    2    cust2     100        100.0
2    1    cust1      50        550.0
3    2    cust1     100        650.0
4    2    cust2     250        350.0
5    6    cust1      20         20.0
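As a possible alternative to the explicit loop over days, a time-based rolling window can do the same bookkeeping. The sketch below is not part of the answer above; it assumes the rows of each customer are already in non-decreasing day order (sort first otherwise), and it maps the integer day numbers onto dates with an arbitrary origin purely so that the '2D' offset window can be used. The ts column is a made-up helper.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 1, 2, 2, 6],
                   'customer': ['cust1', 'cust2', 'cust1', 'cust1', 'cust2', 'cust1'],
                   'amount': [500, 100, 50, 100, 250, 20]})

# Assumption: turn day numbers into timestamps relative to an arbitrary origin.
df['ts'] = pd.Timestamp('2000-01-01') + pd.to_timedelta(df['day'], unit='D')

parts = []
for _, group in df.groupby('customer'):
    # '2D' covers the current day and the day before it, for rows seen so far.
    rolled = group.set_index('ts')['amount'].rolling('2D').sum()
    # Re-attach the original row labels so the assignment below aligns by index.
    parts.append(pd.Series(rolled.to_numpy(), index=group.index))

df['amount2days'] = pd.concat(parts)
df = df.drop(columns='ts')
print(df)
```

The window length is the only thing to change for a different number of days, for example '3D' for the current day plus the two before it.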

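Does this answer your question?

Thanks, I updated my question to make it clearer. My data does not necessarily increase "day" with every row, but I still want to sum over the previous 2 days. Will a simple rolling work in this case?

It still doesn't work when the earlier rows have nothing to sum. I edited my example again (the result should be just 20, but this method gives 120). Thanks.

@jezarel by "there" I mean "amount2days" in the last row. Maybe there is a way to easily generalize this to amount2days?

@user112112 - added a general solution.

Comments are not for extended discussion; this conversation has ended.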

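# Using the df with the non-default index from the example above.
# to_numpy() drops the index, so the values are assigned by position
# (see the mismatched amount2days column in the output below).
df['amount2days'] = (df.groupby('customer', sort=False).amount
                       .rolling(2, min_periods=0)
                       .sum()
                       .to_numpy())

# reset_index keeps the original row labels, so the values align by index.
df['amount2days1'] = (df.groupby('customer')['amount']
                       .rolling(2, min_periods=0)
                       .sum()
                       .reset_index(level=0, drop=True))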
print (df)
   day customer  amount  amount2days  amount2days1
0    1    cust2    5000        500.0        5000.0
2    2    cust2    1000        600.0        6000.0
5    3    cust2    2500        350.0        3500.0
1    1    cust1     500       5000.0         500.0
6    2    cust1     100       6000.0         600.0
4    3    cust1     250       3500.0         350.0
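The difference between the two columns comes down to how pandas assigns a result to a column: a Series is matched on index labels, while a NumPy array is written in positional order. A small standalone sketch with made-up data (the names frame, result, by_label and by_position are only for illustration):

```python
import pandas as pd

# A frame whose index is not in the default 0, 1, 2 order.
frame = pd.DataFrame({'amount': [10, 20, 30]}, index=[0, 2, 1])

# A computed result that comes back in a different row order,
# but still carries the original index labels.
result = pd.Series([100, 300, 200], index=[0, 1, 2])

frame['by_label'] = result                 # aligned on index labels
frame['by_position'] = result.to_numpy()   # written in positional order

print(frame)
```

Here by_label is 100, 200, 300 (each value next to the row it was computed for), while by_position is 100, 300, 200, so two rows receive another row's value.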
