Python: using a running total in a second column

python pandas

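I'm converting a function to pandas. The function loops over a collection and updates each value based on a condition and a running total. It looks like this: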

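def calculate_value(rows):
    cumulative_amount = 0

    for row in rows:
        if row['amount'] < 0:
            # Skip rows with a negative amount (the expected output below
            # shows 0 for them and keeps going, rather than returning 0)
            row['result'] = 0
            continue

        amount = 0

        if row['kind'] == 'A':
            amount = row['amount'] * row['input_amount']
        elif row['kind'] == 'B':
            # 'B' takes whatever is left of input_amount
            amount = row['input_amount'] - cumulative_amount
        elif row['kind'] == 'C':
            amount = row['amount']

        cumulative_amount += amount
        row['result'] = amount

        # Stop after the first 'B' row
        if row['kind'] == 'B':
            break

    return rows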
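For checking any vectorized version against, here is a sketch of the same loop written against a DataFrame. It assumes, as the expected output below suggests, that negative-amount rows are skipped with a result of 0 instead of ending the function, and that rows after the first processed 'B' also get 0; running_total_loop is a hypothetical name. It collects the new values in lists and assigns the columns once at the end, instead of writing into the frame while iterating over it.

```python
import pandas as pd

df = pd.DataFrame({
    'kind': {1: 'C', 2: 'E', 3: 'A', 4: 'A', 5: 'B', 6: 'C'},
    'amount': {1: -800, 2: 100, 3: 0.5, 4: 0.5, 5: 0, 6: 200},
    'input_amount': {1: 800, 2: 800, 3: 800, 4: 800, 5: 800, 6: 800},
})


def running_total_loop(df):
    """Row-by-row reference version: returns a copy with 'result' and 'cumulative_amount'."""
    cumulative_amount = 0.0
    stopped = False  # becomes True once the first valid 'B' has been processed
    results, cumulative = [], []

    for row in df.itertuples(index=False):
        amount = 0.0
        # Negative amounts and rows after the first 'B' contribute 0
        if not stopped and row.amount >= 0:
            if row.kind == 'A':
                amount = row.amount * row.input_amount
            elif row.kind == 'B':
                amount = row.input_amount - cumulative_amount
                stopped = True
            elif row.kind == 'C':
                amount = row.amount
        cumulative_amount += amount
        results.append(amount)
        cumulative.append(cumulative_amount)

    # Assign the new columns after the loop, not while iterating
    out = df.copy()
    out['result'] = results
    out['cumulative_amount'] = cumulative
    return out


print(running_total_loop(df))
```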

Here is an example input and the desired output:

import pandas as pd

df = pd.DataFrame({
    'kind': {1: 'C', 2: 'E', 3: 'A', 4: 'A', 5: 'B', 6: 'C'},
    'amount': {1: -800, 2: 100, 3: 0.5, 4: 0.5, 5: 0, 6: 200},
    'input_amount': {1: 800, 2: 800, 3: 800, 4: 800, 5: 800, 6: 800}
})

   amount  input_amount kind  cumulative_amount  result
1  -800.0           800    C                0.0     0.0
2   100.0           800    E                0.0     0.0
3     0.5           800    A              400.0   400.0
4     0.5           800    A              800.0   400.0
5     0.0           800    B              800.0     0.0
6   200.0           800    C              800.0     0.0

If I understand correctly, only the result for kind 'B' depends on other rows. So you can do everything else first:

df['result'] = 0.

a = (df.kind == 'A') & (df.amount >= 0) 
c = (df.kind == 'C') & (df.amount >= 0)

df.loc[a, 'result'] = df.loc[a, 'amount'] * df.loc[a, 'input_amount']
df.loc[c, 'result'] = df.loc[c, 'amount']
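The two .loc assignments can also be written as a single np.select call, which takes the first matching condition for each row and falls back to a default. This is an alternative to the answer's code, not a replacement for it, shown here on the sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'kind': {1: 'C', 2: 'E', 3: 'A', 4: 'A', 5: 'B', 6: 'C'},
    'amount': {1: -800, 2: 100, 3: 0.5, 4: 0.5, 5: 0, 6: 200},
    'input_amount': {1: 800, 2: 800, 3: 800, 4: 800, 5: 800, 6: 800},
})

a = (df.kind == 'A') & (df.amount >= 0)
c = (df.kind == 'C') & (df.amount >= 0)

# First matching condition wins; every other row (including 'B' for now) gets 0.0
df['result'] = np.select(
    [a, c],
    [df.amount * df.input_amount, df.amount],
    default=0.0,
)
print(df['result'].tolist())  # [0.0, 0.0, 400.0, 400.0, 0.0, 200.0]
```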

Then compute the running total:

df['cumulative_amount'] = df.result.cumsum()
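Series.cumsum includes the current row, so at this stage the last row shows 1000: it still counts the 'C' row that comes after 'B'. The loop's 'B' formula needs the total of the rows before 'B', and shifting the cumulative sum down by one row gives exactly that. A small sketch using the result values from the previous step on the sample data (the Series names here are illustrative):

```python
import pandas as pd

# 'result' and 'kind' after the A/C step on the sample data (the 'B' row is still 0)
result = pd.Series([0.0, 0.0, 400.0, 400.0, 0.0, 200.0], index=range(1, 7))
kind = pd.Series(['C', 'E', 'A', 'A', 'B', 'C'], index=range(1, 7))

running = result.cumsum()                         # total up to and including each row
before = result.cumsum().shift(fill_value=0.0)    # total of the rows strictly before each row

print(running.tolist())             # [0.0, 0.0, 400.0, 800.0, 800.0, 1000.0]
print(before[kind == 'B'].tolist()) # [800.0]
```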

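Correct the 'result' values for every occurrence of kind 'B':

# input_amount minus the running total of the row just before each 'B'
df.loc[(df.kind == 'B'), 'result'] = df.loc[(df.kind == 'B'), 'input_amount'].values - df.loc[(df.kind.shift(-1) == 'B'), 'cumulative_amount'].values

Correct the values of 'result' and 'cumulative_amount' after the first occurrence of 'B':

df.loc[(df.kind == 'B').cumsum().shift() > 0, 'result'] = 0
# (df.kind == 'B').cumsum().shift() is a running count of the number of B's encountered before each row,
# so you want to 'stop' once this number is no longer zero.
# You could of course do this more simply by finding the position of the first B in the index
# and then using .iloc (or the old .ix, removed in pandas 1.0), but that is actually longer to type out.

df['cumulative_amount'] = df.result.cumsum()  # Once more, because we've changed the values of 'result' below the first B.

Can you provide the same input DataFrame and the expected output?

@ScottBoston Done.

apply doesn't seem like a good approach here, since it works row by row and doesn't give you access to the other rows of the DataFrame during the calculation. Your original solution looks perfectly fine to me.

@MartinValgur I'm not a pandas expert, but doing it that way feels like a bad idea to me. People often quote the rule that you should never modify something you are iterating over. I've updated my code to use iterrows and df.set_value, and it works as expected, but I'm not sure whether this is the best way to do it.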
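The vectorized steps can be folded into one function. This is a sketch under stated assumptions, not the answer's exact code: the hypothetical running_total_vectorized takes the running total before the first 'B' from cumsum().shift() instead of looking up the row labelled just before 'B' (a shift(-1) lookup finds no such row when 'B' comes first), treats negative-amount rows as contributing 0 like the A/C masks do, and only lets a non-negative 'B' stop the run.

```python
import numpy as np
import pandas as pd


def running_total_vectorized(df):
    """Vectorized sketch of the loop: A/C rows, then the first 'B', then zero everything after it."""
    out = df.copy()
    valid = out['amount'] >= 0                  # negative amounts contribute 0
    is_b = (out['kind'] == 'B') & valid         # only a processed 'B' stops the loop

    # Running count of valid B's seen before each row, as in the answer
    after_first_b = is_b.cumsum().shift(fill_value=0) > 0
    first_b = is_b & ~after_first_b             # the first valid 'B' row, if any

    # A and C rows do not depend on other rows
    result = pd.Series(
        np.select(
            [(out['kind'] == 'A') & valid, (out['kind'] == 'C') & valid],
            [out['amount'] * out['input_amount'], out['amount']],
            default=0.0,
        ),
        index=out.index,
    )
    result.loc[after_first_b] = 0.0             # the loop never reaches these rows

    # 'B' takes input_amount minus the running total of the rows before it
    before = result.cumsum().shift(fill_value=0.0)
    result.loc[first_b] = out.loc[first_b, 'input_amount'] - before[first_b]

    out['result'] = result
    out['cumulative_amount'] = result.cumsum()
    return out


df = pd.DataFrame({
    'kind': {1: 'C', 2: 'E', 3: 'A', 4: 'A', 5: 'B', 6: 'C'},
    'amount': {1: -800, 2: 100, 3: 0.5, 4: 0.5, 5: 0, 6: 200},
    'input_amount': {1: 800, 2: 800, 3: 800, 4: 800, 5: 800, 6: 800},
})
print(running_total_vectorized(df))
```

On the sample data this reproduces the expected output table; if 'B' is the first row, it simply receives the full input_amount.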