Pandas: resetting a cumulative sum after it reaches a value, and setting a flag to 1


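I'm struggling to find a way to compute a cumulative sum over a column and create a flag when the sum reaches a certain value.

So, given a DataFrame: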

import pandas as pd
df = pd.DataFrame([[5, 1], [6, 1], [30, 1], [170, 0], [5, 1], [10, 1]], columns=['a', 'b'])

     a  b
0    5  1
1    6  1
2   30  1
3  170  0
4    5  1
5   10  1

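For column a, I want to compute a cumulative sum and set the "Flag" column to 1 once the sum reaches a maximum value. After that maximum is reached, the sum resets to 0. In this case the maximum is 40: any cumulative sum that exceeds 40 triggers a reset.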

Desired Output

     a  b  Flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0

Any help would be greatly appreciated.

An "ordinary" cumsum() is of no use here, because that function does not "know" where to restart the summation.
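To make the point concrete, here is a minimal sketch using the question's data. The names `running` and `naive_flag` are introduced only for this illustration. The plain cumulative sum keeps growing after it passes 40, so every later row is flagged as well.

```python
import pandas as pd

df = pd.DataFrame([[5, 1], [6, 1], [30, 1], [170, 0], [5, 1], [10, 1]],
                  columns=['a', 'b'])

# A plain cumulative sum never restarts, so it keeps growing past the threshold.
running = df['a'].cumsum()
naive_flag = running.ge(40).astype(int)

print(running.tolist())     # [5, 11, 41, 211, 216, 226]
print(naive_flag.tolist())  # [0, 0, 1, 1, 1, 1]
```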

You can do it with the following custom function:

def myCumSum(x, thr):
    # Restart the running total once the previous total has reached the threshold
    if myCumSum.prev >= thr:
        myCumSum.prev = 0
    myCumSum.prev += x  # add the current value to the running total
    return myCumSum.prev

This function has "memory" (from the previous call) in its prev attribute, which gives it a way to "know" where to restart.

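To apply it to a whole array, define a vectorized version of this function. Note that np.vectorize is a convenience wrapper around a Python-level loop, so it makes the call more compact rather than genuinely faster: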

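import numpy as np

# otypes is given explicitly so that np.vectorize does not make an extra
# trial call, which would disturb myCumSum.prev.
# np.int was removed in NumPy 1.24; the built-in int is used instead.
myCumSumV = np.vectorize(myCumSum, otypes=[int], excluded=['thr'])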

Then run:

threshold = 40
myCumSum.prev = 0  # Set the "previous" value
# Replace "a" column with your cumulative sum
df.a = myCumSumV(df.a.values, threshold)
df['flag'] = df.a.ge(threshold).astype(int)  # Compute "flag" column

The result is:

     a  b  flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0
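
Because the state above lives in a function attribute, myCumSum.prev has to be reset by hand before every run. As a hedged alternative, the same logic can be sketched with the running total held in a local variable. The helper name `reset_cumsum` is hypothetical and not part of the original answer.

```python
import pandas as pd

def reset_cumsum(values, thr):
    """Running sum that restarts once the previous total has reached thr."""
    out = []
    total = 0
    for x in values:
        if total >= thr:   # the previous row hit the threshold: start over
            total = 0
        total += x
        out.append(total)
    return out

df = pd.DataFrame([[5, 1], [6, 1], [30, 1], [170, 0], [5, 1], [10, 1]],
                  columns=['a', 'b'])
threshold = 40

df['a'] = reset_cumsum(df['a'].tolist(), threshold)
df['flag'] = df['a'].ge(threshold).astype(int)
print(df)
```

Each call starts from a fresh total, so the function can be reused on several columns or DataFrames without any manual reset.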

I don't think this can be vectorized.
Got it. Thanks for the reference!
Thanks for the detailed explanation!
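
The difficulty raised in the comments is that each output depends on the previous output, which is a sequential recurrence. One way to write that recurrence compactly, offered only as a sketch, is the standard library's `itertools.accumulate` with a custom step function. The names `step`, `sums` and `flags` are illustrative.

```python
from itertools import accumulate

values = [5, 6, 30, 170, 5, 10]
threshold = 40

def step(total, x):
    # If the previous total already reached the threshold, restart from x.
    return x if total >= threshold else total + x

sums = list(accumulate(values, step))
flags = [int(s >= threshold) for s in sums]

print(sums)   # [5, 11, 41, 170, 5, 15]
print(flags)  # [0, 0, 1, 1, 0, 0]
```

This is still a Python-level loop under the hood; it only makes the dependence on the previous total explicit.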