Python pandas: increment or reset a count based on another column

python pandas

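I have a pandas DataFrame that represents a time series of scores. I want to use the score to compute a CookiePoints column according to the following rules:

Each time the score improves on the previous score, one CookiePoint is earned.
Each time the score does not improve, all CookiePoints are taken away as a penalty (CookiePoints is set to 0).
3 CookiePoints can be redeemed for a cookie. So after reaching 3, the CookiePoints count should become 1 (if the score is higher) or 0 (if it is not).

See the example below: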

Score       CookiePoints
14          0
13          0
14          1
17          2
17          0
19          1
20          2
22          3
23          1
17          0
19          1
20          2
22          3
21          0
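To pin down the rules before vectorizing, here is a plain-loop sketch (not vectorized, so it only serves as a reference to check faster versions against). The function name `cookie_points_loop` is hypothetical, and it assumes the first row counts as "not improved" because it has no previous score, which matches the first row of the example.

```python
import pandas as pd


def cookie_points_loop(scores):
    """Hypothetical reference implementation of the three rules (row by row)."""
    points = []
    current = 0
    previous = None
    for s in scores:
        if previous is not None and s > previous:
            # Rules 1 and 3: earn a point; after 3 are redeemed, restart at 1.
            current = 1 if current == 3 else current + 1
        else:
            # Rule 2 (and the first row): no improvement wipes out all points.
            current = 0
        points.append(current)
        previous = s
    return pd.Series(points, index=scores.index)


df = pd.DataFrame({"Score": [14, 13, 14, 17, 17, 19, 20, 22, 23, 17, 19, 20, 22, 21]})
df["CookiePoints"] = cookie_points_loop(df["Score"])
print(df["CookiePoints"].tolist())  # [0, 0, 1, 2, 0, 1, 2, 3, 1, 0, 1, 2, 3, 0]
```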

Please note that the solution must use a Pandas DataFrame and, ideally, only vectorized operations.

This is certainly a tricky problem, but it can still be solved within Pandas. (Updated with a V3 solution.)

Version 3 (one-liner)

Version 2

import pandas as pd

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])

mask= score.diff()>0        

result = mask.groupby((~mask).cumsum()).cumsum().mod(3).replace(0,3).where(mask,0).map(int)
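Applied to a DataFrame column, the Version 2 one-liner can be wrapped in a small helper. This is a sketch: the name `add_cookie_points` and the `cap` parameter (generalizing the hard-coded 3) are assumptions, not part of the answer. It casts the mask to `int` before the grouped cumsum so the running count is an integer regardless of how the pandas version handles boolean cumsums.

```python
import pandas as pd


def add_cookie_points(df, col="Score", cap=3):
    """Hypothetical helper: return a copy of df with a CookiePoints column."""
    mask = df[col].diff() > 0                              # True where the score rose
    run_id = (~mask).cumsum()                              # new label at every non-rise
    run_len = mask.astype(int).groupby(run_id).cumsum()    # rises so far in this run
    points = run_len.mod(cap).replace(0, cap).where(mask, 0).astype(int)
    return df.assign(CookiePoints=points)


df = pd.DataFrame({"Score": [14, 13, 14, 17, 17, 19, 20, 22, 23, 17, 19, 20, 22, 21]})
out = add_cookie_points(df)
print(out["CookiePoints"].tolist())  # [0, 0, 1, 2, 0, 1, 2, 3, 1, 0, 1, 2, 3, 0]
```

`df.assign` returns a new DataFrame, so the input frame is left unchanged.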

Version 1

import pandas as pd

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])

mask= score.diff()>0        # Identify score going up

mask 

0     False
1     False
2      True
3      True
4     False
5      True
6      True
7      True
8      True
9     False
10     True
11     True
12     True
13    False
dtype: bool
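Why the first row and repeated scores come out as `False`: `diff()` has no predecessor for row 0 and returns `NaN` there, and `NaN > 0` is `False`; an unchanged score gives a difference of 0, which is not an improvement. A short check on the first five scores:

```python
import pandas as pd

score = pd.Series([14, 13, 14, 17, 17])
diff = score.diff()          # row 0 has no previous value -> NaN
print(diff.tolist())         # [nan, -1.0, 1.0, 3.0, 0.0]
mask = diff > 0              # NaN > 0 and 0 > 0 are both False
print(mask.tolist())         # [False, False, True, True, False]
```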

# Use False Cumsum to group True values

group = (mask==False).cumsum()

group
0     1
1     2
2     2
3     2
4     3
5     3
6     3
7     3
8     3
9     4
10    4
11    4
12    4
13    5
dtype: int64
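The grouping step works because the cumulative sum only increases on `False` rows: each non-rise opens a new label, and the run of rises that follows shares that label. `~mask` is the idiomatic way to write `mask == False` for a boolean Series. A sketch on a shortened mask:

```python
import pandas as pd

mask = pd.Series([False, False, True, True, False, True])
group = (~mask).cumsum()   # same labels as (mask == False).cumsum()
print(group.tolist())      # [1, 2, 2, 2, 3, 3]

# Each group is one non-rise row followed by the rises after it.
for key, vals in mask.groupby(group):
    print(key, vals.tolist())
```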

# Groupby False Cumsum
temp = mask.groupby(group).cumsum().map(int)
temp

0     0
1     0
2     1
3     2
4     0
5     1
6     2
7     3
8     4
9     0
10    1
11    2
12    3
13    0
dtype: int64
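`temp` is the number of consecutive rises so far, which is why it reaches 4 at row 8 and still needs the cap. An equivalent way to get it (an alternative, not the answer's code) is the position inside each group via `cumcount()`. This relies on the assumption that every group starts with exactly one `False` row, which holds here because row 0 is always `False` after `diff()`:

```python
import pandas as pd

score = pd.Series([14, 13, 14, 17, 17, 19, 20, 22, 23, 17, 19, 20, 22, 21])
mask = score.diff() > 0
group = (~mask).cumsum()

temp = mask.astype(int).groupby(group).cumsum()   # rises counted within each run
temp_alt = mask.groupby(group).cumcount()         # 0-based position within each group
print(temp_alt.tolist())  # [0, 0, 1, 2, 0, 1, 2, 3, 4, 0, 1, 2, 3, 0]
```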

# Fix Cap at 3
# result = temp.where(temp<=3,temp.mod(3)) # This is wrong: a run of 6 rises would give 0 instead of 3.

result = temp.mod(3).replace(0,3).where(mask,0)
result

0     0
1     0
2     1
3     2
4     0
5     1
6     2
7     3
8     1
9     0
10    1
11    2
12    3
13    0
dtype: int64
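The cap works because `mod(3)` cycles the run length through 1, 2, 0, 1, 2, 0, ...; `replace(0, 3)` turns every multiple of 3 back into 3, and `where(mask, 0)` zeroes the rows where the score did not rise. The same wrap can be written arithmetically as `(temp - 1) % 3 + 1` (an alternative formulation, not the answer's). In this sketch `mask` is rebuilt as `temp > 0`, an assumption that holds because `temp` is positive exactly on the rows where the score rose:

```python
import pandas as pd

temp = pd.Series([0, 0, 1, 2, 0, 1, 2, 3, 4, 0, 1, 2, 3, 0])
mask = temp > 0  # assumption: positive run length <=> score went up

# Answer's version: wrap with mod, map 0 back to 3, zero out non-rises.
result = temp.mod(3).replace(0, 3).where(mask, 0)

# Equivalent: shift to 0-based, wrap, shift back to 1..3.
result_alt = ((temp - 1) % 3 + 1).where(mask, 0)
print(result_alt.tolist())  # [0, 0, 1, 2, 0, 1, 2, 3, 1, 0, 1, 2, 3, 0]
```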

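Do you have a current non-vectorized implementation?
This is basically a cumulative sum with a dynamic reset (here over a column of 1s), which I don't think can be vectorized. See another link I came across yesterday:
@KevinWinata No. So that would also be helpful.
Thanks ALollz and anky_91 for the links; I'm reading them now.
There's a small issue in the last line. I'll fix it shortly. (Fixed)
Great answer. I tried for a while to make a one-liner but didn't manage it. The use of mod is very clever.
Not very easy to read, but very clever. If I run into a similar problem I'll come back to this post.
@Datanovice Well... that's the price of a one-liner :)
Thank you, this is very helpful and impressive.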