Python: How to increment a cumulative maximum (Python, pandas)


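I have a column (price; it appears as last in the data below) whose value changes over time. From one row to the next, the value increases, decreases, or stays the same. I want to count the number of times the value reaches a new high.

To do that, I added a column, currenthigh, which tracks the highest value so far. Then I added another column, currenthigh_prev (named currenthigh_shift in the code and tables below), which is the currenthigh column shifted by one row. That way I can compare two values: the current one and the previous one. If currenthigh > currenthigh_prev, I have a new high, which is recorded in newhighscount.

I have been trying to use .cummax() to do this, which seemed appropriate: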

df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = df['newhighscount'].cummax() + 1

I expected:

              datetime      last  currenthigh  currenthigh_shift  newhighscount
31 2019-04-02 07:57:33  389.8400       389.84                NaN              0
32 2019-04-02 07:57:33  389.8400       389.84             389.84              0
33 2019-04-02 07:57:33  389.8700       389.87             389.84              1
34 2019-04-02 07:57:33  389.8800       389.88             389.87              2
35 2019-04-02 07:57:33  389.9000       389.90             389.88              3
36 2019-04-02 07:57:33  389.9600       389.96             389.90              4
37 2019-04-02 07:57:35  389.9000       389.96             389.96              4
38 2019-04-02 07:57:36  389.9000       389.96             389.96              4
39 2019-04-02 08:00:00  389.3603       389.96             389.96              4
40 2019-04-02 08:00:00  388.8500       389.96             389.96              4
41 2019-04-02 08:00:00  390.0000       390.00             389.96              5
42 2019-04-02 08:00:01  389.7452       390.00             390.00              5
43 2019-04-02 08:00:01  389.4223       390.00             390.00              5
44 2019-04-02 08:00:01  389.8000       390.00             390.00              5

I get:

              datetime      last  currenthigh  currenthigh_shift  newhighscount 
31 2019-04-02 07:57:33  389.8400       389.84                NaN              0 
32 2019-04-02 07:57:33  389.8400       389.84             389.84              0 
33 2019-04-02 07:57:33  389.8700       389.87             389.84              1 
34 2019-04-02 07:57:33  389.8800       389.88             389.87              1 
35 2019-04-02 07:57:33  389.9000       389.90             389.88              1 
36 2019-04-02 07:57:33  389.9600       389.96             389.90              1 
37 2019-04-02 07:57:35  389.9000       389.96             389.96              0 
38 2019-04-02 07:57:36  389.9000       389.96             389.96              0 
39 2019-04-02 08:00:00  389.3603       389.96             389.96              0 
40 2019-04-02 08:00:00  388.8500       389.96             389.96              0 
41 2019-04-02 08:00:00  390.0000       390.00             389.96              1 
42 2019-04-02 08:00:01  389.7452       390.00             390.00              0 
43 2019-04-02 08:00:01  389.4223       390.00             390.00              0 
44 2019-04-02 08:00:01  389.8000       390.00             390.00              0

Basically, df['newhighscount'].cummax() does not seem to return anything.
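One likely reading of the result above is that the right-hand side is evaluated once, before any assignment happens, so the counter never builds on its own earlier values. The sketch below reproduces that on a small hypothetical sample (the prices are only loosely taken from the table, and the column is assumed to start at 0 everywhere, as in the question's data).

```python
import pandas as pd

# Hypothetical sample, loosely based on the prices in the question.
df = pd.DataFrame({"last": [389.84, 389.84, 389.87, 389.88, 389.90, 389.85, 390.00]})
df["currenthigh"] = df["last"].cummax()
df["currenthigh_shift"] = df["currenthigh"].shift(1)
df["newhighscount"] = 0  # assumption: the counter column starts at zero

is_new_high = df["currenthigh"] > df["currenthigh_shift"]

# The right-hand side is computed once, from a column that is still all zeros,
# so it is the constant 1 on every row.
rhs = df["newhighscount"].cummax() + 1
print(rhs.tolist())

# The masked assignment then writes that constant 1 on the new-high rows only.
df.loc[is_new_high, "newhighscount"] = rhs
print(df["newhighscount"].tolist())
```

So .cummax() does return something here; it is just the cumulative maximum of a column of zeros, which is why every new-high row ends up with 1.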

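df['newhighscount'] = df['last'].cummax().diff().gt(0).cumsum()

This computes the cumulative maximum of the last column, takes the difference between consecutive values (cummax_t - cummax_{t-1}), checks whether that difference is greater than zero, and counts how many times that has been true so far.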
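To make the chain easier to follow, here is a minimal sketch that splits it into its four steps, using the prices from the question's table typed in by hand. The intermediate variable names are only illustrative.

```python
import pandas as pd

# Prices copied from the "last" column of the question's table.
last = pd.Series([389.84, 389.84, 389.87, 389.88, 389.90, 389.96, 389.90,
                  389.90, 389.3603, 388.85, 390.00, 389.7452, 389.4223, 389.80])

running_max = last.cummax()             # highest price seen so far
step = running_max.diff()               # NaN on the first row, 0 when the max is unchanged
is_new_high = step.gt(0)                # NaN compares as False, so row 0 is not a new high
new_highs_count = is_new_high.cumsum()  # running count of True values

print(new_highs_count.tolist())
# [0, 0, 1, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5]
```

The result matches the newhighscount column the question expected.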

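You want to label the unique 'currenthigh' values. There are many ways to do this:

ngroup: group by currenthigh with sort=False and number the groups (a filtered form of this call is quoted in the comments)

rank: this works here because cummax is guaranteed to be monotonically non-decreasing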

df['NewCount'] = (df.currenthigh.rank(method='dense')-1).astype(int)

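map: look up each value's position among the unique currenthigh values (the code and its output are listed near the end of this page)

Edit: based on your data, a single command is enough (it is listed near the end of this page, after the map output).

Original:
Your logic still works, but it is not as elegant as the other answers. It just needs a small tweak.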

In [983]: df
Out[983]:
               datetime      last  currenthigh  currenthigh_shift   newhighscount
31 2019-04-02  07:57:33  389.8400       389.84                NaN               0
32 2019-04-02  07:57:33  389.8400       389.84             389.84               0
33 2019-04-02  07:57:33  389.8700       389.87             389.84               0
34 2019-04-02  07:57:33  389.8800       389.88             389.87               0
35 2019-04-02  07:57:33  389.9000       389.90             389.88               0
36 2019-04-02  07:57:33  389.9600       389.96             389.90               0
37 2019-04-02  07:57:35  389.9000       389.96             389.96               0
38 2019-04-02  07:57:36  389.9000       389.96             389.96               0
39 2019-04-02  08:00:00  389.3603       389.96             389.96               0
40 2019-04-02  08:00:00  388.8500       389.96             389.96               0
41 2019-04-02  08:00:00  390.0000       390.00             389.96               0
42 2019-04-02  08:00:01  389.7452       390.00             390.00               0
43 2019-04-02  08:00:01  389.4223       390.00             390.00               0
44 2019-04-02  08:00:01  389.8000       390.00             390.00               0

In [985]: df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()
In [989]: df['newhighscount'] = df['newhighscount'].cummax()
In [990]: df
Out[990]:
               datetime      last  currenthigh  currenthigh_shift  newhighscount
31 2019-04-02  07:57:33  389.8400       389.84                NaN              0
32 2019-04-02  07:57:33  389.8400       389.84             389.84              0
33 2019-04-02  07:57:33  389.8700       389.87             389.84              1
34 2019-04-02  07:57:33  389.8800       389.88             389.87              2
35 2019-04-02  07:57:33  389.9000       389.90             389.88              3
36 2019-04-02  07:57:33  389.9600       389.96             389.90              4
37 2019-04-02  07:57:35  389.9000       389.96             389.96              4
38 2019-04-02  07:57:36  389.9000       389.96             389.96              4
39 2019-04-02  08:00:00  389.3603       389.96             389.96              4
40 2019-04-02  08:00:00  388.8500       389.96             389.96              4
41 2019-04-02  08:00:00  390.0000       390.00             389.96              5
42 2019-04-02  08:00:01  389.7452       390.00             390.00              5
43 2019-04-02  08:00:01  389.4223       390.00             390.00              5
44 2019-04-02  08:00:01  389.8000       390.00             390.00              5
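The two-step session above can be read as "write the running count on the new-high rows, then carry it forward". The sketch below walks through both steps on the question's prices; the column names follow the session, while after_step_1 and after_step_2 are illustrative names added here.

```python
import pandas as pd

# Prices copied from the "last" column of the question's table.
df = pd.DataFrame({"last": [389.84, 389.84, 389.87, 389.88, 389.90, 389.96, 389.90,
                            389.90, 389.3603, 388.85, 390.00, 389.7452, 389.4223, 389.80]})
df["currenthigh"] = df["last"].cummax()
df["currenthigh_shift"] = df["currenthigh"].shift(1)
df["newhighscount"] = 0

is_new_high = df["currenthigh"] > df["currenthigh_shift"]
running_count = is_new_high.astype(int).cumsum()

# Step 1: the masked assignment writes the running count on new-high rows only;
# every other row keeps its initial 0.
df.loc[is_new_high, "newhighscount"] = running_count
after_step_1 = df["newhighscount"].tolist()
print(after_step_1)

# Step 2: cummax carries the last written count forward over those zeros.
df["newhighscount"] = df["newhighscount"].cummax()
after_step_2 = df["newhighscount"].tolist()
print(after_step_2)
```

On this data the final column equals running_count itself, which is consistent with the remark that a single cumsum command is enough.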

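Beautiful, thank you. What if I want to start counting only after a certain time, say 8:00? I already have another bool column that is True after 8:00 and False otherwise.

You can use:

df.loc[df['yourboolcol'], 'newhighscount2'] = df.loc[df['yourboolcol'], 'last'].cummax().diff().gt(0).cumsum()

Keep in mind that this leaves NaNs in the rows where the boolean column is False.

Thanks. It almost works: the first new high is not taken into account. Still trying to figure out why.

Actually,

df['newhighscount2'] = (df['last'].cummax().diff().gt(0) & df['yourboolcol'].shift(1)).cumsum()

may be more elegant. The shift moves the first True value to the second price value, so that you do not get a new high on the first record. If you want your first value (after 8:00) to be able to count as a new high, you can remove the shift.

Thank you very much for your input. I will study it. I am new here and, honestly, I do not know which answer to accept, yours or 约瑟姆's...

What if I want to start counting after a certain time, say 8:00? I already have another bool column that is True after 8:00 and False otherwise. (I also asked the author of the accepted answer...)

@fredericf Then you should be able to slice the RHS and reassign. After that you can choose to keep the other NaN values or use .fillna(0). For example, something like

df['NewCount'] = df[Bool_Series].groupby('currenthigh', sort=False).ngroup()

Luckily you seem to have a multi-index, so duplicate times will not cause problems when aligning.

Ok. I will look into this too. May I ask how you formatted the dataframe in my question?

Great. Thanks again!

Thanks Andy, this works. Now I have to understand it :-)

It is pretty simple, so I think you will get it quickly. If you need pointers, let me know. By the way, based on your data, this command is enough:

df['newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()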
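The variants for counting only after a start time, as discussed in the comments, behave slightly differently. The sketch below compares them on made-up data: the prices and the yourboolcol values are hypothetical, and shift(1, fill_value=False) is used here instead of a plain shift(1) so that the shifted mask stays boolean.

```python
import pandas as pd

# Hypothetical data: the mask turns True on row 3, which is itself a new high.
df = pd.DataFrame({
    "last": [389.90, 389.96, 389.90, 390.00, 388.85, 390.05, 389.7452, 390.10],
    "yourboolcol": [False, False, False, True, True, True, True, True],
})
mask = df["yourboolcol"]

# Variant 1: restart the running maximum inside the masked rows.
# Rows outside the mask are left as NaN, and the first masked row can never
# count because diff() has nothing to compare it with.
df.loc[mask, "restart"] = df.loc[mask, "last"].cummax().diff().gt(0).cumsum()

# Variant 2: all-time new highs, counted from the row after the mask starts.
new_high = df["last"].cummax().diff().gt(0)
shifted_mask = mask.shift(1, fill_value=False)
df["with_shift"] = (new_high & shifted_mask).cumsum()

# Variant 3: same, but the first masked row may count as a new high.
df["without_shift"] = (new_high & mask).cumsum()

print(df)
```

Which variant is appropriate depends on whether highs should be measured against the whole history or only against prices seen after the start time.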
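Code and output for the map approach:

import pandas as pd

# unique() returns the distinct values in order of first appearance
arr = df['currenthigh'].unique()
# Map each distinct running maximum to its position among the distinct values
df['NewCount'] = df['currenthigh'].map({value: i for i, value in enumerate(arr)})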

                         last  currenthigh  NewCount
datetime                                            
2019-04-02 07:57:33  389.8400       389.84         0
2019-04-02 07:57:33  389.8400       389.84         0
2019-04-02 07:57:33  389.8700       389.87         1
2019-04-02 07:57:33  389.8800       389.88         2
2019-04-02 07:57:33  389.9000       389.90         3
2019-04-02 07:57:33  389.9600       389.96         4
2019-04-02 07:57:35  389.9000       389.96         4
2019-04-02 07:57:36  389.9000       389.96         4
2019-04-02 08:00:00  389.3603       389.96         4
2019-04-02 08:00:00  388.8500       389.96         4
2019-04-02 08:00:00  390.0000       390.00         5
2019-04-02 08:00:01  389.7452       390.00         5
2019-04-02 08:00:01  389.4223       390.00         5
2019-04-02 08:00:01  389.8000       390.00         5
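The three labeling approaches named in that answer (ngroup, rank, map) can be checked against each other. This is a sketch only: the ngroup line is reconstructed from the filtered call quoted in the comments and is an assumption about the answer's exact code, and the variable names are illustrative.

```python
import pandas as pd

# Prices copied from the first rows of the question's table.
df = pd.DataFrame({"last": [389.84, 389.84, 389.87, 389.88, 389.90, 389.96,
                            389.90, 389.90, 389.3603, 388.85, 390.00, 389.7452]})
df["currenthigh"] = df["last"].cummax()

# ngroup: number the groups in order of first appearance (sort=False).
by_ngroup = df.groupby("currenthigh", sort=False).ngroup()

# rank: dense ranks start at 1, so subtract 1 to start counting at 0.
by_rank = (df["currenthigh"].rank(method="dense") - 1).astype(int)

# map: position of each value among the distinct values.
uniques = df["currenthigh"].unique()
by_map = df["currenthigh"].map({value: i for i, value in enumerate(uniques)})

print(by_ngroup.tolist())
print(by_rank.tolist())
print(by_map.tolist())
```

All three agree here because the cumulative maximum never decreases, so each distinct value appears in one contiguous run and in increasing order.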

The single command referred to in the edit above:

df['newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()
