Python: identify value changes in a column, and count/label each group of a specific value


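I have a CSV that I am loading into a DataFrame. I need to identify every change of value in a column, label each group of adjacent rows that share the same value, and have the count ignore the values I don't care about.

With the code below I can identify and label the clusters, but the count does not restrict itself to the value I want, which is 1.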

import pandas as pd
import numpy as np
import os

InputPath = r'C:\Users\YYYY\Desktop\File1.csv'
df = pd.read_csv(InputPath)
# Number every run of equal 'Mark' values, then keep the number only where Mark == 1
df['Result'] = ((df['Mark'] != df['Mark'].shift(1)).cumsum()).where(df['Mark'] == 1)
Data:

data = {'Series': ['A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B'],
        'Time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        'Mark': [0,0,1,1,0,0,1,1,0,0,0,0,0,1,0,1,1,0]}

df = pd.DataFrame(data, columns=['Series', 'Time', 'Mark'])
df

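(Desire 2)
Additionally, how can I make the count restart at 1 for each series, while making sure that, as time increases, the count still goes up with each new cluster?

The result would then look like this: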

   Series Time  Mark  Result  Desire1  Desire2
0       A    1     0     NaN      NaN      NaN
1       A    2     0     NaN      NaN      NaN
2       A    3     1     2.0      1.0      1.0
3       A    4     1     2.0      1.0      1.0
4       A    5     0     NaN      NaN      NaN
5       A    6     0     NaN      NaN      NaN
6       A    7     1     4.0      2.0      2.0
7       A    8     1     4.0      2.0      2.0
8       A    9     0     NaN      NaN      NaN
9       B    1     0     NaN      NaN      NaN
10      B    2     0     NaN      NaN      NaN
11      B    3     0     NaN      NaN      NaN
12      B    4     0     NaN      NaN      NaN
13      B    5     1     6.0      3.0      1.0
14      B    6     0     NaN      NaN      NaN
15      B    7     1     8.0      4.0      2.0
16      B    8     1     8.0      4.0      2.0
17      B    9     0     NaN      NaN      NaN
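
One way to produce both desired columns is to count only the rows where a run of 1s begins. The following is a minimal sketch, not a definitive solution: it assumes the rows are already sorted by Series and Time, and the names is_one, starts, prev_in_series and starts_in_series are introduced here for illustration only.

```python
import pandas as pd

data = {
    'Series': ['A'] * 9 + ['B'] * 9,
    'Time': list(range(1, 10)) * 2,
    'Mark': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
}
df = pd.DataFrame(data, columns=['Series', 'Time', 'Mark'])

# Rows we care about (illustrative name, not from the question).
is_one = df['Mark'].eq(1)

# Desire1: a run of 1s starts where Mark is 1 and the previous row is not 1.
# Counting those starts numbers the runs of 1s and ignores the runs of 0s.
starts = is_one & df['Mark'].shift(1).ne(1)
df['Desire1'] = starts.cumsum().where(is_one)

# Desire2: look at the previous row within the same Series only, so a run
# cannot carry over from one Series to the next, then count starts per Series.
prev_in_series = df.groupby('Series')['Mark'].shift(1)
starts_in_series = is_one & prev_in_series.ne(1)
df['Desire2'] = (
    starts_in_series.astype(int).groupby(df['Series']).cumsum().where(is_one)
)

print(df)
```

Because only the starts of runs of 1s are counted, this does not depend on whether the column begins with 0 or 1.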

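Desire1 is just Result divided by 2, because it only counts every second group. Desire2 can be reached by applying the Desire1 formula separately to the sub-DataFrame of each series. This is usually done with the .groupby method. Can you give the code that generates the mini DataFrame?

@Gwang JinKim Great point on Desire1! I have also included the data for the DataFrame in the question. Regarding your second comment, how would I do that? I am new to all of this and am trying to work out where I should include the .groupby.

Ah, I created it myself. But I never understand why people who ask questions here don't try to help those who want to help them. But hey, you did it. Thanks.

Welcome, @NightLearner. I have simplified the last expression.

I just tried the new expression you posted and I get the following error: 'DataFrame' object has no attribute 'to_numpy'. Do you have any idea?

Are you using Python 2.x? Anyway, after df.groupby('Series').apply(get_desire1) you don't have a DataFrame object.

No, I am using Python 3. I didn't realize that could be an issue.

dir(df.groupby('Series').apply(get_desire1)) will show all the available attributes and methods.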
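
The suggestion in the comments (halve Result, then apply the same formula per series through .groupby) could look roughly like the sketch below. The answer's actual get_desire1 is not shown on this page, so the helper bodies here are assumptions, and label_changes is a hypothetical name. The sketch uses transform on the grouped 'Mark' column rather than the apply call quoted in the comments, and it avoids to_numpy entirely; that method was added in pandas 0.24, so the error reported above may point to an older pandas installation rather than to the Python version.

```python
import pandas as pd

data = {
    'Series': ['A'] * 9 + ['B'] * 9,
    'Time': list(range(1, 10)) * 2,
    'Mark': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
}
df = pd.DataFrame(data, columns=['Series', 'Time', 'Mark'])


def label_changes(mark):
    """Number every run of equal values; keep the number only where Mark is 1."""
    return (mark != mark.shift(1)).cumsum().where(mark == 1)


def get_desire1(mark):
    """Halve the run number (assumed body, the original helper is not shown).

    Halving is only valid when Mark is binary and the sequence starts
    with 0, so that runs of 1s always receive even numbers.
    """
    return label_changes(mark) / 2


df['Result'] = label_changes(df['Mark'])

# Desire1: the formula applied to the whole column.
df['Desire1'] = get_desire1(df['Mark'])

# Desire2: the same formula applied to each Series separately.
df['Desire2'] = df.groupby('Series')['Mark'].transform(get_desire1)

print(df)
```

If a series could start with Mark equal to 1, the halving would give fractional labels for that series, so counting the starts of runs of 1s directly is the safer variant.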