Python: detect consecutive duplicates in a DataFrame column without iterating


python pandas


So, according to […], it's best not to iterate over the rows of a DataFrame. However, I don't know how to solve my problem without a for loop.

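I need to detect any consecutive repeats (three or more) in a particular column. For example, if the value 0 appears in three consecutive rows for a particular ID, I want to know that ID.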

ID     Value
1       0
1       0.5
1       0   <--- I need this ID, because there are three consecutive 0s.
1       0
1       0
1       0.2
2       0.1
2       0   <--- Not this one! It only appears twice in a row for this ID.
2       0
3       0
3       0

You can do the following:
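import numpy as np
# f returns the size of each group; every group here holds one constant run label
f = lambda x: np.diff(np.r_[0, np.flatnonzero(np.diff(x)) + 1, x.size])[0]
# label runs where ID or Value changes, then keep rows whose run has length >= 3
df[(df[['ID','Value']].ne(df[['ID','Value']].shift()).cumsum()
          .groupby(['ID','Value'])['Value'].transform(f).ge(3))]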
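Because every group handed to `f` is a single run with a constant label, `f` simply returns the group size. A minimal sketch of the same idea, assuming the question's data, that swaps `f` for pandas' built-in `transform('size')` and collapses the two change counters into one run label (`run`, `run_len`, `rows` and `ids` are hypothetical names):

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({'ID':    [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
                   'Value': [0, 0.5, 0, 0, 0, 0.2, 0.1, 0, 0, 0, 0]})

# A new run starts whenever ID or Value differs from the previous row;
# the cumulative sum of those start markers gives each run its own label.
run = df[['ID', 'Value']].ne(df[['ID', 'Value']].shift()).any(axis=1).cumsum()

# Length of the run that each row belongs to
run_len = df.groupby(run)['Value'].transform('size')

rows = df[run_len.ge(3)]
ids = rows['ID'].unique().tolist()
print(ids)  # [1]
```

Like the original, this matches runs of any value; add `& df['Value'].eq(0)` to the mask if only zeros should count.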


Not the best approach, but:
>>> df2 = df.groupby('ID').apply(lambda x: [i for i in (x['Value'] != x['Value'].shift()).cumsum().tolist() if (x['Value'] != x['Value'].shift()).cumsum().tolist().count(i) >= 3]).reset_index()
>>> df2.loc[df2.astype(str)[0] != '[]', 'ID'].tolist()
[1]
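The list comprehension above rebuilds the same run labels for every element it checks. A sketch of the same per-ID idea that builds the labels once and keeps the IDs whose longest run is at least 3, assuming the question's data (`longest_run` is a hypothetical helper). It still calls a Python function once per ID, so it is not fully vectorized:

```python
import pandas as pd

df = pd.DataFrame({'ID':    [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
                   'Value': [0, 0.5, 0, 0, 0, 0.2, 0.1, 0, 0, 0, 0]})

def longest_run(values: pd.Series) -> int:
    """Length of the longest block of equal consecutive values."""
    labels = values.ne(values.shift()).cumsum()  # new label at every change
    return int(labels.value_counts().max())

longest = df.groupby('ID')['Value'].apply(longest_run)
wanted = longest[longest >= 3].index.tolist()
print(wanted)  # [1]
```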
>>> 

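The first assumption is that the IDs are sorted.
Steps:
1- Sort the DataFrame
2- Copy the index into a new column to test for consecutiveness
3- Split the DataFrame into several DataFrames based on the (id, value) tuple
4- Loop over all the DataFrames (this is not resource-intensive)
5- Match the condition and get the ID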
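import pandas

df = pandas.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
                       'value': [0.5, 0, 0, 0, 0.1, 0, 0, 0.3, 0, 0]})

# 1- sort by id; a stable sort keeps the original row order inside each id,
#    and sort_values returns a new frame, so the result must be assigned
df = df.sort_values(by=['id'], kind='mergesort').reset_index(drop=True)
# 2- store each row's position so contiguity can be tested later
df['cons'] = df.index
CONST_VALUE = 0

# 3- one sub-DataFrame per (id, value) pair
d = dict(tuple(df.groupby(['id', 'value'])))

def is_consecutive(list_):
    # True if the positions are distinct and form an unbroken range
    setl = set(list_)
    return len(list_) == len(setl) and setl == set(range(min(list_), max(list_)+1))

# 4/5- loop over the groups and print the ids that match the condition
# (all rows of a given (id, value) must be contiguous for this test to pass)
for k, v in d.items():
    if k[1] == CONST_VALUE and len(v) >= 3 and is_consecutive(v['cons'].to_list()):
        print('wanted ID : {}'.format(k[0]))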


Output:
wanted ID : 1
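Because step 3 splits by (id, value), every row with the same value in one id lands in a single group, so ID 1 from the question (a 0, then 0.5, then three 0s) fails the contiguity test. A sketch of one way around this, assuming the question's data: split by (id, run label) instead, where the hypothetical `run` column numbers each block of equal consecutive values, so every group is contiguous by construction:

```python
import pandas as pd

CONST_VALUE = 0

# Question's data: ID 1 has a single 0 before its run of three 0s
df = pd.DataFrame({'id':    [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
                   'value': [0, 0.5, 0, 0, 0, 0.2, 0.1, 0, 0, 0, 0]})

# A block starts where value differs from the previous row of the same id
# (the first row of each id compares against NaN, so it starts a block too)
df['run'] = df['value'].ne(df.groupby('id')['value'].shift()).cumsum()

wanted = []
for (id_, run), v in df.groupby(['id', 'run']):
    # each group is one contiguous block: only its value and length matter
    if v['value'].iloc[0] == CONST_VALUE and len(v) >= 3:
        wanted.append(int(id_))
print(wanted)  # [1]
```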

This is not a trivial problem; it needs a double groupby, similar to @anky91's solution:
import pandas as pd

# a little different df
df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3],
 'Value': [0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.5, 0.2, 0.1, 0.0, 0.0, 0.0]})

# we want to know where the differences in Value happen:
# True on the first row of every block of equal values within an ID
s = df.groupby('ID').Value.transform(lambda x: x.ne(x.shift()))

# groupby ID and these difference blocks;
# cumsum turns the block starts into one label per block,
# and cumcount().eq(2) marks the third row of any block of 3 or more rows
idx = s.groupby([df['ID'], s.cumsum()]).cumcount().eq(2)

Gives:
0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8     False
9     False
10    False
11    False
12    False
dtype: bool

You can then extract the wanted IDs:
df.loc[idx, 'ID'].unique()
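As written, `idx` flags a block of three for any value, not only 0. If only runs of zeros should count, as in the question, one option is to AND the mask with `df['Value'].eq(0)`. A sketch on hypothetical data where ID 1 has three consecutive 0.5s and ID 2 three consecutive 0s (`starts`, `block` and `third` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'ID':    [1, 1, 1, 1, 2, 2, 2],
                   'Value': [0.5, 0.5, 0.5, 0.2, 0.0, 0.0, 0.0]})

# Block labels: a new block starts whenever Value changes within an ID
starts = df['Value'].ne(df.groupby('ID')['Value'].shift())
block = starts.cumsum()

# The third row of a block exists only if the block has at least three rows
third = df['Value'].groupby([df['ID'], block]).cumcount().eq(2)

any_value = df.loc[third, 'ID'].unique().tolist()
zeros_only = df.loc[third & df['Value'].eq(0), 'ID'].unique().tolist()
print(any_value, zeros_only)  # [1, 2] [2]
```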

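For any value, this returns three identical consecutive values... not just consecutive 0's.
@ChrisA Is it only for 0? I think 0 is just an example, based on "I need to detect any consecutive repeats" :) @anky_91 Yeah, could be. I may have misunderstood :D
Yeah, it looks like a groupby by two series. @Zarif Replace Value with the original column name; in this example it is Value.
groupby does not change the order of the series; sort_values may change it.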
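The last comment concerns the sorting step in the loop-based answer: `groupby` keeps rows in their original order inside each group, while `sort_values` only guarantees that order for equal keys when a stable algorithm is chosen through its `kind` parameter (`'mergesort'` is one such choice). A small sketch on hypothetical data whose ids are interleaved:

```python
import pandas as pd

# Hypothetical data: the rows of each id are interleaved
df = pd.DataFrame({'id':    [2, 1, 2, 1, 1],
                   'value': [0.1, 0.5, 0.0, 0.0, 0.0]})

# mergesort is stable: rows with equal ids keep their original relative order
stable = df.sort_values(by='id', kind='mergesort').reset_index(drop=True)
print(stable['value'].tolist())  # [0.5, 0.0, 0.0, 0.1, 0.0]

# groupby also keeps the original row order inside each group
groups = df.groupby('id')['value'].apply(list).to_dict()
print(groups)
```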