Python 3.x: How to count runs of >= 3 consecutive 1 values in a DataFrame column


In a pd.DataFrame I have a column A, and I want to count how many times the value 1 occurs three or more times in a row.

df=pd.DataFrame({'A':[0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,1,1]}) 
Expected output:

df1=pd.DataFrame({'value_one_count_for three or more than three times':[2]}) 

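You need a groupby that behaves like itertools.groupby, i.e. one that groups consecutive equal values. Then select the groups whose value is 1, since we are counting consecutive 1s, use GroupBy.count to get each group's length, and keep the groups whose length is greater than or equal to 3: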

g = df['A'].ne(df['A'].shift()).cumsum()
g = g[df['A'].eq(1)]
g.groupby(g).count().ge(3).sum()
# 2
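
For comparison, here is a minimal pure-Python sketch of what the pandas idiom above emulates, using itertools.groupby directly. The function name count_runs and its target and min_len parameters are hypothetical, chosen only for this illustration:

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})


def count_runs(values, target=1, min_len=3):
    """Count maximal runs of `target` that are at least `min_len` long.

    Hypothetical helper for illustration; not part of pandas.
    """
    # groupby yields one (key, iterator) pair per stretch of equal neighbours
    return sum(
        1
        for key, run in groupby(values)
        if key == target and sum(1 for _ in run) >= min_len
    )


print(count_runs(df['A'].tolist()))  # 2
```

This loops in Python, so it is probably slower than the vectorised versions on large columns, but it can be handy as a cross-check.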

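First filter the consecutive-group labels down to the rows where the value is 1, so that only runs of 1s remain. Then get each run's length with value_counts, compare for greater than or equal to 3, and count the Trues with sum: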

a = df['A'].ne(df['A'].shift()).cumsum()[df['A'].eq(1)].value_counts().ge(3).sum()
print (a)
2
NumPy alternative: compare the consecutive run lengths against 3 and sum the result:

condition = df.A.eq(1).to_numpy()
#https://stackoverflow.com/a/24343375
a = np.sum(np.diff(np.where(np.concatenate(([condition[0]],
                                     condition[:-1] != condition[1:],
                                     [True])))[0])[::2] >= 3)
print (a)
2
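
The NumPy one-liner is dense, so here is a step-by-step sketch of the same idea (measuring run lengths from the positions where the values change). It is a variant rather than the exact expression above: it pads the mask with False on both sides instead of slicing with [::2]. The name ones_run_lengths is hypothetical:

```python
import numpy as np


def ones_run_lengths(values):
    """Return the length of every run of 1s in a 1-D array (illustrative helper)."""
    condition = np.asarray(values) == 1
    # Pad with False on both sides so every run of True has a start and an end.
    padded = np.concatenate(([False], condition, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # index where each run of 1s begins
    ends = np.flatnonzero(edges == -1)    # index one past where each run ends
    return ends - starts


values = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
lengths = ones_run_lengths(values)
print(lengths)                    # [1 1 4 3]
print(int((lengths >= 3).sum()))  # 2
```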
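Explanation

The following assigns a unique ID to each run of consecutive 1s. Every row that is not 1 increments the counter, so all rows of a run share the ID of the 0 that precedes it: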

In [3]: df['cumsum'] = (df['A'] != 1).cumsum()

In [4]: print(df)
    A  cumsum
0   0       1
1   0       2
2   1       2
3   0       3
4   1       3
5   0       4
6   0       5
7   0       6
8   0       7
9   1       7
10  1       7
11  1       7
12  1       7
13  0       8
14  1       8
15  1       8
16  1       8
...as long as you clean up by keeping only the rows where 'A' equals 1:

In [5]: df = df[df['A'] == 1]

In [6]: print(df)
    A  cumsum
2   1       2
4   1       3
9   1       7
10  1       7
11  1       7
12  1       7
14  1       8
15  1       8
16  1       8
Then you can use either value_counts() or groupby():

# With value_counts()

In [7]: print(df['cumsum'].value_counts())
7    4
8    3
3    1
2    1
Name: cumsum, dtype: int64

# The amount of sets of at least 3 consecutive 1 is:
In [8]: print((df['cumsum'].value_counts() >= 3).sum())
2



# With groupby()
In [9]: list(df.groupby('cumsum'))
Out[9]: 
[(2,
     A  cumsum
  2  1       2),
 (3,
     A  cumsum
  4  1       3),
 (7,
      A  cumsum
  9   1       7
  10  1       7
  11  1       7
  12  1       7),
 (8,
      A  cumsum
  14  1       8
  15  1       8
  16  1       8)]

# The amount of sets of at least 3 consecutive 1 is:
In [10]: print(len([dataframe for _, dataframe in df.groupby('cumsum') if len(dataframe) >= 3]))
2
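
The walkthrough above adds a helper column and then overwrites df. If you would rather leave the DataFrame untouched, the same steps can be wrapped in a small function. This is only a sketch; count_long_runs and its value and min_len parameters are hypothetical names:

```python
import pandas as pd


def count_long_runs(series, value=1, min_len=3):
    """Return how many runs of `value` in `series` have length >= min_len."""
    is_value = series.eq(value)
    # Every non-matching row bumps the counter, so rows inside one run share an ID.
    run_id = (~is_value).cumsum()
    # Keep only matching rows, then measure each run by how often its ID appears.
    run_sizes = run_id[is_value].value_counts()
    return int(run_sizes.ge(min_len).sum())


df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})
print(count_long_runs(df['A']))  # 2
```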