Python 3.x: How to count runs of >= 3 consecutive 1 values in a DataFrame column

python-3.x pandas numpy

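In a pd.DataFrame I have a column A, and I want to count how many times the value 1 occurs three or more times in a row.

import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

Expected output:

df1 = pd.DataFrame({'value_one_count_for three or more than three times': [2]})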

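You need to do a groupby similar to itertools.groupby, then select only the groups whose value is 1, since we are counting consecutive 1s. Then use GroupBy.count, keep the counts that are greater than or equal to 3, and sum them: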

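# New label every time the value changes -> one label per run
g = df['A'].ne(df['A'].shift()).cumsum()
# Keep only the labels of rows where A is 1
g = g[df['A'].eq(1)]
# Rows per label = run length; count the runs of length >= 3
g.groupby(g).count().ge(3).sum()
# 2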
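Since the answer compares this to itertools.groupby, here is a minimal pure-Python sketch of the same idea using the standard library's `itertools.groupby` directly. It assumes the column has been copied into a plain list (the name `values` is just for illustration):

```python
from itertools import groupby

# Hypothetical plain-list copy of the question's column A
values = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]

# groupby yields (key, iterator) for each run of equal consecutive values;
# count the runs whose key is 1 and whose length is at least 3
count = sum(
    1
    for key, run in groupby(values)
    if key == 1 and sum(1 for _ in run) >= 3
)
print(count)  # 2
```

This works on any iterable, but for a large DataFrame the vectorized pandas version above is usually the better choice.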

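First label the groups of consecutive values with shift and cumsum, then filter to the rows where A is 1, so only the groups of consecutive 1s remain. Then count the rows in each group with value_counts, compare with ge (greater than or equal) against 3, and count the Trues with sum: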

# run labels -> keep rows where A is 1 -> run lengths -> >= 3 -> count the Trues
a = df['A'].ne(df['A'].shift()).cumsum()[df['A'].eq(1)].value_counts().ge(3).sum()
print(a)
2
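If you need this for other values or thresholds, the one-liner can be wrapped in a small function. This is a sketch only; `count_runs` and its `value` / `min_len` parameters are hypothetical names, not part of pandas:

```python
import pandas as pd

def count_runs(s: pd.Series, value=1, min_len=3) -> int:
    """Count runs of `value` in `s` that are at least `min_len` long.

    Hypothetical helper wrapping the one-liner above.
    """
    # New label every time the value changes -> one label per run
    run_id = s.ne(s.shift()).cumsum()
    # Keep labels of rows equal to `value`; rows per label = run length
    lengths = run_id[s.eq(value)].value_counts()
    return int(lengths.ge(min_len).sum())

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})
print(count_runs(df['A']))                      # 2
print(count_runs(df['A'], min_len=4))           # 1
print(count_runs(df['A'], value=0, min_len=2))  # 2
```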

NumPy alternative: compare the lengths of the consecutive runs with >= 3 and sum:

import numpy as np

condition = df.A.eq(1).to_numpy()
#https://stackoverflow.com/a/24343375
# Boundaries of the True runs: index 0 (only if condition[0] is True), every
# index where the value changes, and the end. Differences between boundaries
# alternate True-run length / False-run length, so [::2] keeps the True runs.
a = np.sum(np.diff(np.where(np.concatenate(([condition[0]],
                                     condition[:-1] != condition[1:],
                                     [True])))[0])[::2] >= 3)
print(a)
2
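A possibly easier-to-read NumPy variant (a different, commonly used formulation, not the code from the linked Stack Overflow answer) pads the 0/1 mask with a zero on both ends and finds the rising and falling edges with `np.diff`. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

# 1 where A == 1, else 0, padded with a 0 on both ends so every run
# of 1s has a rising edge before it and a falling edge after it
padded = np.concatenate(([0], df['A'].eq(1).to_numpy().astype(int), [0]))
edges = np.diff(padded)
starts = np.flatnonzero(edges == 1)   # index where each run of 1s begins
ends = np.flatnonzero(edges == -1)    # index just past where each run ends
lengths = ends - starts               # run lengths: [1 1 4 3]
print(int((lengths >= 3).sum()))      # 2
```

Because `starts` and `ends` are kept explicitly, the same arrays also tell you where each qualifying run is located.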

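TL;DR
Explanation
The following assigns a unique ID to each group of consecutive 1s (the counter only increases on values other than 1, so a run of 1s never changes it):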
In [3]: df['cumsum'] = (df['A'] != 1).cumsum()

In [4]: print(df)
    A  cumsum
0   0       1
1   0       2
2   1       2
3   0       3
4   1       3
5   0       4
6   0       5
7   0       6
8   0       7
9   1       7
10  1       7
11  1       7
12  1       7
13  0       8
14  1       8
15  1       8
16  1       8
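To see why this works, the same running count can be reproduced in plain Python with `itertools.accumulate`. This is an illustrative sketch only; `values` is a hypothetical list copy of column A:

```python
from itertools import accumulate

# Hypothetical plain-list copy of the question's column A
values = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]

# Running count of values that are not 1: a non-1 increments the counter,
# a 1 leaves it unchanged, so all 1s in one run share the same ID
ids = list(accumulate(int(v != 1) for v in values))
print(ids)
# [1, 2, 2, 3, 3, 4, 5, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8]
```

The printed list matches the `cumsum` column in the table above.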

You can then clean this up by keeping only the rows where "A" equals 1:
In [5]: df = df[df['A'] == 1]

In [6]: print(df)
    A  cumsum
2   1       2
4   1       3
9   1       7
10  1       7
11  1       7
12  1       7
14  1       8
15  1       8
16  1       8

Then you can use value_counts()
or groupby():

# With value_counts()

In [7]: print(df['cumsum'].value_counts())
7    4
8    3
3    1
2    1
Name: cumsum, dtype: int64

# The number of runs of at least 3 consecutive 1s is:
In [8]: print((df['cumsum'].value_counts() >= 3).sum())
2



# With groupby()
In [9]: list(df.groupby('cumsum'))
Out[9]: 
[(2,
     A  cumsum
  2  1       2),
 (3,
     A  cumsum
  4  1       3),
 (7,
      A  cumsum
  9   1       7
  10  1       7
  11  1       7
  12  1       7),
 (8,
      A  cumsum
  14  1       8
  15  1       8
  16  1       8)]

# The number of runs of at least 3 consecutive 1s is:
In [10]: print(len([dataframe for _, dataframe in df.groupby('cumsum') if len(dataframe) >= 3]))
2
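The steps in this walkthrough can also be combined without overwriting `df`, as in the following sketch (an illustration, not code from the answer). It computes the run IDs on the full column first and then groups only the 1s by them; pandas aligns the `run_id` Series on the index when it is passed to `groupby`:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

# Run IDs for the whole column, computed before filtering
run_id = (df['A'] != 1).cumsum()
# Keep only the 1s, group them by run ID and take each run's size
sizes = df.loc[df['A'] == 1, 'A'].groupby(run_id).size()
print(int((sizes >= 3).sum()))  # 2
```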