Python 3.x: How to count runs of three or more consecutive 1 values in a DataFrame column
In a pd.DataFrame I have a column A, and I want to count how many times the value 1 occurs three or more times in a row.
df=pd.DataFrame({'A':[0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,1,1]})
Expected output:
df1=pd.DataFrame({'value_one_count_for three or more than three times':[2]})
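You need a groupby over consecutive values, similar to itertools.groupby, and then select the groups whose value is 1, since we are counting consecutive 1s. Then use GroupBy.count and keep the groups whose count is greater than or equal to 3: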
g = df['A'].ne(df['A'].shift()).cumsum()
g = g[df['A'].eq(1)]
g.groupby(g).count().ge(3).sum()
# 2
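Since the answer compares this to itertools.groupby, here is a minimal pure-Python sketch of the same idea for a plain list. The function name count_runs and its parameters target and min_len are made up for illustration and are not part of the original answer.

```python
from itertools import groupby

def count_runs(values, target=1, min_len=3):
    """Count runs of `target` that are at least `min_len` items long."""
    return sum(
        1
        for key, run in groupby(values)  # groupby yields one group per consecutive run
        if key == target and sum(1 for _ in run) >= min_len
    )

data = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
print(count_runs(data))  # 2
```

On large columns the vectorized pandas version is likely to be faster; the loop version is mainly useful for seeing what the grouping does.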
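First filter the consecutive groups down to the rows equal to 1, so that you get the groups of consecutive 1s. Then count the rows per group with value_counts, compare the counts with ge (greater than or equal to 3), and count the True values with sum: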
a = df['A'].ne(df['A'].shift()).cumsum()[df['A'].eq(1)].value_counts().ge(3).sum()
print (a)
2
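The one-liner is dense, so here is the same chain split into named steps. The variable names (starts, run_id, ones, run_lengths) are chosen only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})

# True wherever the value differs from the previous row, i.e. a new run starts.
starts = df['A'].ne(df['A'].shift())
# The running total of the start flags gives every run its own ID.
run_id = starts.cumsum()
# Keep only the run IDs of rows where A equals 1.
ones = run_id[df['A'].eq(1)]
# Number of rows per run ID, i.e. the length of each run of 1s.
run_lengths = ones.value_counts()
# Count the runs that are at least 3 long.
result = int(run_lengths.ge(3).sum())
print(result)  # 2
```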
NumPy alternative: compare the lengths of the consecutive runs with >= 3 and sum:
import numpy as np

condition = df.A.eq(1).to_numpy()
# https://stackoverflow.com/a/24343375
a = np.sum(np.diff(np.where(np.concatenate(([condition[0]],
                                            condition[:-1] != condition[1:],
                                            [True])))[0])[::2] >= 3)
print(a)
2
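The nested NumPy expression can be hard to follow, so the sketch below unpacks it into steps. The function name count_true_runs and the min_len parameter are assumptions for illustration; the empty-input guard is an addition that the original expression does not have.

```python
import numpy as np

def count_true_runs(condition, min_len=3):
    """Count runs of True in a boolean array that are at least `min_len` long."""
    condition = np.asarray(condition, dtype=bool)
    if condition.size == 0:
        return 0
    # Flag the start of every run. The first flag is set only when the array
    # starts with True, so the first flagged position always opens a True run.
    # The trailing True closes the last run.
    boundaries = np.concatenate(([condition[0]],
                                 condition[:-1] != condition[1:],
                                 [True]))
    edges = np.flatnonzero(boundaries)
    # Differences between edges are run lengths; True and False runs alternate,
    # so every second length (starting with the first) belongs to a True run.
    true_run_lengths = np.diff(edges)[::2]
    return int(np.sum(true_run_lengths >= min_len))

values = np.array([0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1])
print(count_true_runs(values == 1))  # 2
```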
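TL;DR
Explanation
The following assigns an ID to each group of consecutive 1s. The counter increases on every row where A is not 1, so all the 1s in one run share the same value: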
In [3]: df['cumsum'] = (df['A'] != 1).cumsum()
In [4]: print(df)
A cumsum
0 0 1
1 0 2
2 1 2
3 0 3
4 1 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 7
10 1 7
11 1 7
12 1 7
13 0 8
14 1 8
15 1 8
16 1 8
... which can be cleaned up by keeping only the rows where 'A' equals 1:
In [5]: df = df[df['A'] == 1]
In [6]: print(df)
A cumsum
2 1 2
4 1 3
9 1 7
10 1 7
11 1 7
12 1 7
14 1 8
15 1 8
16 1 8
Then you can use either value_counts() or groupby():
# With value_counts()
In [7]: print(df['cumsum'].value_counts())
7 4
8 3
3 1
2 1
Name: cumsum, dtype: int64
# The amount of sets of at least 3 consecutive 1 is:
In [8]: print((df['cumsum'].value_counts() >= 3).sum())
2
# With groupby()
In [9]: list(df.groupby('cumsum'))
Out[9]:
[(2,
A cumsum
2 1 2),
(3,
A cumsum
4 1 3),
(7,
A cumsum
9 1 7
10 1 7
11 1 7
12 1 7),
(8,
A cumsum
14 1 8
15 1 8
16 1 8)]
# The amount of sets of at least 3 consecutive 1 is:
In [10]: print(len([dataframe for _, dataframe in df.groupby('cumsum') if len(dataframe) >= 3]))
2
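The session above overwrites df and adds a helper column. As a rough sketch, the same steps can be wrapped in a function that leaves the DataFrame untouched; count_long_runs and its parameters value and min_len are hypothetical names, not part of the original answer.

```python
import pandas as pd

def count_long_runs(series, value=1, min_len=3):
    """Count runs of `value` in `series` that are at least `min_len` rows long."""
    # The counter increases on every row that is not `value`,
    # so all rows of one run of `value` share the same ID.
    run_id = (series != value).cumsum()
    # Keep only the rows equal to `value` and count the rows per ID.
    lengths = run_id[series == value].value_counts()
    return int((lengths >= min_len).sum())

df = pd.DataFrame({'A': [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]})
print(count_long_runs(df['A']))  # 2
```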