Python: classifying values by the number of consecutive occurrences (Python, Pandas)

python pandas

I have a DataFrame column containing 1s and 0s, like this:

df['working'] = 
1
1
0
0
0
1
1
0
0
1

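This indicates when a machine is working (1) or stopped (0). I need to classify these stops by their length: if a run of consecutive 0s has length less than or equal to n, all of its rows become a short stop (2); if the run is longer than n, they become a long stop (3). Applied to the example with n=2, the expected result is: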
df[['working', 'result']]=
1    1
1    1
0    3
0    3
0    3
1    1
1    1
0    2
0    2
1    1

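Of course this is only an example; my df has more than 1M rows.
I tried looping through it, but that is really slow. I also tried another approach, but I couldn't adapt it to my problem.
Can anyone help? Thanks a lot.

Here is one approach: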
import numpy as np

n = 2
# Group counter that increments every time the value changes
m = df.working.ne(df.working.shift()).cumsum()
# Mask of the rows where working is 0
eq0 = df.working.eq(0)
# Length of the run of consecutive 0s that each 0 row belongs to
count = df.loc[eq0, 'working'].groupby(m[eq0]).transform('count')
# Label the 0 rows: 3 if the run is longer than n, otherwise 2
df.loc[eq0, 'result'] = np.where(count > n, 3, 2)
# Fill the remaining rows (working == 1) with 1
df['result'] = df.result.fillna(1)

print(df)

    working  result
0        1     1.0
1        1     1.0
2        0     3.0
3        0     3.0
4        0     3.0
5        1     1.0
6        1     1.0
7        0     2.0
8        0     2.0
9        1     1.0
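
To see why the shift/cumsum step works, it can help to print the intermediate run labels. The following is a minimal, self-contained sketch that rebuilds the example frame and wraps the same idea in a function. The name classify_stops and its parameters are illustrative and not part of the original answer, and it assumes the column holds only 0s and 1s with no missing values.

```python
import numpy as np
import pandas as pd


def classify_stops(working: pd.Series, n: int) -> pd.Series:
    """Label working rows 1, stops of length <= n as 2, longer stops as 3."""
    # A new run starts wherever the value differs from the previous row
    run_id = working.ne(working.shift()).cumsum()
    # Length of the run that each row belongs to
    run_len = working.groupby(run_id).transform('count')
    is_stop = working.eq(0)
    labels = np.where(is_stop, np.where(run_len > n, 3, 2), 1)
    return pd.Series(labels, index=working.index, name='result')


df = pd.DataFrame({'working': [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]})
run_id = df['working'].ne(df['working'].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
df['result'] = classify_stops(df['working'], n=2)
print(df['result'].tolist())  # [1, 1, 3, 3, 3, 1, 1, 2, 2, 1]
```

Unlike the fillna-based version, this sketch builds every label in one pass, so the result column stays integer rather than float.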

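I hope that using map with value_counts improves performance:
import numpy as np

n = 2
# Mask of the rows where working is 0
m = df['working'].eq(0)
# The cumulative sum stays constant within a run of 0s, so it acts as a group key for the masked rows
s = df['working'].cumsum()[m]
# Length of the run of 0s that each masked row belongs to
out = s.map(s.value_counts())
# Default every row to 1 (working), then overwrite the 0 rows
df['result'] = 1
df.loc[m, 'result'] = np.where(out > n, 3, 2)
print(df)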
   working  result
0        1       1
1        1       1
2        0       3
3        0       3
4        0       3
5        1       1
6        1       1
7        0       2
8        0       2
9        1       1
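
The grouping step above relies on the cumulative sum of working not changing across a run of 0s, so every row of a given stop shares one cumsum value. A small sketch that makes this visible on the example data follows; the variable names csum and run_len are illustrative, and the trick assumes the column contains only 0s and 1s.

```python
import pandas as pd

df = pd.DataFrame({'working': [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]})

# Rows where the machine is stopped
m = df['working'].eq(0)

# The running total only increases on working rows
csum = df['working'].cumsum()
print(csum.tolist())  # [1, 2, 2, 2, 2, 3, 4, 4, 4, 5]

# Restricted to the stopped rows, each stop has its own constant key
s = csum[m]
print(s.tolist())  # [2, 2, 2, 4, 4]

# Counting how often each key occurs gives the length of each stop
run_len = s.map(s.value_counts())
print(run_len.tolist())  # [3, 3, 3, 2, 2]
```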

What is the expected output if n=3?
Then it should be: 1 1 2 2 2 1 1 2 2 1 (both stops count as short, since neither is longer than 3).
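
As a quick check of the n=3 case under the rule stated in the question (a run of 0s is short when its length is at most n), the cumsum-based snippet can be rerun on the example data. This is only a sketch on the ten-row example, not a test on real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'working': [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]})
n = 3

m = df['working'].eq(0)
s = df['working'].cumsum()[m]
out = s.map(s.value_counts())

df['result'] = 1
df.loc[m, 'result'] = np.where(out > n, 3, 2)
print(df['result'].tolist())  # [1, 1, 2, 2, 2, 1, 1, 2, 2, 1]
```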
@Gamopo - yes, is it possible to test the timings with your real data? I think this solution should be faster.
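
One way to compare the two approaches is a small timeit harness like the sketch below. The function names by_shift_groupby and by_cumsum_map are illustrative wrappers around the two answers, and the 1M-row input is random 0/1 data, which is an assumption: real machine logs will likely have a different distribution of run lengths, so the timings may not carry over.

```python
import timeit

import numpy as np
import pandas as pd


def by_shift_groupby(working: pd.Series, n: int) -> pd.Series:
    # Run ids from value changes, then group the 0 rows by run id
    run_id = working.ne(working.shift()).cumsum()
    eq0 = working.eq(0)
    count = working[eq0].groupby(run_id[eq0]).transform('count')
    result = pd.Series(1, index=working.index)
    result.loc[eq0] = np.where(count > n, 3, 2)
    return result


def by_cumsum_map(working: pd.Series, n: int) -> pd.Series:
    # cumsum is constant within a run of 0s, so it serves as the run key
    m = working.eq(0)
    s = working.cumsum()[m]
    out = s.map(s.value_counts())
    result = pd.Series(1, index=working.index)
    result.loc[m] = np.where(out > n, 3, 2)
    return result


# Synthetic stand-in for the real data (assumption: random 0/1 values)
rng = np.random.default_rng(0)
working = pd.Series(rng.integers(0, 2, size=1_000_000))

# Both approaches should agree before their timings are compared
assert by_shift_groupby(working, 2).equals(by_cumsum_map(working, 2))

for func in (by_shift_groupby, by_cumsum_map):
    seconds = timeit.timeit(lambda: func(working, 2), number=3) / 3
    print(f"{func.__name__}: {seconds:.3f} s per call")
```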