Use Python to replace values with NaN when they are stuck at one value across multiple columns
I should have asked the following question when I posted. As shown below, my df contains runs of identical consecutive values, namely 1, 2 and 3:
Date London Paris Dubai Tokyo
18/07/2017 22:35 1 2406 4348 70715
18/07/2017 22:40 1 4756 3744 3
18/07/2017 22:45 1 3988 2915 3
18/07/2017 22:50 2280 3058 2120 3
18/07/2017 22:55 2 1 1939 3
18/07/2017 23:00 2 1 2256 3
18/07/2017 23:05 2121 1 2640 2025
18/07/2017 23:10 3367 2 2202 1916
18/07/2017 23:15 3247 3 1 2
18/07/2017 23:20 2491 3 1 2
18/07/2017 23:25 2010 3 1 1560
18/07/2017 23:30 1899 3 1366 1355
18/07/2017 23:35 1992 2265 1236 1
18/07/2017 23:40 2196 4407 2 1
18/07/2017 23:45 1961 3848 2 1
18/07/2017 23:50 3 2880 2809 4565
18/07/2017 23:55 3 2143 2397 3725
19/07/2017 00:00 3 1981 3 2921
19/07/2017 00:05 3 2227 3 2131
19/07/2017 00:10 1366 2526 3 1990
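I want to detect these "dead values" when they stay stuck for at least 3 rows and then replace all of them with NaN, because I want to eliminate them later. With the following code, which I got from elsewhere, I can do this for the London column: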
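import numpy as np

# Give each run of equal consecutive values its own group id
g = df.London.diff().fillna(0).ne(0).cumsum()
# True for rows whose run is at least 3 rows long
m = df.groupby(g).London.transform('size').ge(3)
df.loc[m, 'London'] = np.nan
df.assign(grouper=g, mask=m, result=df.London)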
However, I want to do the same for the other columns as well (about 250 of them).
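To make the single-column logic easier to follow, here is a small sketch on a made-up Series (the values and the name run_len are illustrative only, not taken from df). It shows the run ids produced by diff/cumsum and the resulting mask:

```python
import pandas as pd

# Toy series (made-up values) containing one run of three equal values
s = pd.Series([1, 1, 1, 2280, 2, 2, 2121])

# diff() is non-zero wherever the value changes; cumsum() turns each change into a new run id
g = s.diff().fillna(0).ne(0).cumsum()
# Length of the run that each row belongs to
run_len = s.groupby(g).transform('size')
# True for rows inside runs of at least 3 equal values
m = run_len.ge(3)

print(g.tolist())        # [0, 0, 0, 1, 2, 2, 3]
print(run_len.tolist())  # [3, 3, 3, 1, 2, 2, 1]
print(m.tolist())        # [True, True, True, False, False, False, False]
```

The pair of 2s gets its own run id but is left alone because the run is shorter than 3 rows.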
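Below is the expected output, where all the 1s and 3s have been converted to NaN because their values stay fixed for at least 3 consecutive rows: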
Date London Paris Dubai Tokyo
18/07/2017 22:35 NaN 2406 4348 70715
18/07/2017 22:40 NaN 4756 3744 NaN
18/07/2017 22:45 NaN 3988 2915 NaN
18/07/2017 22:50 2280 3058 2120 NaN
18/07/2017 22:55 2 NaN 1939 NaN
18/07/2017 23:00 2 NaN 2256 NaN
18/07/2017 23:05 2121 NaN 2640 2025
18/07/2017 23:10 3367 2 2202 1916
18/07/2017 23:15 3247 NaN NaN 2
18/07/2017 23:20 2491 NaN NaN 2
18/07/2017 23:25 2010 NaN NaN 1560
18/07/2017 23:30 1899 NaN 1366 1355
18/07/2017 23:35 1992 2265 1236 NaN
18/07/2017 23:40 2196 4407 2 NaN
18/07/2017 23:45 1961 3848 2 NaN
18/07/2017 23:50 NaN 2880 2809 4565
18/07/2017 23:55 NaN 2143 2397 3725
19/07/2017 00:00 NaN 1981 NaN 2921
19/07/2017 00:05 NaN 2227 NaN 2131
19/07/2017 00:10 1366 2526 NaN 1990
If your code works well and is fast enough, just iterate over the columns:
for col in df.columns:
    # same run-labelling step as for London, applied to the current column
    g = df[col].diff().fillna(0).ne(0).cumsum()
    # and so on...
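Spelled out in full, the loop might look like the sketch below. The function name mask_dead_runs, the parameter min_len and the demo frame are hypothetical, and the sketch assumes that only numeric columns should be checked, since diff() does not work on a string Date column:

```python
import pandas as pd

def mask_dead_runs(df, min_len=3):
    """Return a copy of df with runs of >= min_len equal consecutive values set to NaN."""
    out = df.copy()
    # Assumption: only numeric columns are checked; other columns pass through untouched
    for col in out.select_dtypes(include='number').columns:
        s = out[col]
        # New run id whenever the value differs from the previous row
        g = s.diff().fillna(0).ne(0).cumsum()
        # True for rows whose run has at least min_len rows
        m = s.groupby(g).transform('size').ge(min_len)
        out[col] = s.mask(m)
    return out

# Made-up demo data in the shape of the question's frame
demo = pd.DataFrame({
    'Date': ['22:35', '22:40', '22:45', '22:50', '22:55'],
    'London': [1, 1, 1, 2280, 2],
    'Paris': [2406, 4756, 3988, 3058, 1],
})
res = mask_dead_runs(demo)
print(res)
```

Working on a copy keeps the original frame intact, so the result can be compared with the input before anything is overwritten.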
Use shift with np.logical_and.reduce and np.logical_or.reduce to create the mask (or use a double where):
Output: df
Comment: The 1s and 3s are still there.
Comment: @k.koen Assign it back, e.g. df = df.where(df.where(m.to_numpy()).bfill(limit=Nmin-1).isnull())
Comment: Ah, assigning it back to df makes it work now. Thanks.
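import numpy as np
import pandas as pd

Nmin = 3  # minimum run length to blank out (must be at least 2)
# m is True where a value equals each of the Nmin-1 values above it, i.e. from the Nmin-th row of a run onward
m = pd.DataFrame(np.logical_and.reduce([(df == df.shift(i)).to_numpy() for i in range(1, Nmin)]))
# Back-fill the flagged values over the earlier rows of each run, then keep only the cells that stayed NaN
df = df.where(df.where(m.to_numpy()).bfill(limit=Nmin-1).isnull())
# Alternative: spread the mask upward with np.logical_or.reduce
#df = df.mask(np.logical_or.reduce([m.shift(-i).fillna(False).to_numpy() for i in range(Nmin)]))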
Date London Paris Dubai Tokyo
0 18/07/2017-22:35 NaN 2406.0 4348.0 70715.0
1 18/07/2017-22:40 NaN 4756.0 3744.0 NaN
2 18/07/2017-22:45 NaN 3988.0 2915.0 NaN
3 18/07/2017-22:50 2280.0 3058.0 2120.0 NaN
4 18/07/2017-22:55 2.0 NaN 1939.0 NaN
5 18/07/2017-23:00 2.0 NaN 2256.0 NaN
6 18/07/2017-23:05 2121.0 NaN 2640.0 2025.0
7 18/07/2017-23:10 3367.0 2.0 2202.0 1916.0
8 18/07/2017-23:15 3247.0 NaN NaN 2.0
9 18/07/2017-23:20 2491.0 NaN NaN 2.0
10 18/07/2017-23:25 2010.0 NaN NaN 1560.0
11 18/07/2017-23:30 1899.0 NaN 1366.0 1355.0
12 18/07/2017-23:35 1992.0 2265.0 1236.0 NaN
13 18/07/2017-23:40 2196.0 4407.0 2.0 NaN
14 18/07/2017-23:45 1961.0 3848.0 2.0 NaN
15 18/07/2017-23:50 NaN 2880.0 2809.0 4565.0
16 18/07/2017-23:55 NaN 2143.0 2397.0 3725.0
17 19/07/2017-00:00 NaN 1981.0 NaN 2921.0
18 19/07/2017-00:05 NaN 2227.0 NaN 2131.0
19 19/07/2017-00:10 1366.0 2526.0 NaN 1990.0
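The commented-out np.logical_or.reduce line can be wrapped into a reusable function in the same way. The following is only a sketch under stated assumptions: the names mask_stuck_values and n_min and the demo frame are made up, and it uses shift(..., fill_value=False) instead of fillna(False) to keep the mask boolean.

```python
import numpy as np
import pandas as pd

def mask_stuck_values(df, n_min=3):
    """Set values that repeat for at least n_min consecutive rows to NaN (n_min >= 2)."""
    # end_of_run[r, c] is True when row r equals each of the n_min-1 rows above it
    end_of_run = np.logical_and.reduce(
        [(df == df.shift(i)).to_numpy() for i in range(1, n_min)]
    )
    m = pd.DataFrame(end_of_run, index=df.index, columns=df.columns)
    # Spread each flag upward so the n_min-1 earlier rows of the run are covered too
    dead = np.logical_or.reduce(
        [m.shift(-i, fill_value=False).to_numpy() for i in range(n_min)]
    )
    return df.mask(dead)

# Made-up demo data: one run of three 1s and one run of four 3s
demo = pd.DataFrame({
    'London': [1, 1, 1, 2280, 2, 2],
    'Tokyo': [70715, 3, 3, 3, 3, 2025],
})
out = mask_stuck_values(demo)
print(out)
```

As in the output above, the surviving numbers come back as floats, because an integer column cannot hold NaN.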