Python: replace NaN values with the median of all rows, but only for selected rows?
This is a follow-up to my earlier question, but I now need to add a selection of specific rows to which the changed values are applied.
import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2020-09-24', periods=20, freq='12min')  # same 12-minute spacing as '0.2H'
df = pd.DataFrame({ 'Date': rng, 'Val': np.random.randn(len(rng)), 'Dist' :np.random.randn(len(rng)), 'Variant' : ["Red", "Blue", "Blue", "Yellow","Blue", "Blue", "Yellow", "Blue", "Yellow","Blue", "Red", "Red", "Red", "Red","Blue", "Blue", "Yellow","Red", "Yellow", "Yellow"]})
# set low values to NaN; .loc avoids chained assignment
df.loc[df.Dist <= -0.6, 'Dist'] = np.nan
df.loc[df.Val <= -0.5, 'Val'] = np.nan
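To see which rows actually need filling, the setup can be followed by a count of missing values per Variant. This is a minimal sketch, not part of the original question: it rebuilds the same frame, uses `Series.mask` as one possible way to write the NaN step, and the name `nan_counts` is only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2020-09-24', periods=20, freq='12min')
variants = ["Red", "Blue", "Blue", "Yellow", "Blue", "Blue", "Yellow", "Blue",
            "Yellow", "Blue", "Red", "Red", "Red", "Red", "Blue", "Blue",
            "Yellow", "Red", "Yellow", "Yellow"]
df = pd.DataFrame({'Date': rng,
                   'Val': np.random.randn(len(rng)),
                   'Dist': np.random.randn(len(rng)),
                   'Variant': variants})

# Series.mask replaces the values where the condition is True with NaN
df['Dist'] = df['Dist'].mask(df['Dist'] <= -0.6)
df['Val'] = df['Val'].mask(df['Val'] <= -0.5)

# number of missing values in each column, per Variant (illustrative name)
nan_counts = df[['Val', 'Dist']].isna().groupby(df['Variant']).sum()
print(nan_counts)
```

With this seed, the Red rows hold three missing Dist values and no missing Val values, so only Dist changes for Red in the rest of the thread.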
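But now I don't know how to compute the hourly median from all the values in a column while using it to fill only the rows where the Variant column is Red.
Again, the median is computed for the Dist and Val columns independently.
This should leave the NaN values in the Yellow and Blue rows in place.

Get the index of the Red variant, then compute the median in a groupby that uses the Variant column as well, and update only the target index: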
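cols = ['Val', 'Dist']
idx_red = df.Variant.eq('Red')  # boolean mask selecting the Red rows
# median per hour and Variant, broadcast back to the shape of df
medians = df.groupby([df.Date.dt.floor('h'), 'Variant'])[cols].transform('median')
# fill NaN only in the Red rows; all other rows are left untouched
df.loc[idx_red, cols] = df.loc[idx_red, cols].fillna(medians[idx_red])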
Output:
                  Date        Val       Dist  Variant
0  2020-09-24 00:00:00   1.764052        NaN      Red
1  2020-09-24 00:12:00   0.400157   0.653619     Blue
2  2020-09-24 00:24:00   0.978738   0.864436     Blue
3  2020-09-24 00:36:00   2.240893        NaN   Yellow
4  2020-09-24 00:48:00   1.867558   2.269755     Blue
5  2020-09-24 01:00:00        NaN        NaN     Blue
6  2020-09-24 01:12:00   0.950088   0.045759   Yellow
7  2020-09-24 01:24:00  -0.151357  -0.187184     Blue
8  2020-09-24 01:36:00  -0.103219   1.532779   Yellow
9  2020-09-24 01:48:00   0.410599   1.469359     Blue
10 2020-09-24 02:00:00   0.144044   0.154947      Red
11 2020-09-24 02:12:00   1.454274   0.378163      Red
12 2020-09-24 02:24:00   0.761038   0.266555      Red
13 2020-09-24 02:36:00   0.121675   0.266555      Red
14 2020-09-24 02:48:00   0.443863  -0.347912     Blue
15 2020-09-24 03:00:00   0.333674   0.156349     Blue
16 2020-09-24 03:12:00   1.494079   1.230291   Yellow
17 2020-09-24 03:24:00  -0.205158   1.202380      Red
18 2020-09-24 03:36:00   0.313068  -0.387327   Yellow
19 2020-09-24 03:48:00        NaN  -0.302303   Yellow
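Note: variants other than 'Red' are not updated, and Red rows whose group contains only NaN values (such as row 0) are not updated either.

Thank you very much for this, especially for .eq, which I didn't know about.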
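The output shown above is what results when the medians are grouped by hour and by Variant: rows 12 and 13 receive 0.266555, the median of the two remaining Red values in that hour, and row 0 stays NaN because it is the only Red row in its hour. If the median should instead pool all variants within each hour, as the wording of the question suggests, the groupby key can be the floored hour alone. The sketch below compares the two under that assumption; the helper `fill_red` and the names `hour`, `by_variant`, and `pooled` are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2020-09-24', periods=20, freq='12min')
variants = ["Red", "Blue", "Blue", "Yellow", "Blue", "Blue", "Yellow", "Blue",
            "Yellow", "Blue", "Red", "Red", "Red", "Red", "Blue", "Blue",
            "Yellow", "Red", "Yellow", "Yellow"]
df = pd.DataFrame({'Date': rng,
                   'Val': np.random.randn(len(rng)),
                   'Dist': np.random.randn(len(rng)),
                   'Variant': variants})
df.loc[df['Dist'] <= -0.6, 'Dist'] = np.nan
df.loc[df['Val'] <= -0.5, 'Val'] = np.nan

cols = ['Val', 'Dist']


def fill_red(frame, keys):
    """Return a copy in which NaN in the Red rows is filled with the group median."""
    out = frame.copy()
    idx_red = out['Variant'].eq('Red')  # same as out['Variant'] == 'Red'
    medians = out.groupby(keys)[cols].transform('median')
    out.loc[idx_red, cols] = out.loc[idx_red, cols].fillna(medians[idx_red])
    return out


hour = df['Date'].dt.floor('h')
by_variant = fill_red(df, [hour, 'Variant'])  # median within each hour and Variant
pooled = fill_red(df, [hour])                 # median over all variants in the hour

print(by_variant.loc[[0, 12, 13], 'Dist'])
print(pooled.loc[[0, 12, 13], 'Dist'])
```

With these data, the pooled version fills Dist in row 0 with 0.864436 and in rows 12 and 13 with 0.154947, while the per-variant version leaves row 0 as NaN. In both cases the Blue and Yellow rows keep their NaN values.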