Python: How to fill missing data in a DataFrame based on multiple conditions in the row and the data above
I have the following dataframe:
ID LineID TeamID ShiftID DateTime Production Theoretical Scrap
1 3 1 NULL 18/6/2020 4:00 482.5291 511.2351
2 2 1 NULL 18/6/2020 5:00 467.8704 519.9842
3 1 1 NULL 18/6/2020 5:00 390.5945 480.2252
2186 3 1 NULL 18/6/2020 5:00 0 0.5
2520 2 1 NULL 18/6/2020 5:00 0 21
2840 1 1 NULL 18/6/2020 6:00 0 12
4 1 1 NULL 18/6/2020 6:00 389.2222 480.2252
5 3 1 NULL 18/6/2020 6:00 516.0907 511.2351
6 2 1 NULL 18/6/2020 6:00 450.5216 519.9842
7 3 1 NULL 18/6/2020 6:00 397.9998 511.2351
8 2 1 NULL 18/6/2020 7:00 456.9486 519.9842
9 1 1 NULL 18/6/2020 7:00 414.6932 480.2252
1939 2 1 NULL 18/6/2020 7:00 0 24
2462 3 1 NULL 18/6/2020 7:00 0 3
3075 1 1 NULL 18/6/2020 7:00 0 3.5
1
......
......
......
114678 1 1 NULL 18/6/2018 22:00 343.5955
114798 3 1 NULL 18/6/2018 22:00 191.2512
114888 2 1 NULL 18/6/2018 22:00 190.5125
114657 2 1 NULL 18/6/2018 22:00 414.6432
114738 1 1 NULL 18/6/2018 22:00 429.43
114885 3 1 NULL 18/6/2018 23:00 361.3246
114756 1 1 NULL 18/6/2018 23:00 409.51
I need to fill in Theoretical where it is empty, but only in rows where Scrap is also empty.
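So the rule is: when LineID is 3, Theoretical is always 511.2351; when it is 2, Theoretical is always 519.9842; and when it is 1, Theoretical is always 480.2252. But when a Scrap value is present, Theoretical should stay empty.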
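The rule above is a lookup from LineID to a constant, so one way to express it is a dict plus `Series.map`, applied only to rows where both Theoretical and Scrap are missing. This is a minimal sketch, not code from the thread: it assumes the empty cells are NaN (not 0), reuses the small sample frame from the answer below, and the dict name `THEORETICAL_BY_LINE` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: the constant Theoretical value for each LineID
THEORETICAL_BY_LINE = {3: 511.2351, 2: 519.9842, 1: 480.2252}

# Sample frame from the answer below; empty cells are NaN
df = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
    "Theoretical": [480.2252, 519.9842, 511.2351] + [np.nan] * 9,
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10],
})

# Only rows where Theoretical AND Scrap are both missing should be filled
needs_fill = df["Theoretical"].isna() & df["Scrap"].isna()

# Look up each selected row's constant from its LineID; other rows are untouched
df.loc[needs_fill, "Theoretical"] = df.loc[needs_fill, "LineID"].map(THEORETICAL_BY_LINE)
print(df)
```

Adding a new line then only means adding one entry to the dict, rather than another condition.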
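I can't seem to come up with a forward-fill approach for this.
I tried the following code, but every row other than the matching ones turned into NaN: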
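If the constants should come from the data above rather than being hard-coded, a grouped forward fill is one possible approach: within each LineID, carry the last known Theoretical value down, then keep it only in rows where Scrap is missing. This sketch assumes the rows are already in the intended order, that each LineID has at least one known Theoretical value above its gaps, and that empty cells are NaN; the sample frame is the one from the answer below.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
    "Theoretical": [480.2252, 519.9842, 511.2351] + [np.nan] * 9,
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10],
})

# Within each LineID, propagate the most recent non-missing Theoretical downwards
filled = df.groupby("LineID")["Theoretical"].ffill()

# Discard the propagated value wherever Scrap is present, then fill only the gaps
df["Theoretical"] = df["Theoretical"].fillna(filled.where(df["Scrap"].isna()))
print(df)
```

Unlike the hard-coded version, this picks up whatever value each line last had, so it depends on row order.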
df['Theoretical'] = np.select([(df['LineID']==3) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
(df['LineID']==2) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
(df['LineID']==1) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),],
(511.2351,519.9842,480.2252), np.nan)
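The other rows become NaN because `np.select` writes its `default` (here `np.nan`) wherever no condition matches, and the result then replaces the whole column. A minimal fix, sketched below, is to pass the existing column as the default so unmatched rows keep their current value. The sketch tests for missing values with `isna()` on the assumption that blanks are NaN (if they are stored as 0, use `== 0` as in the original); the `Production > 0` test is left out only because the sample frame from the answer below has no Production column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
    "Theoretical": [480.2252, 519.9842, 511.2351] + [np.nan] * 9,
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10],
})

line_ids = [3, 2, 1]
choices = [511.2351, 519.9842, 480.2252]

# One condition per LineID: Theoretical and Scrap are both missing
conditions = [
    (df["LineID"] == line_id) & df["Theoretical"].isna() & df["Scrap"].isna()
    for line_id in line_ids
]

# default=df["Theoretical"] keeps each row's existing value when no condition
# matches, instead of overwriting it with NaN
df["Theoretical"] = np.select(conditions, choices, default=df["Theoretical"])
print(df)
```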
I need it to look like this:
ID LineID TeamID ShiftID DateTime Production Theoretical Scrap
1 3 1 NULL 18/6/2020 4:00 482.5291 511.2351
2 2 1 NULL 18/6/2020 5:00 467.8704 519.9842
3 1 1 NULL 18/6/2020 5:00 390.5945 480.2252
2186 3 1 NULL 18/6/2020 5:00 0 0.5
2520 2 1 NULL 18/6/2020 5:00 0 21
2840 1 1 NULL 18/6/2020 6:00 0 12
4 1 1 NULL 18/6/2020 6:00 389.2222 480.2252
5 3 1 NULL 18/6/2020 6:00 516.0907 511.2351
6 2 1 NULL 18/6/2020 6:00 450.5216 519.9842
7 3 1 NULL 18/6/2020 6:00 397.9998 511.2351
8 2 1 NULL 18/6/2020 7:00 456.9486 519.9842
9 1 1 NULL 18/6/2020 7:00 414.6932 480.2252
1939 2 1 NULL 18/6/2020 7:00 0 24
2462 3 1 NULL 18/6/2020 7:00 0 3
3075 1 1 NULL 18/6/2020 7:00 0 3.5
1
......
......
......
114678 1 1 NULL 18/6/2018 22:00 343.5955 480.2252
114798 3 1 NULL 18/6/2018 22:00 191.2512 511.2351
114888 2 1 NULL 18/6/2018 22:00 190.5125 519.9842
114657 2 1 NULL 18/6/2018 22:00 414.6432 519.9842
114738 1 1 NULL 18/6/2018 22:00 429.43 480.2252
114885 3 1 NULL 18/6/2018 23:00 361.3246 511.2351
114756 1 1 NULL 18/6/2018 23:00 409.51 480.2252
This is certainly not the best solution, but you can try the following:
import numpy as np
import pandas as pd

df_new = pd.DataFrame({
"LineID":[1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
"Theoretical": [480.2252, 519.9842, 511.2351, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
"Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10]
})
df_new
LineID Theoretical Scrap
0 1 480.2252 NaN
1 2 519.9842 0.5
2 3 511.2351 21.0
3 1 NaN NaN
4 2 NaN 24.0
5 1 NaN NaN
6 1 NaN 40.0
7 2 NaN 34.0
8 3 NaN NaN
9 1 NaN 0.4
10 2 NaN NaN
11 1 NaN 10.0
# For each LineID, fill Theoretical only where both Theoretical and Scrap are missing
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 3), "Theoretical"] = 511.2351
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 2), "Theoretical"] = 519.9842
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 1), "Theoretical"] = 480.2252
df_new
LineID Theoretical Scrap
0 1 480.2252 NaN
1 2 519.9842 0.5
2 3 511.2351 21.0
3 1 480.2252 NaN
4 2 NaN 24.0
5 1 480.2252 NaN
6 1 NaN 40.0
7 2 NaN 34.0
8 3 511.2351 NaN
9 1 NaN 0.4
10 2 519.9842 NaN
11 1 NaN 10.0
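The three `.loc` lines above differ only in the LineID and the constant, so one way to tidy them up is to keep the same condition and drive it from a single dict. This is a sketch of that refactor, not a different algorithm; the dict name `THEORETICAL_BY_LINE` is hypothetical.

```python
import numpy as np
import pandas as pd

df_new = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
    "Theoretical": [480.2252, 519.9842, 511.2351] + [np.nan] * 9,
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10],
})

# Hypothetical lookup table: one Theoretical constant per LineID
THEORETICAL_BY_LINE = {3: 511.2351, 2: 519.9842, 1: 480.2252}

# Same condition as the three .loc lines, written once and applied per LineID
for line_id, value in THEORETICAL_BY_LINE.items():
    mask = (
        df_new["Theoretical"].isna()
        & df_new["Scrap"].isna()
        & (df_new["LineID"] == line_id)
    )
    df_new.loc[mask, "Theoretical"] = value
print(df_new)
```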
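It doesn't seem to work.
Yes, you're right: the second condition used notnull(), and it needs isna() as well. I've updated the code; try again.
I hope there's a better way to do this lol. I also came up with something similar.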