Python 3.x: mapping column values between two DataFrames
I have two DataFrames, as shown below:
df1
group flag var1 AA_new AB_new B1_new B2_new
0 A 1 1 0 0 0 0
1 A 0 2 0 0 0 0
2 A 0 3 0 0 0 0
3 B 1 7 0 0 0 0
4 B 0 8 0 0 0 0
5 B 0 9 0 0 0 0
6 B 0 10 0 0 0 0
7 B 1 15 0 0 0 0
8 B 0 20 0 0 0 0
9 B 0 30 0 0 0 0
df2
val group AA_new AB_new B1_new B2_new
0 A 40 500 0 0
2 B 0 0 700 60
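I want to map the values from df2 into df1 based on the column "group", but only for the rows of df1 where "flag" equals 1.
My expected final DataFrame: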
group flag var1 AA_new AB_new B1_new B2_new
0 A 1 1 40 500 0 0
1 A 0 2 0 0 0 0
2 A 0 3 0 0 0 0
3 B 1 7 0 0 700 60
4 B 0 8 0 0 0 0
5 B 0 9 0 0 0 0
6 B 0 10 0 0 0 0
7 B 1 15 0 0 700 60
8 B 0 20 0 0 0 0
9 B 0 30 0 0 0 0
Here is one approach using merge and concat:
import pandas as pd

c = df1['flag'].astype(bool)  # condition: rows where flag is 1
m = df1.reset_index()         # keep the original index as a column so it can be restored later
out = (pd.concat((m[c].merge(df2, on='group', suffixes=('_x', ''))[m.columns],
                  m[~c]))
         .set_index('index')
         .sort_index()
         .rename_axis(None))
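To try the merge and concat approach end to end, here is a small self-contained sketch. The frame construction is an assumption reconstructed from the tables in the question (df2 is given a default index, and `matched` is a name introduced here only for readability).

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed layout).
df1 = pd.DataFrame({
    "group": list("AAABBBBBBB"),
    "flag": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    "var1": [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
    "AA_new": 0, "AB_new": 0, "B1_new": 0, "B2_new": 0,
})
df2 = pd.DataFrame({
    "group": ["A", "B"],
    "AA_new": [40, 0],
    "AB_new": [500, 0],
    "B1_new": [0, 700],
    "B2_new": [0, 60],
})

c = df1["flag"].astype(bool)   # True where flag is 1
m = df1.reset_index()          # original index becomes the column "index"

# Flagged rows take their *_new values from df2: the empty right-hand suffix
# keeps df2's column names, while df1's overlapping columns get "_x" and are
# dropped by the [m.columns] selection.
matched = m[c].merge(df2, on="group", suffixes=("_x", ""))[m.columns]

# Put the untouched rows back and restore the original row order.
out = (pd.concat([matched, m[~c]])
         .set_index("index")
         .sort_index()
         .rename_axis(None))
print(out)
```

Because the merge is an inner join, a flagged row whose group is missing from df2 would be dropped; passing `how="left"` is one way to keep it, at the cost of NaN in the new columns.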
Here is my attempt. First, the conditions:
a=(df1['flag']==1)& (df1['group'].str.contains('A'))
b=(df1['flag']==1)& (df1['group'].str.contains('B'))
Then apply the conditions using np.where:
import numpy as np

# np.where returns an array, which can be assigned to the column directly
df1['AA_new'] = np.where(a, df2.loc[0, 'AA_new'], 0)
df1['AB_new'] = np.where(a, df2.loc[0, 'AB_new'], 0)
df1['B1_new'] = np.where(b, df2.loc[1, 'B1_new'], 0)
df1['B2_new'] = np.where(b, df2.loc[1, 'B2_new'], 0)
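The attempt above hard-codes one condition per group and one line per column. A possible generalization, sketched here under the assumptions that df2 has exactly one row per group and that every group in df1 appears in df2, looks up each row's group in df2 and keeps the value only where flag is 1. The names `lookup`, `is_flagged`, and `result` are introduced for this illustration, and the sample frames are reconstructed from the question.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed layout).
df1 = pd.DataFrame({
    "group": list("AAABBBBBBB"),
    "flag": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    "var1": [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
    "AA_new": 0, "AB_new": 0, "B1_new": 0, "B2_new": 0,
})
df2 = pd.DataFrame({
    "group": ["A", "B"],
    "AA_new": [40, 0],
    "AB_new": [500, 0],
    "B1_new": [0, 700],
    "B2_new": [0, 60],
})

new_cols = ["AA_new", "AB_new", "B1_new", "B2_new"]
lookup = df2.set_index("group")     # one row of replacement values per group
is_flagged = df1["flag"].eq(1)

result = df1.copy()                 # leave the original frame untouched
for col in new_cols:
    # Translate each row's group label into df2's value for this column,
    # then fall back to 0 on rows where flag is not 1.
    mapped = result["group"].map(lookup[col])
    result[col] = mapped.where(is_flagged, 0)
print(result)
```

Like the np.where version, this overwrites whatever the new columns held on unflagged rows with 0.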
Another solution, using fillna:
import numpy as np

# collect the columns whose names end with 'new'
col = df1.columns[df1.columns.str.endswith('new')]
# set the values in those columns to null where flag is 1
df1.loc[df1.flag.eq(1), col] = np.nan
# set the index to group for both df1 and df2;
# this allows fillna to fill the null values correctly based on the index,
# replacing the null values in df1 with the values from df2
df1.set_index('group').fillna(df2.set_index('group'))
flag var1 AA_new AB_new B1_new B2_new
group
A 1 1 40.0 500.0 0.0 0.0
A 0 2 0.0 0.0 0.0 0.0
A 0 3 0.0 0.0 0.0 0.0
B 1 7 0.0 0.0 700.0 60.0
B 0 8 0.0 0.0 0.0 0.0
B 0 9 0.0 0.0 0.0 0.0
B 0 10 0.0 0.0 0.0 0.0
B 1 15 0.0 0.0 700.0 60.0
B 0 20 0.0 0.0 0.0 0.0
B 0 30 0.0 0.0 0.0 0.0
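The result above comes back as floats and indexed by group, and the snippet writes NaN into df1 itself. A variant sketch that works on a copy, casts the new columns to float before inserting NaN, and restores integer columns afterwards could look like this. The names `work`, `flagged`, and `filled` are introduced here, and the sample frames are reconstructed from the question.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (assumed layout).
df1 = pd.DataFrame({
    "group": list("AAABBBBBBB"),
    "flag": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    "var1": [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
    "AA_new": 0, "AB_new": 0, "B1_new": 0, "B2_new": 0,
})
df2 = pd.DataFrame({
    "group": ["A", "B"],
    "AA_new": [40, 0],
    "AB_new": [500, 0],
    "B1_new": [0, 700],
    "B2_new": [0, 60],
})

cols = df1.columns[df1.columns.str.endswith("_new")]
flagged = df1["flag"].eq(1).to_numpy()

work = df1.copy()                            # df1 itself stays unchanged
work[cols] = work[cols].astype("float64")    # floats can hold NaN
work.loc[flagged, cols] = np.nan             # blank out the rows to be filled

# Align both frames on group so fillna picks the matching row of df2.
filled = work.set_index("group").fillna(df2.set_index("group"))

# Restore integer dtype and bring group back as an ordinary column.
out = filled.astype({c: "int64" for c in cols}).reset_index()
print(out)
```

The cast back to `int64` assumes every flagged group exists in df2; otherwise NaN would remain and the cast would fail.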