Python 3.x: Mapping column values between two DataFrames

I have two DataFrames, as shown below:

df1 

    group   flag    var1    AA_new  AB_new  B1_new  B2_new
0   A       1       1       0       0        0       0
1   A       0       2       0       0        0       0
2   A       0       3       0       0        0       0
3   B       1       7       0       0        0       0
4   B       0       8       0       0        0       0
5   B       0       9       0       0        0       0
6   B       0       10      0       0        0       0
7   B       1       15      0       0        0       0
8   B       0       20      0       0        0       0
9   B       0       30      0       0        0       0

df2

val group   AA_new  AB_new  B1_new  B2_new
0     A     40      500     0        0
2     B     0       0       700      60

I want to map the values from df2 into df1 based on the column "group", but only for the rows of df1 where "flag" = 1.

My expected final DataFrame:

    group   flag    var1    AA_new  AB_new  B1_new  B2_new
0   A       1       1       40      500      0       0
1   A       0       2       0       0        0       0
2   A       0       3       0       0        0       0
3   B       1       7       0       0        700     60
4   B       0       8       0       0        0       0
5   B       0       9       0       0        0       0
6   B       0       10      0       0        0       0
7   B       1       15      0       0        700     60
8   B       0       20      0       0        0       0
9   B       0       30      0       0        0       0
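For anyone who wants to reproduce the question, the two frames can be rebuilt roughly as follows. This is only a sketch: df2 is created with a default index, and its leading `val` header and row labels are not reproduced because the listing leaves their role unclear. The name `new_cols` is introduced here purely for convenience.

```python
import pandas as pd

# Helper list of the target columns (a name introduced for this sketch only)
new_cols = ['AA_new', 'AB_new', 'B1_new', 'B2_new']

# df1: ten rows, with the four *_new columns initialised to 0
df1 = pd.DataFrame({
    'group': list('AAABBBBBBB'),
    'flag':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1':  [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
}).assign(**{col: 0 for col in new_cols})

# df2: one row of replacement values per group (default index assumed)
df2 = pd.DataFrame({
    'group':  ['A', 'B'],
    'AA_new': [40, 0],
    'AB_new': [500, 0],
    'B1_new': [0, 700],
    'B2_new': [0, 60],
})

print(df1)
print(df2)
```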

Here is one approach using merge and concat:

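import pandas as pd

c = df1['flag'].astype(bool)  # boolean mask: True where flag is 1
m = df1.reset_index()         # keep the original index as a column so it can be restored later
# Merge only the flagged rows with df2. Overlapping columns coming from df1 get the
# '_x' suffix, so selecting m.columns afterwards keeps df2's values for them.
out = (pd.concat((m[c].merge(df2, on='group', suffixes=('_x', ''))[m.columns],
                  m[~c])).set_index('index')
                 .sort_index().rename_axis(None))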
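To see what the suffixes are doing, it may help to run the same steps on the question's data and look at the intermediate merge result. The sketch below assumes df2 has a default index and only the columns shown; `merged` and `new_cols` are names introduced for illustration.

```python
import pandas as pd

new_cols = ['AA_new', 'AB_new', 'B1_new', 'B2_new']
df1 = pd.DataFrame({
    'group': list('AAABBBBBBB'),
    'flag':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1':  [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
}).assign(**{col: 0 for col in new_cols})
df2 = pd.DataFrame({
    'group':  ['A', 'B'],
    'AA_new': [40, 0],
    'AB_new': [500, 0],
    'B1_new': [0, 700],
    'B2_new': [0, 60],
})

c = df1['flag'].astype(bool)   # True for the rows that should receive df2's values
m = df1.reset_index()          # original index is now a regular column named 'index'

# Intermediate step: only the flagged rows are merged with df2.
# Columns present in both frames appear twice: 'AA_new_x' (from df1) and 'AA_new' (from df2).
merged = m[c].merge(df2, on='group', suffixes=('_x', ''))
print(merged.columns.tolist())

# Selecting m.columns keeps the un-suffixed (df2) versions, then the
# unflagged rows are appended and the original row order is restored.
out = (pd.concat((merged[m.columns], m[~c]))
         .set_index('index')
         .sort_index()
         .rename_axis(None))
print(out)
```

Note that the merge is an inner join, so a flagged row whose group does not appear in df2 would be dropped from the result.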

Here is my attempt. First, the conditions:

a=(df1['flag']==1)& (df1['group'].str.contains('A'))
b=(df1['flag']==1)& (df1['group'].str.contains('B'))

Then apply the conditions using np.where:

import numpy as np

# np.where already returns an array of the right length, so no DataFrame wrapper is needed
df1['AA_new'] = np.where(a, df2.loc[0, 'AA_new'], 0)
df1['AB_new'] = np.where(a, df2.loc[0, 'AB_new'], 0)
df1['B1_new'] = np.where(b, df2.loc[1, 'B1_new'], 0)
df1['B2_new'] = np.where(b, df2.loc[1, 'B2_new'], 0)
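The attempt above hard-codes one condition and one df2 row label per group. A possible generalisation, sketched here and not part of the original attempt, looks each row's group up in df2 with `Series.map` and keeps the `np.where` idea. It assumes each group appears only once in df2; `flagged`, `lookup` and `new_cols` are names made up for this sketch.

```python
import numpy as np
import pandas as pd

new_cols = ['AA_new', 'AB_new', 'B1_new', 'B2_new']
df1 = pd.DataFrame({
    'group': list('AAABBBBBBB'),
    'flag':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1':  [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
}).assign(**{col: 0 for col in new_cols})
df2 = pd.DataFrame({
    'group':  ['A', 'B'],
    'AA_new': [40, 0],
    'AB_new': [500, 0],
    'B1_new': [0, 700],
    'B2_new': [0, 60],
})

flagged = df1['flag'].eq(1)        # rows that should receive df2's values
lookup = df2.set_index('group')    # one row of replacement values per group

for col in new_cols:
    # map() translates each row's group label into df2's value for this column;
    # np.where keeps that value on flagged rows and writes 0 everywhere else
    df1[col] = np.where(flagged, df1['group'].map(lookup[col]), 0)

print(df1)
```

As in the original attempt, unflagged rows are overwritten with 0 instead of keeping whatever they held before, which makes no difference here because those columns start out as 0.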

Another solution:

import numpy as np

# Select the columns whose names end with 'new'
col = df1.columns[df1.columns.str.endswith('new')]

# Cast to float first so NaN can be stored without relying on implicit upcasting,
# then set these columns to NaN on the rows where flag is 1
df1[col] = df1[col].astype(float)
df1.loc[df1.flag.eq(1), col] = np.nan

# Index both frames by group so that fillna aligns on the group labels,
# then fill the NaN values in df1 with the matching values from df2
df1.set_index('group').fillna(df2.set_index('group'))

        flag    var1    AA_new  AB_new  B1_new  B2_new
group                       
A        1       1      40.0    500.0    0.0    0.0
A        0       2       0.0    0.0      0.0    0.0
A        0       3       0.0    0.0      0.0    0.0
B        1       7       0.0    0.0      700.0  60.0
B        0       8       0.0    0.0      0.0    0.0
B        0       9       0.0    0.0      0.0    0.0
B        0       10      0.0    0.0      0.0    0.0
B        1       15      0.0    0.0      700.0  60.0
B        0       20      0.0    0.0      0.0    0.0
B        0       30      0.0    0.0      0.0    0.0
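The result above has group as its index and float values, while the expected frame in the question keeps the original row index and integers. One way this could be reconciled is sketched below. It works on a copy, assumes every flagged group exists in df2 (otherwise NaN would remain and the integer cast would fail), and uses the made-up names `work`, `filled` and `new_cols`.

```python
import numpy as np
import pandas as pd

new_cols = ['AA_new', 'AB_new', 'B1_new', 'B2_new']
df1 = pd.DataFrame({
    'group': list('AAABBBBBBB'),
    'flag':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    'var1':  [1, 2, 3, 7, 8, 9, 10, 15, 20, 30],
}).assign(**{col: 0 for col in new_cols})
df2 = pd.DataFrame({
    'group':  ['A', 'B'],
    'AA_new': [40, 0],
    'AB_new': [500, 0],
    'B1_new': [0, 700],
    'B2_new': [0, 60],
})

# Work on a float copy so df1 itself is left untouched and NaN can be stored
work = df1.astype({col: float for col in new_cols})
work.loc[work['flag'].eq(1), new_cols] = np.nan

# Same idea as above: align on group and fill the NaN cells from df2
filled = work.set_index('group').fillna(df2.set_index('group')[new_cols])

# Row order is unchanged, so the original index can be put back by position;
# then restore the original column order and integer dtype
out = filled.reset_index()
out.index = df1.index
out = out[df1.columns].astype({col: 'int64' for col in new_cols})

print(out)
```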