Python: Merging DataFrames on multiple conditions
I want to merge two DataFrames, fetched via SQL, on multiple conditions. df1 contains Customer ID, Cluster ID and Customer Zone ID. df2 contains Complain ID and the registration number. I want to merge these two DataFrames under the following conditions:
if (Complain ID == Customer ID):
    Merge on Customer ID
elif (Complain ID == Cluster ID):
    Merge on Customer ID
elif (Complain ID == Customer Zone ID):
    Merge on Customer ID
else:
    Merge empty row.
The final result should look like this:
Customer ID Cluster ID Customer Zone ID Complain ID Regi ID Status
CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100 open
CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340 open
CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100 open
. . . . . .
. . . . . .
CUS2001.A CUS2001.X CUS2000 0 0 0
Please help.
Try this, using pandas: melt, merge and concat.
Update
Using NumPy's intersect1d. I personally prefer this approach over the previous one.
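import numpy as np
import pandas as pd
# IDs of each df1 row that also occur in df2.ComplainID (sorted; empty if none match)
matches = [np.intersect1d(x, df2.ComplainID.values) for x in df1[['CustomerID', 'ClusterID']].values]
# Keep the first match or NaN; use df1['MatchId'], since df1.MatchId = ... does not create a column
df1['MatchId'] = [m[0] if len(m) else np.nan for m in matches]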
df1
Out[307]:
CustomerID ClusterID CustomerZoneID MatchId
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN
df1.merge(df2,left_on='MatchId',right_on='ComplainID',how='left')
Out[311]:
CustomerID ClusterID CustomerZoneID MatchId ComplainID \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN NaN
RegistrationNumber Status
0 32100.0 open
1 21340.0 open
2 32100.0 open
3 32100.0 open
4 32100.0 open
5 NaN NaN
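The question lists the ID columns in a priority order, whereas np.intersect1d returns its matches sorted, so the first element is not necessarily the highest-priority column. The following is a minimal sketch of a priority-ordered variant, not the answerer's method. The sample frames are an assumption reconstructed from the outputs shown above, and first_match is a hypothetical helper name. It also checks Customer Zone ID and fills unmatched rows with 0, as the expected result in the question does.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the outputs above (an assumption, not the asker's real tables).
df1 = pd.DataFrame({
    "CustomerID": ["CUS1001.A", "CUS1001.B", "CUS1001.C", "CUS2001.A"],
    "ClusterID": ["CUS1001.X", "CUS1001.X", "CUS1001.X", "CUS2001.X"],
    "CustomerZoneID": ["CUS1000", "CUS1000", "CUS1000", "CUS2000"],
})
df2 = pd.DataFrame({
    "ComplainID": ["CUS1001.X", "CUS1001.B"],
    "RegistrationNumber": [32100, 21340],
    "Status": ["open", "open"],
})

# Columns in the priority order given in the question.
id_cols = ["CustomerID", "ClusterID", "CustomerZoneID"]
known = set(df2["ComplainID"])


def first_match(row):
    """Return the first ID (in priority order) that has a complaint, else NaN."""
    for col in id_cols:
        if row[col] in known:
            return row[col]
    return np.nan


df1["MatchId"] = df1.apply(first_match, axis=1)

# Left merge keeps every customer; unmatched rows get NaN in the df2 columns.
result = df1.merge(df2, left_on="MatchId", right_on="ComplainID", how="left")

# Replace the NaN values with 0 to mirror the expected result in the question.
final = result.drop(columns="MatchId").fillna(0)
print(final)
```

On this sample the matches agree with the intersect1d version. The two can differ when a row has more than one ID present in df2 and the sorted order disagrees with the column priority.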
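The earlier melt, merge and concat approach:
import pandas as pd
# Long format: one row per (column, value) pair of df1
df=pd.melt(df1)
df=df.merge(df2,left_on='value',right_on='Complain ID',how='left')
# Position within each source column, i.e. the original df1 row number
df['number']=df.groupby('variable').cumcount()
cols=['Complain ID','RegistrationNumber','Status']
# Within each original row, pull the first available match up to the Customer ID entry
df[cols]=df.groupby('number')[cols].bfill()
# The first len(df1) rows are the Customer ID entries, aligned with df1's default index
Target=pd.concat([df1,df.loc[:len(df1)-1,cols]],axis=1).fillna(0)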
Target
Out[39]:
Customer ID Cluster ID Customer Zone ID Complain ID RegistrationNumber \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100.0
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340.0
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100.0
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X 32100.0
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X 32100.0
5 CUS2001.A CUS2001.X CUS2000 0 0.0
Status
0 open
1 open
2 open
3 open
4 open
5 0