Pandas pd.DataFrame.merge does not find matches
I have three pandas DataFrames. I need to stack df1 and df2 on top of each other and then left-join the result with df3 on several variables (var1, var2, var5). So I wrote:
pd.concat([df1, df2], axis = 0, sort = False).merge(df3, how = 'left', on = ['var1', 'var2', 'var5'])
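But it does not find all the matching rows. Changing the join type to outer shows that, for example, there are two rows with the same var1, var2 and var5 values (rows 11 and 28), yet they have not been joined: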
pd.concat([df1, df2], axis = 0, sort = False).merge(df3, how = 'outer', on = ['var1', 'var2', 'var5'])
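I am struggling to find the reason for this behaviour. I thought the data types of the join columns might differ, but no, they are the same. I am fairly new to pandas, so maybe I am missing something obvious? What is the reason for this unexpected behaviour?
When I run your code on my machine and then check the types with df.dtypes, the dtype of the var5 column in df1 is object, whereas in df2 and df3 it is float64. The concat works fine, and after the concat the dtype is object, but when I try to run the merge, outer or left, I get a ValueError: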
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
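One way to check the join columns before merging is to tabulate their dtypes side by side. The following is a minimal sketch, not part of the original answer: key_dtypes is a hypothetical helper, and toy_df1 and toy_df3 are made-up two-row stand-ins for the real frames.

```python
import pandas as pd

def key_dtypes(frames, keys):
    """Tabulate the dtype of each join key in each frame (rows: frames, columns: keys)."""
    return pd.DataFrame(
        {name: {key: str(frame[key].dtype) for key in keys} for name, frame in frames.items()}
    ).T

# Hypothetical stand-ins: var5 holds strings in one frame and floats in the other.
toy_df1 = pd.DataFrame({"var1": [2210, 2210], "var2": [1, 2],
                        "var5": pd.Series(["121160", "20066"], dtype=object)})
toy_df3 = pd.DataFrame({"var1": [2210, 2210], "var2": [2, 1],
                        "var5": [19188.0, 205651.0]})

report = key_dtypes({"df1": toy_df1, "df3": toy_df3}, ["var1", "var2", "var5"])
# A key is suspicious when the frames disagree on its dtype.
mismatched = [key for key in report.columns if report[key].nunique() > 1]
print(report)
print("keys with differing dtypes:", mismatched)
```

Any key listed as mismatched is a candidate for an explicit conversion before the merge.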
I suggest you check again, even though I know you have already checked. If they really are the same on your machine, I don't know what is going on.
df1 = pd.DataFrame({
    'var1': {0: 2210, 1: 2210, 2: 2210, 3: 2210, 4: 2210, 5: 2210, 6: 2210, 7: 2210, 8: 2210, 9: 2210, 10: 2210, 11: 2210, 12: 2210, 13: 2210, 14: 2210, 15: 2210, 16: 2210, 17: 2210, 18: 2210, 19: 2210, 20: 2210, 21: 2210},
    'var2': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 2, 8: 1, 9: 2, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1, 15: 2, 16: 1, 17: 2, 18: 1, 19: 2, 20: 1, 21: 2},
    'var3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0},
    'var4': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0},
    'var5': {0: '121160', 1: '20066', 2: ' 58621', 3: ' 201084', 4: ' 100180', 5: ' 74230', 6: ' 27789', 7: ' 66975', 8: ' 57410', 9: ' 49413', 10: ' 57112', 11: ' 19188', 12: ' 61366', 13: ' 27341', 14: ' 59859', 15: ' 173954', 16: ' 205651', 17: ' 54861', 18: ' 165809', 19: ' 60252', 20: ' 182156', 21: ' 82403'}})
df2 = pd.DataFrame({
    'var1': {349176: 2210, 349225: 2210, 349913: 2210, 350247: 2210, 350342: 2210, 350518: 2210},
    'var2': {349176: 2, 349225: 1, 349913: 1, 350247: 2, 350342: 1, 350518: 2},
    'var5': {349176: 58786.0, 349225: 37572.0, 349913: 103955.0, 350247: 19197.0, 350342: 14664.0, 350518: 75773.0},
    'var3': {349176: 19, 349225: 22, 349913: 56, 350247: 75, 350342: 80, 350518: 95},
    'var4': {349176: 81, 349225: 52, 349913: 42, 350247: 0, 350342: 50, 350518: 17}})
df3 = pd.DataFrame({
    'var1': {349175: 2210, 349224: 2210, 349912: 2210, 350246: 2210, 350341: 2210, 350517: 2210, 350521: 2210},
    'var2': {349175: 2, 349224: 1, 349912: 1, 350246: 2, 350341: 1, 350517: 2, 350521: 1},
    'var5': {349175: 19188.0, 349224: 205651.0, 349912: 59859.0, 350246: 27341.0, 350341: 165809.0, 350517: 19197.0, 350521: 61366.0},
    'var6': {349175: 19, 349224: 22, 349912: 56, 350246: 75, 350341: 80, 350517: 95, 350521: 95},
    'var7': {349175: 81, 349224: 52, 349912: 42, 350246: 0, 350341: 50, 350517: 17, 350521: 40}})
pd.concat([df1, df2], axis=0).dtypes
results in
var1 int64
var2 int64
var3 int64
var4 int64
var5 object
dtype: object
As you can see, after the concat var5 is an object column. If you merge at this point you will not get a result, because var5 in df3 is a float.
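The dtype that concat settles on can be seen on a toy example. This is only a sketch with made-up values (strings_col, floats_col and ints_col are hypothetical names); the string column is created with dtype=object explicitly so the behaviour does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical one-column frames, not the frames from the question.
strings_col = pd.DataFrame({"key": pd.Series(["19188", "61366"], dtype=object)})
floats_col = pd.DataFrame({"key": [58786.0, 37572.0]})   # float64
ints_col = pd.DataFrame({"key": [1, 2]})                 # int64

# object + float64: the values are kept as they are, so the column must be object.
stacked = pd.concat([strings_col, floats_col], axis=0)
kinds = sorted({type(value).__name__ for value in stacked["key"]})

# int64 + float64: both are numeric, so the result is upcast to float64.
numeric = pd.concat([ints_col, floats_col], axis=0)

print(stacked["key"].dtype, kinds)
print(numeric["key"].dtype)
```

The stacked column ends up holding Python strings next to Python floats, which is why a later merge against a purely numeric key column has nothing consistent to compare against.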
Here is what I suggest:
df1['var5'] = df1['var5'].astype(float)
df2['var5'] = df2['var5'].astype(float)
df3['var5'] = df3['var5'].astype(float)
pd.concat([df1, df2], axis=0).merge(df3, how='left', on=['var1', 'var2', 'var5'])
This produces the following DataFrame:
var1 var2 var3 var4 var5 var6 var7
0 2210 1 0 0 121160.0 NaN NaN
1 2210 2 0 0 20066.0 NaN NaN
2 2210 1 0 0 58621.0 NaN NaN
3 2210 2 0 0 201084.0 NaN NaN
4 2210 1 0 0 100180.0 NaN NaN
5 2210 2 0 0 74230.0 NaN NaN
6 2210 1 0 0 27789.0 NaN NaN
7 2210 2 0 0 66975.0 NaN NaN
8 2210 1 0 0 57410.0 NaN NaN
9 2210 2 0 0 49413.0 NaN NaN
10 2210 1 0 0 57112.0 NaN NaN
11 2210 2 0 0 19188.0 19.0 8.0
12 2210 1 0 0 61366.0 95.0 40.0
13 2210 2 0 0 27341.0 75.0 0.0
14 2210 1 0 0 59859.0 56.0 42.0
15 2210 2 0 0 173954.0 NaN NaN
16 2210 1 0 0 205651.0 22.0 52.0
17 2210 2 0 0 54861.0 NaN NaN
18 2210 1 0 0 165809.0 80.0 50.0
19 2210 2 0 0 60252.0 NaN NaN
20 2210 1 0 0 182156.0 NaN NaN
21 2210 2 0 0 82403.0 NaN NaN
22 2210 2 19 8 58786.0 NaN NaN
23 2210 1 22 52 37572.0 NaN NaN
24 2210 1 56 42 103955.0 NaN NaN
25 2210 2 75 0 19197.0 95.0 17.0
26 2210 1 80 50 14664.0 NaN NaN
27 2210 2 95 17 75773.0 NaN NaN
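To confirm which rows actually found a partner, merge accepts indicator=True, which adds a _merge column. The sketch below uses made-up miniature frames (small1, small2, small3 are hypothetical, not the data above) and applies the same float conversion after stacking instead of per frame.

```python
import pandas as pd

# Hypothetical miniature frames with the same kind of dtype mix as in the question.
small1 = pd.DataFrame({"var1": [2210, 2210], "var2": [2, 1],
                       "var5": pd.Series(["19188", "58621"], dtype=object)})
small2 = pd.DataFrame({"var1": [2210], "var2": [2], "var5": [19197.0]})
small3 = pd.DataFrame({"var1": [2210, 2210], "var2": [2, 2],
                       "var5": [19188.0, 19197.0], "var6": [19, 95]})

keys = ["var1", "var2", "var5"]
stacked = pd.concat([small1, small2], axis=0, sort=False, ignore_index=True)
# Bring the key to the same numeric dtype as in small3 before merging.
stacked["var5"] = stacked["var5"].astype("float64")

merged = stacked.merge(small3, how="left", on=keys, indicator=True)
# "both" marks rows with a match in small3, "left_only" marks rows without one.
matched = int((merged["_merge"] == "both").sum())
print(merged)
print("matched rows:", matched)
```

Rows flagged left_only are the ones worth inspecting by hand when a match was expected.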
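I checked again: var5 is object in all of the initial DataFrames, and it is also object in the DataFrame produced by pd.concat.
Hmm. Something may have changed when you copied the DataFrames into Stack Overflow... It certainly looks like a float in df2 and df3: 'var5': {349176: 58786.0, ... and 'var5': {349175: 19188.0, ... Can you verify that this is the same as what you have?
var5 is actually categorical in nature. Its representation as float in df3 must be some odd behaviour of pd.DataFrame(df3.to_dict()), because df3.dtypes shows it as object before I dump it to a dict. I replaced your approach with a conversion to int, and it did work. Explicitly converting var5 to object dtype in both DataFrames and joining them afterwards does not work, though. It seems you can join if var5 is float or int, but not if it is object. Any idea why?
@jakes, since you used my approach, feel free to accept my answer :). A numeric (int or float) column will not join to an object column. When you concatenate df1 and df2, the column becomes object because var5 in df1 is object. You can always convert int/float to object, but not necessarily the other way round, so pandas converts these to object on concat. If you concat a float column and an int column, you get a float column. As for your question, you can never join text to a numeric value.
I see, but I am still wondering why the following does not work: pd.concat([df1, df2], axis=0).astype({'var5': 'object'}).merge(df3.astype({'var5': 'object'}), how='left', on=['var1', 'var2', 'var5'])
@jakes I understand what you are saying now. When a float is converted to object, the decimal part is kept. So if you convert to object and then join, 19188.0 (which prints as '19188.0') is not equal to the string '19188'. Try converting through int first: .astype({'var5': 'int64'}).astype({'var5': 'object'})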
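The difference between the two casts can be reproduced on a single value. This is a sketch under assumed toy data (left and right are hypothetical one-row frames): casting to object changes only the container dtype, not the values inside it.

```python
import pandas as pd

left = pd.DataFrame({"var5": pd.Series(["19188"], dtype=object)})   # text key
right = pd.DataFrame({"var5": [19188.0]})                           # float key

# Cast to object only: the right-hand value is still the float 19188.0.
right_obj = right.astype({"var5": "object"})
naive = left.merge(right_obj, how="inner", on="var5")

# Cast through int64 on both sides, then to object: both values become the int 19188.
left_int = left.astype({"var5": "int64"}).astype({"var5": "object"})
right_int = right.astype({"var5": "int64"}).astype({"var5": "object"})
fixed = left_int.merge(right_int, how="inner", on="var5")

print(str(right_obj["var5"].iloc[0]), "vs", left["var5"].iloc[0])
print("rows matched without int cast:", len(naive))
print("rows matched with int cast:", len(fixed))
```

The int64 step only makes sense when every key value is a whole number, which is an assumption here based on the values shown in the thread.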