Merging two DataFrames of different sizes on two columns in Python
I would like to merge two dataframes in Python (df1 and df2) on two columns (Site and Building). They have different numbers of rows, and the goal is to get the "Safe" value for each Generator value in df1. Below is demo code. Although the dataframes I create in the example below seem to work, in the real problem the data for each table comes from an SQL query, which leads me to believe the merge is failing because of the data types.
import pandas as pd
df = {'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Greece','Greece','Greece','Greece','Greece','Greece'],
'Building' : ['X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X4','X4','X4','X4','X4', 'X4','X5','X5','X5','X5','X5','X5','X1','X1', 'X1','X1', 'X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X1', 'X1','X1', 'X1','X1', 'X1'],
'Generator' : ['DE','NDE', 'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE', 'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4', 'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE', 'NDE','GBX1','GBX2','GBX3','GBX4', 'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4']}
df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])
df15 = {'Building' : ['X1','X2','X3','X4','X5','X1','X2','X3','X1'],
'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Greece'],
'Safe' : [1, 1, 1, 1, 1, 0, 1, 1, 0]}
df2 = pd.DataFrame(df15, columns = ['Site', 'Building', 'Safe'])
df3 = df1.merge(df2, how = 'left', on = ['Site', 'Building'], indicator = True)
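I have also tried changing the datatype of each column to string, and checking the encoding with the steps below, but everything appears to match, i.e. there are no visible differences in the contents: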
df1['Building'] = df1['Building'].str.encode('UTF-8')
df1['Site'] = df1['Site'].str.encode('UTF-8')
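The encoding step above is shown for df1 only. `.str.encode('UTF-8')` turns each value into a `bytes` object, and `b'X1'` never compares equal to the `str` `'X1'`, so if df2's keys were left as `str`, no row can match. A minimal sketch on one-row frames made up for illustration:

```python
import pandas as pd

# Made-up one-row frames with the question's column names
df1 = pd.DataFrame({"Site": ["Belgium"], "Building": ["X1"], "Generator": ["DE"]})
df2 = pd.DataFrame({"Site": ["Belgium"], "Building": ["X1"], "Safe": [1]})

# Encoding only df1's keys turns them into bytes (b"X1"), which never equal str "X1"
df1_encoded = df1.copy()
df1_encoded["Building"] = df1_encoded["Building"].str.encode("UTF-8")
df1_encoded["Site"] = df1_encoded["Site"].str.encode("UTF-8")

result = df1_encoded.merge(df2, how="left", on=["Site", "Building"], indicator=True)
print(result["_merge"].tolist())  # ['left_only'], so "Safe" is NaN

# Applying the same transformation to both frames restores the match
df2_encoded = df2.copy()
df2_encoded["Building"] = df2_encoded["Building"].str.encode("UTF-8")
df2_encoded["Site"] = df2_encoded["Site"].str.encode("UTF-8")
fixed = df1_encoded.merge(df2_encoded, how="left", on=["Site", "Building"], indicator=True)
print(fixed["_merge"].tolist())  # ['both']
```

In practice, whatever conversion is applied to the key columns has to be applied identically to both frames.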
Datatypes:
df2.dtypes:
Site object
Building object
Safe object
dtype: object
df1.dtypes:
Building object
Site object
Generator object
dtype: object
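Because `object` is pandas' catch-all dtype, matching dtypes like the ones above do not prove that the values themselves are the same Python type. One quick check, sketched here on a hypothetical column that mixes `str` and `int` values, is to count the type of each value in the key columns:

```python
import pandas as pd

# Hypothetical key column mixing str and int values (illustration only)
keys = pd.DataFrame({"Site": ["Belgium", "Holland"], "Building": ["X1", 1]})

# The mixed column is stored with the generic object dtype
print(keys["Building"].dtype)  # object

# Count the concrete Python type of every value in each key column
for col in ["Site", "Building"]:
    print(col, keys[col].map(type).value_counts().to_dict())
```

Running the same loop on the real df1 and df2 would show whether, for example, one side holds `bytes` or numbers where the other holds `str`.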
I have tried the following code:
df3 = df1.merge(df2, left_on = ['Site', 'Building'], right_on = ['Site', 'Building'], how = 'left', indicator = 'indicator')
Or:
but the result only contains the data from the left side, i.e. Result 1.
I tried an outer join, as shown below, which gave Result 2:
df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'outer', indicator = 'indicator')
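With `indicator='indicator'`, the outer join itself can show which keys failed to match: rows marked `left_only` or `right_only` carry the offending key values. A sketch, where the trailing space in df2's `'X1 '` is a hypothetical example of a difference that is invisible when the frame is printed:

```python
import pandas as pd

# Hypothetical data: df2's "X1 " has a trailing space that df1's "X1" lacks
df1 = pd.DataFrame({"Site": ["Belgium", "Belgium"], "Building": ["X1", "X2"], "Generator": ["DE", "DE"]})
df2 = pd.DataFrame({"Site": ["Belgium", "Belgium"], "Building": ["X1 ", "X2"], "Safe": [1, 1]})

df3 = df1.merge(df2, on=["Site", "Building"], how="outer", indicator="indicator")

# Rows found on only one side point to the key values that did not line up
unmatched = df3[df3["indicator"] != "both"]
print(unmatched[["Site", "Building", "indicator"]])

# repr() makes hidden whitespace visible
print([repr(b) for b in unmatched["Building"]])
```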
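Apologies for my relative lack of pandas knowledge.
I noticed a small error in the code you shared:
df1=pd.DataFrame(df1,columns=['Site','Building','Generator'])
should be df1=pd.DataFrame(df,columns=['Site','Building','Generator']).
The variable passed to pd.DataFrame should be df, not df1.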
After this fix, simply merging the two dataframes gives the desired result:
pd.merge(df1,df2, on=['Building','Site'])
The output looks like this:
Site Building Generator Safe
0 Belgium X1 DE 1
1 Belgium X1 NDE 1
2 Belgium X1 GBX1 1
3 Belgium X1 GBX2 1
4 Belgium X1 GBX3 1
5 Belgium X1 GBX4 1
6 Belgium X2 DE 1
7 Belgium X2 NDE 1
8 Belgium X2 GBX1 1
9 Belgium X2 GBX2 1
10 Belgium X2 GBX3 1
11 Belgium X2 GBX4 1
12 Belgium X3 DE 1
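Putting the correction together, a minimal runnable sketch using a shortened copy of the question's data (a subset of the Holland and Greece rows only). `pd.merge` defaults to an inner join; `how='left'`, as in the question, keeps every df1 row even when its key is missing from df2:

```python
import pandas as pd

# Shortened version of the question's dictionaries (subset of Holland and Greece rows)
df = {
    "Site": ["Holland"] * 12 + ["Greece"] * 6,
    "Building": ["X1"] * 6 + ["X2"] * 6 + ["X1"] * 6,
    "Generator": ["DE", "NDE", "GBX1", "GBX2", "GBX3", "GBX4"] * 3,
}
df15 = {
    "Building": ["X1", "X2", "X3", "X1"],
    "Site": ["Holland", "Holland", "Holland", "Greece"],
    "Safe": [0, 1, 1, 0],
}

# Build each frame from its own dictionary: df -> df1, df15 -> df2
df1 = pd.DataFrame(df, columns=["Site", "Building", "Generator"])
df2 = pd.DataFrame(df15, columns=["Site", "Building", "Safe"])

# Left join keeps all 18 generator rows and attaches the matching "Safe" flag
df3 = df1.merge(df2, how="left", on=["Site", "Building"], indicator=True)
print(df3)
```

Holland X3 exists only in df2 here, so the left join simply ignores it.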
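Please take a moment to read this post about
Hi Roshan, thanks for your reply. I have since run the demo code with that syntax and it works. The datatypes have been added to the example above (all are object, which points to another cause), but I can confirm that the original data comes from SQL queries, and the dataframes have been pivoted, had their index reset, and so on. I have tried setting all applicable columns of both dataframes to string using the code mentioned in the question, but it does not seem to work.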
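Since the real frames come from SQL queries and have been pivoted and re-indexed, one possibility (an assumption, not confirmed in this thread) is that the key values differ in ways that `astype(str)` alone does not fix, such as padding whitespace or letter case. A sketch that normalises the key columns identically on both frames before merging; the helper `normalise_keys` and the messy sample values are hypothetical:

```python
import pandas as pd


def normalise_keys(frame: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Hypothetical helper: copy the frame, cast keys to str, strip whitespace, upper-case."""
    out = frame.copy()
    for key in keys:
        out[key] = out[key].astype(str).str.strip().str.upper()
    return out


keys = ["Site", "Building"]
# Hypothetical messy values, e.g. padded CHAR columns and inconsistent case
df1 = pd.DataFrame({"Site": ["Belgium ", "holland"], "Building": ["X1", "x2 "], "Generator": ["DE", "NDE"]})
df2 = pd.DataFrame({"Site": ["BELGIUM", "Holland"], "Building": [" X1", "X2"], "Safe": [1, 1]})

# The same normalisation on both sides lets the two-column left join match
df3 = normalise_keys(df1, keys).merge(normalise_keys(df2, keys), how="left", on=keys, indicator=True)
print(df3)
```

Note that `astype(str)` on a column of `bytes` yields strings like `"b'X1'"`, so any earlier `.str.encode` step would need to be undone first.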