Merging two DataFrames of different sizes on two columns in Python


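I want to merge two DataFrames in Python (df1 and df2) on two columns (Site and Building). The two frames have different numbers of rows, and the goal is to attach the "Safe" value to every Generator row in df1. Demo code is below. The DataFrames I build in this example seem to merge fine, but in the real problem each table comes from a SQL query, which leads me to believe the merge is failing because of the data types.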

import pandas as pd

df = {'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Holland','Greece','Greece','Greece','Greece','Greece','Greece'],
        'Building' : ['X1','X1','X1','X1','X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X4','X4','X4','X4','X4',   'X4','X5','X5','X5','X5','X5','X5','X1','X1',   'X1','X1',  'X1','X1','X2','X2','X2','X2','X2','X2','X3','X3','X3','X3','X3','X3','X1', 'X1','X1',  'X1','X1',  'X1'],
        'Generator' : ['DE','NDE',  'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE',  'GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4',  'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE',    'NDE','GBX1','GBX2','GBX3','GBX4',  'DE','NDE','GBX1','GBX2','GBX3','GBX4','DE','NDE','GBX1','GBX2','GBX3','GBX4']}

df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])


df15 = {'Building' : ['X1','X2','X3','X4','X5','X1','X2','X3','X1'],
        'Site': ['Belgium','Belgium','Belgium','Belgium','Belgium','Holland','Holland','Holland','Greece'],
        'Safe' : [1,    1,  1,  1,  1,  0,  1,  1,  0]}


df2 = pd.DataFrame(df15, columns = ['Site', 'Building', 'Safe'])


df3 = df1.merge(df2, how = 'left', on = ['Site', 'Building'], indicator = True)

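I also tried converting each column's data type to string, as well as the encoding step shown below, but everything seems to match, i.e. there are no visible differences in the content.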

df1['Building'] = df1['Building'].str.encode('UTF-8')
df1['Site'] = df1['Site'].str.encode('UTF-8')
Data types:

df2.dtypes:

    Site                           object
    Building                       object
    Safe                           object
    dtype: object


    df1.dtypes:


    Building      object
    Site          object
    Generator     object
    dtype: object
I tried the following code:

df3 = df1.merge(df2, left_on = ['Site', 'Building'], right_on = ['Site', 'Building'], how = 'left', indicator = 'indicator')
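Or:

However, the result contains only the data from the left DataFrame (result 1).

I also tried an outer join, shown below, which produced result 2: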

df3 = df1.merge(df2, on = ['Site', 'Building'], how = 'outer', indicator = 'indicator')
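The outer join with an indicator column can also be used to list exactly which key pairs failed to match. The following is only a minimal sketch on made-up toy data (the frames `left` and `right` and the trailing space in `'Holland '` are assumptions for illustration, not the real SQL output); `repr()` is used so that stray whitespace in a key becomes visible.

```python
import pandas as pd

# Hypothetical toy frames: one key on the left carries a trailing space.
left = pd.DataFrame({'Site': ['Belgium', 'Holland '],
                     'Building': ['X1', 'X1'],
                     'Generator': ['DE', 'DE']})
right = pd.DataFrame({'Site': ['Belgium', 'Holland'],
                      'Building': ['X1', 'X1'],
                      'Safe': [1, 0]})

# Outer join keeps rows from both sides; the indicator column records
# whether each row came from the left frame, the right frame, or both.
outer = left.merge(right, on=['Site', 'Building'], how='outer',
                   indicator='indicator')

# Keep only the rows that did not find a partner on the other side.
unmatched = outer[outer['indicator'] != 'both']

# repr() exposes invisible differences such as padding in the key values.
unmatched_keys = [
    (repr(site), repr(building), str(side))
    for site, building, side in zip(unmatched['Site'],
                                    unmatched['Building'],
                                    unmatched['indicator'])
]
for entry in unmatched_keys:
    print(entry)
```

If the same key appears once as `left_only` and once as `right_only`, the two values look alike but are not equal, which points to padding, case, or type differences rather than missing data.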


Apologies for my relative ignorance of pandas.

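I noticed a small error in the code you shared:

df1 = pd.DataFrame(df1, columns = ['Site', 'Building', 'Generator'])
should be
df1 = pd.DataFrame(df, columns = ['Site', 'Building', 'Generator'])
The variable passed to pd.DataFrame should be df, not df1; in the code as posted, df1 is not yet defined at that point.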

After this fix, simply merging the two DataFrames gives the desired result:

pd.merge(df1,df2, on=['Building','Site'])
The output looks like this:

       Site Building Generator  Safe
0   Belgium       X1        DE     1
1   Belgium       X1       NDE     1
2   Belgium       X1      GBX1     1
3   Belgium       X1      GBX2     1
4   Belgium       X1      GBX3     1
5   Belgium       X1      GBX4     1
6   Belgium       X2        DE     1
7   Belgium       X2       NDE     1
8   Belgium       X2      GBX1     1
9   Belgium       X2      GBX2     1
10  Belgium       X2      GBX3     1
11  Belgium       X2      GBX4     1
12  Belgium       X3        DE     1
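For reference, here is a compact, self-contained sketch of the corrected demo using a left join, which is what the question asks for. It builds the same 54-row and 9-row frames as the question, but through a helper list `pairs` that is my own shorthand rather than part of the original code. The `validate='many_to_one'` argument is an optional extra check that each (Site, Building) pair occurs only once in df2.

```python
import pandas as pd

generators = ['DE', 'NDE', 'GBX1', 'GBX2', 'GBX3', 'GBX4']

# (Site, Building) pairs in the same order as the question's df2.
pairs = ([('Belgium', b) for b in ['X1', 'X2', 'X3', 'X4', 'X5']]
         + [('Holland', b) for b in ['X1', 'X2', 'X3']]
         + [('Greece', 'X1')])

# df1: one row per generator for every (Site, Building) pair.
df1 = pd.DataFrame([(s, b, g) for s, b in pairs for g in generators],
                   columns=['Site', 'Building', 'Generator'])

# df2: one "Safe" flag per (Site, Building) pair.
df2 = pd.DataFrame({'Site': [s for s, _ in pairs],
                    'Building': [b for _, b in pairs],
                    'Safe': [1, 1, 1, 1, 1, 0, 1, 1, 0]})

# Left join keeps every df1 row; the indicator adds a '_merge' column.
df3 = df1.merge(df2, how='left', on=['Site', 'Building'],
                indicator=True, validate='many_to_one')

print(df3['_merge'].value_counts())
```

A left join keeps every row of df1 even when no match is found, so counting the values in `_merge` is a quick way to see how many rows actually picked up a "Safe" value.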


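Please take a moment to read this post.

Hi Roshan, thanks for your reply. I have since run the demo code with the corrected syntax and it works fine. The data types have been added to the example above (all are object, i.e. pointing to another source), but I can confirm that the original data comes from SQL queries and that the DataFrames have been pivoted, had their indexes reset, and so on. I tried setting all applicable columns of both DataFrames to string using the code mentioned in the question, but it does not seem to work.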
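Since the demo data merges correctly, the mismatch in the real data probably lies in the key values themselves. One detail worth checking: `.str.encode('UTF-8')` turns each string into a `bytes` object, and in Python 3 `b'X1'` is never equal to `'X1'`, so encoding the keys in df1 alone would stop them from matching df2. Below is a rough sketch of normalising the keys on both sides before merging. The helper `normalise_keys` and the padded sample values are hypothetical, and I am only assuming that the SQL source pads or encodes its text columns.

```python
import pandas as pd

def normalise_keys(frame, keys):
    """Return a copy whose key columns are plain, stripped strings."""
    out = frame.copy()
    for key in keys:
        # Decode bytes back to str (e.g. after .str.encode('UTF-8')).
        decoded = out[key].map(
            lambda v: v.decode('utf-8') if isinstance(v, bytes) else v)
        # Cast to str and drop leading/trailing whitespace.
        # Missing values may need separate handling before this step.
        out[key] = decoded.astype(str).str.strip()
    return out

keys = ['Site', 'Building']

# Hypothetical frames imitating padded text columns from a SQL query.
left = pd.DataFrame({'Site': ['Belgium ', 'Holland'],
                     'Building': ['X1', ' X2'],
                     'Generator': ['DE', 'NDE']})
right = pd.DataFrame({'Site': ['Belgium', 'Holland'],
                      'Building': ['X1', 'X2'],
                      'Safe': [1, 0]})

# Merging the raw frames: the padded keys do not match.
raw = left.merge(right, how='left', on=keys, indicator=True)

# Merging after normalising both sides.
clean = normalise_keys(left, keys).merge(
    normalise_keys(right, keys), how='left', on=keys, indicator=True)

print(raw['_merge'].value_counts())
print(clean['_merge'].value_counts())
```

Applying the same normalisation to both frames matters more than the specific steps chosen; a conversion applied to only one side can itself create the mismatch.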