Merging only certain columns in Python

python pandas merge

I have two DataFrames that I want to merge. The main DataFrame is population:

Pop:
        Country Name    Country Code    Year    Population  CountryYear
    0   Aruba           ABW             1960    54208.0     ABW-1960
    1   Andorra         AND             1960    13414.0     AND-1960

I have a similar table of GDP by country.

GDP:

What I want is a new frame that combines the two, with these fields:

Country Name
Country Code
Year    
Population  
CountryYear

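taken from the population table, plus the matching GDP from the GDP table, looked up by CountryYear, as the only column added.

I tried this, but got duplicated columns: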

df_merged = pd.merge(poptransposed, gdptransposed, left_on=['CountryYear'],
              right_on=['CountryYear'],
              how='inner')
df_merged.head()


  Country Name_x    Country Code_x  Year_x  Population  CountryYear Country Name_y  Country Code_y  Year_y  GDP
Aruba   ABW 1960    54208.0 ABW-1960    Aruba   ABW 1960    0.000000e+00
Andorra AND 1960    13414.0 AND-1960    Andorra AND 1960    0.000000e+00

Solution: use the country code as the index on both frames, then combine them with the concat function (see the code at the end):

Just select all the columns you need from the result:

df_merged[['Country Name_x', 'Country Code_x', 'Year_x', 'Population', …]]
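A minimal sketch of this select-after-merge approach, assuming small sample frames named pop and gdp (hypothetical names) built from the two rows shown in the question, with the GDP value 0.0 taken from the question's output. Instead of listing every column by hand, it drops the right-hand duplicates and strips the _x suffix:

```python
import pandas as pd

# Sample frames rebuilt from the rows shown in the question.
pop = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "Population": [54208.0, 13414.0],
    "CountryYear": ["ABW-1960", "AND-1960"],
})
gdp = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "CountryYear": ["ABW-1960", "AND-1960"],
    "GDP": [0.0, 0.0],
})

# Same merge as in the question: shared non-key columns get _x / _y suffixes.
df_merged = pd.merge(pop, gdp, on="CountryYear", how="inner")

# Drop the duplicated right-hand columns, then strip the "_x" suffix.
df_clean = (
    df_merged
    .drop(columns=[c for c in df_merged.columns if c.endswith("_y")])
    .rename(columns=lambda c: c[:-2] if c.endswith("_x") else c)
)
print(df_clean.columns.tolist())
```

This works, but it merges the unwanted columns first and removes them afterwards, so restricting the right-hand frame before merging is usually tidier.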

Try the following:

df_merged = pd.merge(poptransposed, gdptransposed[['CountryYear', 'GDP']], on='CountryYear')
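As a runnable illustration of restricting the right-hand frame to the key plus the one wanted column, here is a sketch on sample frames named pop and gdp (hypothetical names) built from the question's rows. The gdp rows are deliberately in a different order to show that matching is done on CountryYear, not on position; the validate argument is an optional extra that was not part of the original answer:

```python
import pandas as pd

pop = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "Population": [54208.0, 13414.0],
    "CountryYear": ["ABW-1960", "AND-1960"],
})
# Rows in a different order than pop; GDP value 0.0 as in the question's output.
gdp = pd.DataFrame({
    "Country Name": ["Andorra", "Aruba"],
    "Country Code": ["AND", "ABW"],
    "Year": [1960, 1960],
    "CountryYear": ["AND-1960", "ABW-1960"],
    "GDP": [0.0, 0.0],
})

# Only 'CountryYear' (the key) and 'GDP' come from the right-hand frame,
# so no _x / _y suffixed duplicates are created.
df_merged = pd.merge(
    pop,
    gdp[["CountryYear", "GDP"]],
    on="CountryYear",
    how="inner",
    validate="one_to_one",  # optional: raise if a key is duplicated on either side
)
print(df_merged)
```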

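By default, merge joins on the column names the two frames have in common, so this should just be poptransposed.merge(gdptransposed). Tell me if I am wrong; I am on my phone and cannot verify it.

@MaxU That worked! Thank you very much.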
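The default behaviour mentioned in the comment can be checked with a small sketch, again on sample frames named pop and gdp (hypothetical names) built from the question's rows. With no on argument, merge uses every column name the two frames share as the join key:

```python
import pandas as pd

pop = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "Population": [54208.0, 13414.0],
    "CountryYear": ["ABW-1960", "AND-1960"],
})
gdp = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "CountryYear": ["ABW-1960", "AND-1960"],
    "GDP": [0.0, 0.0],
})

# No `on=`: the join key is every shared column name, i.e.
# Country Name, Country Code, Year and CountryYear together.
df_merged = pop.merge(gdp)
print(df_merged.columns.tolist())
```

The caveat is that all shared columns must then agree in both value and dtype; a country name spelled differently in the two tables would silently drop that row from an inner join, which is why merging on CountryYear alone can be the safer choice.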

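import pandas as pd

# Index both frames by country code; concat then aligns rows on the index.
# If a country appears in several years the index is no longer unique and
# concat may raise; indexing on 'CountryYear' avoids that.
Pop = Pop.set_index('Country Code', drop=True)
GDP = GDP.set_index('Country Code', drop=True)

df_merged = pd.concat([Pop, GDP[['GDP']]], axis=1, join='inner').reset_index(drop=False)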
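For comparison, here is a sketch of the same concat idea indexed on CountryYear instead of the country code. That swap is an assumption made here, not part of the original answer; it keeps the index unique when the tables hold more than one year per country. The frame names pop and gdp are hypothetical and the rows are the ones shown in the question:

```python
import pandas as pd

pop = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "Population": [54208.0, 13414.0],
    "CountryYear": ["ABW-1960", "AND-1960"],
})
gdp = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra"],
    "Country Code": ["ABW", "AND"],
    "Year": [1960, 1960],
    "CountryYear": ["ABW-1960", "AND-1960"],
    "GDP": [0.0, 0.0],
})

# Use the unique CountryYear key as the index on both sides.
pop_idx = pop.set_index("CountryYear")
gdp_idx = gdp.set_index("CountryYear")

# Concatenate column-wise, keeping only keys present in both frames,
# and take just the GDP column from the second frame.
df_merged = pd.concat([pop_idx, gdp_idx[["GDP"]]], axis=1, join="inner").reset_index()
print(df_merged)
```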