Python: merging/joining two dataframes based on index and column


python pandas join merge


I want to join (or merge?) two dataframes. They look like this:


Table 1 (= df)

Table 2 (= df_gdp)

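The result should be Table 1 with an extra column "GDP". For each row, the value should be looked up in Table 2 using Table1.year and Table1.country. So the result would be: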

index  |   year  |  country  | GDP 
--------------------------------------
0      |   1970  | NL        | 5
1      |   1970  | UK        | 1
2      |   1980  | US        | 2
3      |   1990  | NL        | 0
4      |   1990  | US        | 0

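I have written this with .iterrows(), but, as expected, it performs poorly. Instead, I'd like to know whether the same result can be achieved with .join() or .merge(). What I don't understand is how to merge/join on the index (cntry) and on a column that changes from row to row (year).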

The .iterrows() code looks like this:

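# Add GDP data: look up each row's (country, year) pair in df_gdp
for index, row in df.iterrows():
    gdp_year = str(row['year'])        # df_gdp's year columns are string labels
    gdp_country = str(row['country'])  # df_gdp is indexed by country

    try:
        df.at[index, 'GDP'] = df_gdp.loc[gdp_country, gdp_year]
    except KeyError:
        # country or year not present in df_gdp
        df.at[index, 'GDP'] = 0
df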
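To reproduce the question, here is a minimal self-contained sketch of the loop with sample data. The GDP values are the ones used in the answers below; storing df.year as int and df_gdp's year labels as strings is an assumption based on the str() calls in the loop, and the index name cntry follows the question's wording.

```python
import pandas as pd

# Table 1: one row per (year, country) pair; integer years are an assumption
df = pd.DataFrame({
    "year": [1970, 1970, 1980, 1990, 1990],
    "country": ["NL", "UK", "US", "NL", "US"],
})

# Table 2: indexed by country ("cntry"), one string-labelled column per year
df_gdp = pd.DataFrame(
    {"1970": [5, 1, 9], "1980": [3, 7, 2], "1990": [0, 1, 0]},
    index=pd.Index(["NL", "UK", "US"], name="cntry"),
)

# Row-by-row lookup, falling back to 0 when the country or year is missing
for index, row in df.iterrows():
    try:
        df.at[index, "GDP"] = df_gdp.loc[row["country"], str(row["year"])]
    except KeyError:
        df.at[index, "GDP"] = 0

print(df)
```

The first df.at assignment creates the GDP column, so it ends up as float; the values themselves match the expected table.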

You can write a function that takes one row of the dataframe and apply it to df row by row:

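def f(x):
    # x is one row of df: look up df_gdp by country (index label) and year
    # (column label); str() because df_gdp's year columns are string labels
    return df_gdp.loc[x['country'], str(x['year'])]

df['GDP'] = df.apply(f, axis=1)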

Result:

   year country  GDP
0  1970      NL    5
1  1970      UK    1
2  1980      US    2
3  1990      NL    0
4  1990      US    0
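Note that f raises a KeyError for any (country, year) pair missing from df_gdp, whereas the question's loop falls back to 0. A minimal sketch that keeps that fallback, using the same sample values as the answers plus one hypothetical row (DE, 1990) that has no GDP entry; the column types are assumptions. apply(axis=1) still calls the Python function once per row, so it tidies the loop rather than vectorizing it.

```python
import pandas as pd

# Table 1 with one extra, hypothetical row ("DE") that has no GDP data
df = pd.DataFrame({
    "year": [1970, 1970, 1980, 1990, 1990, 1990],
    "country": ["NL", "UK", "US", "NL", "US", "DE"],
})

# Table 2: indexed by country, one string-labelled column per year
df_gdp = pd.DataFrame(
    {"1970": [5, 1, 9], "1980": [3, 7, 2], "1990": [0, 1, 0]},
    index=pd.Index(["NL", "UK", "US"], name="cntry"),
)

def lookup_gdp(row):
    """Return GDP for the row's (country, year), or 0 if the pair is missing."""
    try:
        return df_gdp.loc[row["country"], str(row["year"])]
    except KeyError:
        return 0

df["GDP"] = df.apply(lookup_gdp, axis=1)
print(df)
```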

You can also do it as follows.

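Explanation: set_index('country') makes the country column the index, and stack() then turns the year columns into rows. reset_index() gives a flat dataframe shaped like df1. Finally, set_axis() names the columns so that they can be matched when merging.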

import pandas as pd

data1 = {
    'year':['1970','1970','1980','1990','1990'],
    'country':['NL','UK','US','NL','US']
}

data2 = {
    'country':['NL','UK','US'],
    '1970':[5,1,9],
    '1980':[3,7,2],
    '1990':[0,1,0]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Reshape df2 from wide (one column per year) to long (country, year, GDP)
df_m = df2.set_index('country').stack().reset_index().set_axis(['country', 'year', 'GDP'], axis=1)
pd.merge(df1, df_m, on=['year','country'])

Finally, simply merge df1 with the preprocessed df_m dataframe (the merge is an inner join by default).
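Because the merge is an inner join, rows of df1 without a match in df_m are dropped, whereas the question's loop keeps them with GDP 0. A minimal sketch of a left join with fillna(0) instead; the extra ("2000", "NL") row is hypothetical, added only to show an unmatched key.

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": ["1970", "1970", "1980", "1990", "1990", "2000"],  # "2000" has no GDP column
    "country": ["NL", "UK", "US", "NL", "US", "NL"],
})
df2 = pd.DataFrame({
    "country": ["NL", "UK", "US"],
    "1970": [5, 1, 9],
    "1980": [3, 7, 2],
    "1990": [0, 1, 0],
})

# Same reshaping as in the answer: wide (one column per year) -> long
df_m = (
    df2.set_index("country")
    .stack()
    .reset_index()
    .set_axis(["country", "year", "GDP"], axis=1)
)

# Left join keeps every row of df1; unmatched rows get NaN, replaced by 0
result = df1.merge(df_m, on=["year", "country"], how="left")
result["GDP"] = result["GDP"].fillna(0).astype(int)
print(result)
```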

You can use melt and merge:

(df2.rename({'cntry': 'country'}, axis=1)                 # question's key column is 'cntry'
    .melt('country', var_name='year', value_name='GDP')   # wide -> long
    .merge(df1, on=['country', 'year']))                  # inner join on both keys

Output:

  country  year  GDP
0      NL  1970    5
1      UK  1970    1
2      US  1980    2
3      NL  1990    0
4      US  1990    0
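Like the stack version, this needs the year key to have the same type on both sides: melt produces the year labels as strings, while the question's str(...year) calls suggest df.year holds integers, and pandas raises a ValueError when asked to merge a string key column with an int64 one. A minimal sketch, assuming integer years in df1 and the question's cntry column in df2, that casts the melted years before merging:

```python
import pandas as pd

# Assumed: integer years in df1, key column named "cntry" in df2
df1 = pd.DataFrame({
    "year": [1970, 1970, 1980, 1990, 1990],
    "country": ["NL", "UK", "US", "NL", "US"],
})
df2 = pd.DataFrame({
    "cntry": ["NL", "UK", "US"],
    "1970": [5, 1, 9],
    "1980": [3, 7, 2],
    "1990": [0, 1, 0],
})

long_gdp = df2.rename(columns={"cntry": "country"}).melt(
    id_vars="country", var_name="year", value_name="GDP"
)
# melt turns the column labels "1970", "1980", ... into string values;
# cast them so the year key matches df1's integer dtype
long_gdp["year"] = long_gdp["year"].astype(int)

# Merge from df1's side so the result keeps df1's row order and column order
result = df1.merge(long_gdp, on=["country", "year"])
print(result)
```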
