How to join these two dataframes on shared values in one column using pandas?

python python-2.7 pandas

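I have a dataframe (df1) like the following -

Another one (df2) looks like the following -

What I want is for df2 to be merged/joined/concatenated (whichever works) only on matching years.

The expected output should look like this -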

   read_year  read_month      load  trading_block       lmp
0       2017           3  0.019582              0   32.1201
1       2017           3  0.019460              0   12.1230
2       2017           3  0.018888              0   40.2941
3       2017           3  0.018940              0   20.3918
4       2017           3  0.019114              0   50.9371

How can I do this easily?

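I think you need merge, but with a helper column that numbers the repeated rows within each year, and with df2 restricted to a subset of its columns: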

# years in df2 changed so that some rows match df1
print (df2)
   read_year  read_month      lmp  trading_block
0       2009           1  37.5694              0
1       2009           1  34.5777              0
2       2017           1  33.7039              0
3       2017           1  33.1503              0
4       2017           1  33.8935              0

df1['g'] = df1.groupby('read_year').cumcount()
df2['g'] = df2.groupby('read_year').cumcount()

# from df2, keep only the join keys plus the column to add (here lmp)
df = df1.merge(df2[['read_year','g','lmp']],on=['read_year', 'g']).drop('g', axis=1)
print (df)
   read_year  read_month      load  trading_block      lmp
0       2017           3  0.019582              0  33.7039
1       2017           3  0.019460              0  33.1503
2       2017           3  0.018888              0  33.8935
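
For anyone who wants to run the approach above end to end, here is a minimal sketch. The df1 values are an assumption, rebuilt from the load column of the expected output in the question; df2 is the sample printed in the answer. The how='left' variant is an optional addition that keeps every df1 row and leaves lmp as NaN where df2 has no row at the same position within the year.

```python
import pandas as pd

# Assumed df1: reconstructed from the 'load' column of the expected output
df1 = pd.DataFrame({
    'read_year': [2017] * 5,
    'read_month': [3] * 5,
    'load': [0.019582, 0.019460, 0.018888, 0.018940, 0.019114],
    'trading_block': [0] * 5,
})

# df2 sample as printed in the answer
df2 = pd.DataFrame({
    'read_year': [2009, 2009, 2017, 2017, 2017],
    'read_month': [1] * 5,
    'lmp': [37.5694, 34.5777, 33.7039, 33.1503, 33.8935],
    'trading_block': [0] * 5,
})

# Helper key: position of each row within its year (0, 1, 2, ...)
df1['g'] = df1.groupby('read_year').cumcount()
df2['g'] = df2.groupby('read_year').cumcount()

# Only the join keys and the column to add, so no other columns overlap
right = df2[['read_year', 'g', 'lmp']]

# Inner join: only rows whose (year, position) pair exists in both frames
inner = df1.merge(right, on=['read_year', 'g']).drop('g', axis=1)

# Left join: all df1 rows are kept, unmatched rows get NaN in lmp
left = df1.merge(right, on=['read_year', 'g'], how='left').drop('g', axis=1)

print(inner)
print(left)
```

Because each (read_year, g) pair is unique on both sides, the merge is one-to-one and the load values are not repeated.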

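Can you add the expected output?

df1.merge(df2, on='read_year')

I've added the expected output.

I tried that; it adds suffixes to several columns and causes the load values to be duplicated...

How do these inputs produce that output? The years do not match.

Oh, the dataframe is actually 800,000 rows long, so I only used df.head() to show part of the data and get my point across.

It seems to work, but I still get suffixed columns - df.columns

Index([u'read_year', u'read_month_x', u'load', u'trading_block_x', u'read_month_y', u'lmp', u'trading_block_y'], dtype='object')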
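
The suffixed columns in the comment above appear when the whole of df2 is merged: every non-key column that exists in both frames (read_month, trading_block) is kept twice, as _x and _y. A year that repeats on both sides is also paired row by row in every combination, which is why the load values look duplicated. A small sketch with made-up values (these short df1 and df2 frames are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical miniature frames with the same column layout as in the question
df1 = pd.DataFrame({
    'read_year': [2017, 2017],
    'read_month': [3, 3],
    'load': [0.1, 0.2],
    'trading_block': [0, 0],
})
df2 = pd.DataFrame({
    'read_year': [2017, 2017, 2009],
    'read_month': [1, 1, 1],
    'lmp': [33.7, 33.1, 37.5],
    'trading_block': [0, 0, 0],
})

# Merging the full frames on the year only: overlapping columns get suffixes,
# and each of the 2 left rows pairs with each of the 2 matching right rows
plain = df1.merge(df2, on='read_year')
print(plain.columns.tolist())
print(len(plain))

# Restricting df2 to the key plus the new column removes the suffixes,
# but the row multiplication remains because the key is still not unique
slim = df1.merge(df2[['read_year', 'lmp']], on='read_year')
print(slim.columns.tolist())
print(len(slim))
```

Selecting columns from df2 fixes the suffixes; the cumcount helper key from the answer is what makes the join one-to-one.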
