
Python: merging two DataFrames on an index and a column


I have two DataFrames, as follows:

import pandas as pd

df1 = pd.DataFrame()
df1['v1'] = [5,7,2,4,9,7,2]
df1['v2'] = ["a1", 'nan', "a2", "a3", "a5", "a6", "a9"]

   v1   v2
0   5   a1
1   7  nan
2   2   a2
3   4   a3
4   9   a5
5   7   a6
6   2   a9

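df1 has a column v2 whose string values match the index of df2. It also contains nan, and it may contain values that have no corresponding row label in df2.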

Now I want to merge these DataFrames into one, like this:

  v1    v2       pc1       pc2
0  5   a1   0.048725  0.050773
1  7  nan        nan       nan
2  2   a2   0.289110  0.302272
3  4   a3   0.720966  0.663910
4  9   a5        nan       nan
5  7   a6   0.021616  0.308114
6  2   a9   0.205923  0.583591
In R, I would use the rownames_to_column(df2, "v2") and left_join(df1,) functions.


But how can I do this in pandas?

Update:

In [37]: df1.merge(df2, right_index=True, left_on='v2', how='outer')
Out[37]:
   v1   v2       pc1       pc2
0   5   a1  0.252062  0.602530
1   7  nan       NaN       NaN
2   2   a2  0.328666  0.988321
3   4   a3  0.704342  0.809817
4   9   a5       NaN       NaN
5   7   a6  0.001230  0.602590
6   2   a9  0.635444  0.926872
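
For anyone who wants to reproduce this, here is a minimal sketch. The definition of df2 is not shown in the question above, so the df2 below is an assumption: it is rebuilt from the pc1/pc2 values in the question's expected output, with a1, a2, a3, a6 and a9 as its index. The sketch uses how='left', which keeps every row of df1 and leaves NaN where v2 has no match in df2's index.

```python
import pandas as pd

df1 = pd.DataFrame({
    "v1": [5, 7, 2, 4, 9, 7, 2],
    "v2": ["a1", "nan", "a2", "a3", "a5", "a6", "a9"],
})

# Assumed df2: values copied from the expected output in the question.
df2 = pd.DataFrame(
    {
        "pc1": [0.048725, 0.289110, 0.720966, 0.021616, 0.205923],
        "pc2": [0.050773, 0.302272, 0.663910, 0.308114, 0.583591],
    },
    index=["a1", "a2", "a3", "a6", "a9"],
)

# Match df1's column v2 against df2's index and keep all rows of df1.
# The string "nan" and the key "a5" are not in df2's index, so their
# pc1/pc2 cells come out as missing values.
merged = df1.merge(df2, left_on="v2", right_index=True, how="left")
print(merged)
```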


You can do it like this:

pd.merge(df1, df2, left_on = 'v2', right_index=True, how = 'left')
This produces:

   v1   v2       pc1       pc2
0   5   a1  0.048725  0.050773
1   7  NaN       NaN       NaN
2   2   a2   0.28911  0.302272
3   4   a3  0.720966   0.66391
4   9   a5       NaN       NaN
5   7   a6  0.021616  0.308114
6   2   a9  0.205923  0.583591
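
As a side note, DataFrame.join with its on argument appears to express the same column-to-index left join a little more compactly. The sketch below compares the two calls; df2 is again an assumption, rebuilt from the values in the question's expected output.

```python
import pandas as pd

df1 = pd.DataFrame({
    "v1": [5, 7, 2, 4, 9, 7, 2],
    "v2": ["a1", "nan", "a2", "a3", "a5", "a6", "a9"],
})

# Assumed df2: values copied from the expected output in the question.
df2 = pd.DataFrame(
    {
        "pc1": [0.048725, 0.289110, 0.720966, 0.021616, 0.205923],
        "pc2": [0.050773, 0.302272, 0.663910, 0.308114, 0.583591],
    },
    index=["a1", "a2", "a3", "a6", "a9"],
)

# join() defaults to how="left" and matches the column named in `on`
# against the index of the other frame.
joined = df1.join(df2, on="v2")

# The merge call from the answer above, for comparison.
merged = pd.merge(df1, df2, left_on="v2", right_index=True, how="left")

print(joined.equals(merged))
```

Which spelling to prefer is mostly a matter of taste; merge makes the join keys on both sides explicit, while join is shorter when the right-hand key is always the index.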

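Thanks. Why is the index no longer sorted, and how can I reorder the rows by index?

@spore234, in that case you want to put df1 first; see the updated answer. By the way, the set_index call complicates things. It isn't needed: you can pass the join arguments for each frame separately.

For example, using reset_index() (this relies on df2's index being named idx, since reset_index() names the new column after the index):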
pd.merge(df2.reset_index(), df1, left_on='idx', right_on='v2', how='outer').drop('idx', axis=1)


Output:

            pc1            pc2      v1    v2
  0      0.760966       0.059443    5     a1
  1      0.059443       0.984703    2     a2
  2      0.214868       0.677140    4     a3
  3      0.224410       0.037784    7     a6
  4      0.297342       0.341810    2     a9
  5        NaN            NaN       7     nan
  6        NaN            NaN       9     a5
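
The output above lists the unmatched rows last, because df2 is the left-hand frame and the join is an outer one. A minimal sketch of the "put df1 first" suggestion from the comments follows. The values of df2 are an assumption, rebuilt from the question's expected output, and the index name idx is taken from the snippet above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "v1": [5, 7, 2, 4, 9, 7, 2],
    "v2": ["a1", "nan", "a2", "a3", "a5", "a6", "a9"],
})

# Assumed df2: values copied from the expected output in the question.
df2 = pd.DataFrame(
    {
        "pc1": [0.048725, 0.289110, 0.720966, 0.021616, 0.205923],
        "pc2": [0.050773, 0.302272, 0.663910, 0.308114, 0.583591],
    },
    index=["a1", "a2", "a3", "a6", "a9"],
)

# Name the index so that reset_index() yields a column called "idx".
right = df2.rename_axis("idx").reset_index()

# df1 is the left frame and how="left", so the rows follow df1's order.
out = pd.merge(df1, right, left_on="v2", right_on="idx", how="left")
out = out.drop("idx", axis=1)
print(out)
```

If a merge has kept df1's index labels but returned the rows in a different order, calling sort_index() on the result should restore the original order.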