Python: How do I use combine_first on two DataFrames matched on a specific column?

Tags: python, pandas, dataframe, path-combine

After matching rows on column c: if column a in data1 has a value, keep it; if not, take the value of column a from data2.

data1                               data2

a    b    c                      a    c    d
a1   b1   c1                     1a   c1   1d   
     b2   c2                     2a   c2   2d
a3        c3                     3a   c3   3d
                                 4a   c4   4d
The result I want:

  result
   a    b   c     
   a1   b1  c1
   2a   b2  c2
   a3       c3
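I tried the following, but neither gave what I wanted:

>>> result = data1.merge(data2, on=['c'])
The suffixes _x and _y are created, and the combine_first behavior is not applied.

>>> result = data1.combine_first(data2)
The rows are not matched on column c.

How can I get the right result? I would appreciate your help.
Thanks.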
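
For reference, the two attempts can be reproduced with a minimal sketch like the one below. It assumes the blank cells in the tables are missing values (None). The reversed copy of data2 is my own addition, not part of the question; it is only there to make the positional pairing visible.

```python
import pandas as pd

# The question's tables, assuming blank cells are missing values.
data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                      'b': ['b1', 'b2', None],
                      'c': ['c1', 'c2', 'c3']})
data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                      'c': ['c1', 'c2', 'c3', 'c4'],
                      'd': ['1d', '2d', '3d', '4d']})

# Attempt 1: merge keeps both 'a' columns and renames them with suffixes.
merged = data1.merge(data2, on=['c'])
print(list(merged.columns))

# Attempt 2: combine_first aligns on the index (0, 1, 2, ...), not on 'c'.
# Reversing data2 (hypothetical reordering) shows the mismatch: the gap in
# the 'c2' row is filled from whatever row sits at the same position.
shuffled = data2.iloc[::-1].reset_index(drop=True)
positional = data1.combine_first(shuffled)
print(positional.loc[1, ['a', 'c']])
```

With the rows of data2 in their original order, the positional pairing happens to line up, which can hide the problem.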

I'm not 100% clear on how you indexed your DataFrames (data1 and data2), but if you index both of them on column 'c', it should work.

Here is how I created your data:

import pandas as pd
data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                      'b': ['b1', 'b2', None],
                      'c': ['c1', 'c2', 'c3']})

data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                      'c': ['c1', 'c2', 'c3', 'c4'],
                      'd': ['1d', '2d', '3d', '4d']})
Then I set the index of both to column 'c':

data1 = data1.set_index('c')
data2 = data2.set_index('c')
Then I used combine_first, just as you did:

data_combined = data1.combine_first(data2)
I get:

     a     b   d
c
c1  a1    b1  1d
c2  2a    b2  2d
c3  a3  None  3d
c4  4a   NaN  4d
I'm not sure why you don't want the row with index 'c4' or the column 'd', but removing them is easy:

data_combined = data_combined.drop('d', axis=1)
data_combined = data_combined.loc[data_combined.index != 'c4']
Then I reset the index, reordered the columns, and filled the gaps to get the result you wanted:

data_combined = data_combined.reset_index()
data_combined = data_combined[['a', 'b', 'c']]
data_combined = data_combined.fillna('')


    a   b   c
0   a1  b1  c1
1   2a  b2  c2
2   a3      c3
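
The steps above can be wrapped in a small helper so that 'c4' and 'd' do not have to be named by hand. This is only a sketch: combine_first_on is a hypothetical name, and it assumes the key values are unique in each frame and that you only want the rows and columns of the left frame.

```python
import pandas as pd

def combine_first_on(left, right, key):
    """Fill gaps in `left` from `right`, matching rows on column `key`."""
    combined = left.set_index(key).combine_first(right.set_index(key))
    # Keep only the keys that exist in `left` (drops 'c4' in this example).
    combined = combined.loc[combined.index.isin(left[key])]
    combined = combined.reset_index()
    # Keep only the columns of `left`, in their original order (drops 'd').
    return combined[list(left.columns)].fillna('')

data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                      'b': ['b1', 'b2', None],
                      'c': ['c1', 'c2', 'c3']})
data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                      'c': ['c1', 'c2', 'c3', 'c4'],
                      'd': ['1d', '2d', '3d', '4d']})

result = combine_first_on(data1, data2, 'c')
print(result)
```

Note that the rows come back in the order of the combined index, which matches data1 here; if data1 is not ordered by its key, a reindex on the original key column would be needed.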

You can also try it like this:

# set indexes
data1 = data1.set_index('c')
data2 = data2.set_index('c')

# join data on indexes
datax = data1.join(data2.drop('d', axis=1), rsuffix='_rr').reset_index()

# fill missing value in column a
datax['a'] = datax['a'].fillna(datax['a_rr'])

# drop unwanted columns
datax.drop('a_rr', axis=1, inplace=True)

# fill missing values with blank spaces
datax.fillna('', inplace=True)

# output
    a   b   c
0   a1  b1  c1
1   2a  b2  c2
2   a3      c3
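
If only column a needs filling, a lookup with Series.map is another option that avoids the temporary a_rr column. This is a different technique from the join above, sketched under the assumption that the values of c are unique in data2 (map with a Series lookup fails on duplicate index labels). The names lookup and result are just for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                      'b': ['b1', 'b2', None],
                      'c': ['c1', 'c2', 'c3']})
data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                      'c': ['c1', 'c2', 'c3', 'c4'],
                      'd': ['1d', '2d', '3d', '4d']})

# Build a lookup from c to data2's a (assumes c is unique in data2).
lookup = data2.set_index('c')['a']

result = data1.copy()
# Where data1's a is missing, take the value found by looking up c.
result['a'] = result['a'].fillna(result['c'].map(lookup))
# Show the remaining gaps as empty strings.
result = result.fillna('')
print(result)
```

Because data1 is never re-indexed, its row order and columns stay exactly as they were.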

Using @IdoS's setup:

import pandas as pd
data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                      'b': ['b1', 'b2', None],
                      'c': ['c1', 'c2', 'c3']})

data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                      'c': ['c1', 'c2', 'c3', 'c4'],
                      'd': ['1d', '2d', '3d', '4d']})
You can use set_index and combine_first, then reindex:

df_out = data1.set_index('c').combine_first(data2.set_index('c'))\
     .reindex(data1.c)\
     .reset_index()

df_out
Output:

    c   a     b   d
0  c1  a1    b1  1d
1  c2  2a    b2  2d
2  c3  a3  None  3d
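
That output still carries column d and shows None for the gap in b. One possible way to land on the exact layout from the question is to restrict data2 to the columns it shares with data1 before combining. This is a sketch, with left, right and shared as illustrative names.

```python
import pandas as pd

data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                      'b': ['b1', 'b2', None],
                      'c': ['c1', 'c2', 'c3']})
data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                      'c': ['c1', 'c2', 'c3', 'c4'],
                      'd': ['1d', '2d', '3d', '4d']})

left = data1.set_index('c')
right = data2.set_index('c')

# Only use the columns of data2 that also exist in data1 (just 'a' here),
# so 'd' never enters the result.
shared = right.columns.intersection(left.columns)

df_out = (left.combine_first(right[shared])
              .reindex(data1['c'])        # keep data1's rows, in data1's order
              .reset_index()[list(data1.columns)]
              .fillna(''))
print(df_out)
```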

Thank you very much!