Python: merging by looking up values from multiple columns

python python-3.x pandas merge

I have the following two tables as pandas DataFrames.

This one lists all possible combinations:

Table A:
   0    1    2
 +---+----+----+
0| A |None|None|
 +---+----+----+
1| B |None|None|
 +---+----+----+
2|...|    |    |
 +---+----+----+
3| A | C  | D  |
 +---+----+----+
4| B | C  | D  |
 +---+----+----+

These are the values associated with each variable:

Table B:
  0   1
 +---+---+
0| A | 5 |
 +---+---+
1| B | 2 |
 +---+---+
2| C | 7 |
 +---+---+
3| D | 4 |
 +---+---+

What I need is something like this:

   0    1    2   3
 +---+----+----+---+
0| A |None|None| 5 |
 +---+----+----+---+
1| B |None|None| 2 |
 +---+----+----+---+
2|...|    |    |   |
 +---+----+----+---+
3| A | C  | D  | 7 |
 +---+----+----+---+
4| B | C  | D  | 7 |
 +---+----+----+---+

Column 3 is found by taking each value in columns 0, 1 and 2 of Table A, looking it up in Table B, and returning the maximum of the values found.

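For example, row 3 holds the combination A, C and D. Column 3 therefore looks up A in Table B, which gives 5, then looks up C in Table B, which gives 7. Finally, it looks up D in Table B, which gives 4. Of these 3 numbers, 7 is the largest, so that is the value returned.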
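The lookup described above can be sketched in plain Python before involving pandas at all. This is only an illustration of the rule: the names `values`, `row` and `row_max` are made up for this sketch and do not come from the question.

```python
# Values from Table B, keyed by letter.
values = {"A": 5, "B": 2, "C": 7, "D": 4}

# Row 3 of Table A; None would mark an empty cell.
row = ["A", "C", "D"]


def row_max(row, values):
    """Return the largest looked-up value in a row, ignoring empty cells."""
    found = [values[cell] for cell in row if cell is not None]
    return max(found) if found else None


result = row_max(row, values)
print(result)  # 7
```

The pandas answers further down apply the same idea to every row at once.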

So far I have tried pandas.merge, but without success.

Update: I tried this:

Final=df1.insert(3,column='min space',value=df1.join(df2.set_index(0),on=0).max())

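But it only returns None, and it does not take the multiple columns of df1 into account. If I try to pass several columns, [0,1,2], it tells me that each row needs the same number of columns.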
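One likely reason for the None result is that `DataFrame.insert` changes the frame in place and returns nothing, so `Final` ends up bound to None. Separately, `max()` without `axis=1` reduces down each column instead of across each row. The sketch below shows only the first point; the frame and the inserted values `[5, 7]` are placeholders, not data from the question.

```python
import pandas as pd

df1 = pd.DataFrame({0: ["A", "B"], 1: [None, "C"], 2: [None, "D"]})

# insert() adds the column to df1 in place and returns None,
# so assigning its result to a name stores None.
returned = df1.insert(3, column="min space", value=[5, 7])

print(returned)           # None
print(list(df1.columns))  # [0, 1, 2, 'min space']
```

Calling `insert` (or plain column assignment) as a statement and then continuing to use `df1` avoids the problem.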

You can convert every cell to a number (using the associated DataFrame) and then take the maximum across the columns of each row.

import pandas as pd
df = pd.DataFrame({                 # original df
    0:['A', 'B', 'A', 'B',],
    1:[None, None, 'C', 'C',],
    2:[None, None, 'D', 'D',],
    })
rdf = pd.DataFrame({                # associated values
    0:['A', 'B', 'C', 'D',],
    1:[5, 2, 7, 4,],
    })

tdf = df.copy()                     # work on a copy so df keeps its letters
rdf = rdf.set_index(0)[1]           # Series mapping letter -> value
tdf = tdf.replace(rdf)              # replace every letter in tdf with its value
tdf[3] = tdf.max(axis=1)            # column 3 = max of each row
df[3] = tdf[3]                      # add column 3 to the original df
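As a variation on the same idea, the lookup can also be done with `Series.map` applied column by column, which leaves unmatched cells as NaN. This is a sketch under the assumption that every letter appears once in the lookup table; the names `lookup` and `numeric` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    0: ["A", "B", "A", "B"],
    1: [None, None, "C", "C"],
    2: [None, None, "D", "D"],
})
lookup = pd.Series([5, 2, 7, 4], index=["A", "B", "C", "D"])

# Map every column through the lookup; cells with no match (None) become NaN.
numeric = df.apply(lambda col: col.map(lookup))

# Row-wise maximum; NaN cells are skipped by default.
df[3] = numeric.max(axis=1)
print(df)
```

Because the mapping happens on a separate frame, the original letters in columns 0 to 2 stay untouched.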

Try this:

#!/usr/bin/env python3
import pandas as pd

A, B, C, D = 5, 2, 7, 4

df = pd.DataFrame({
    0: [A, B, None, A, B],
    1: [None, None, None, C, C],
    2: [None, None, None, D, D]
    })

df[3] = df.max(axis=1)

Output:

     0    1    2    3
0  5.0  NaN  NaN  5.0
1  2.0  NaN  NaN  2.0
2  NaN  NaN  NaN  NaN
3  5.0  7.0  4.0  7.0
4  2.0  7.0  4.0  7.0

Try using replace:

dfA['out'] = dfA.replace(dict(zip(dfB[0],dfB[1]))).max(1)
dfA
Out[487]: 
   0     1     2  out
0  A  None  None  5.0
1  B  None  None  2.0
2  A     C     D  7.0
3  B     C     D  7.0

For example:

>>> df1
     0    1    2
0    A  NaN  NaN
1    B  NaN  NaN
2  ...  NaN  NaN
3    A    C    D
4    B    C    D

>>> df2
   0  1
0  A  5
1  B  2
2  C  7
3  D  4

>>> df1.columns
Int64Index([0, 1, 2], dtype='int64')

>>> df2.columns
Int64Index([0, 1], dtype='int64')

>>> df2[1].dtype
dtype('int64')

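# Stack df1 into long form, merge on the shared column 0, then take the
# maximum of the value column (1) for each original row.
df1[3] = df2.merge(
    df1.stack(dropna=False).reset_index(0), how='outer'
).groupby('level_0')[1].max()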

>>> df1
     0    1    2    3
0    A  NaN  NaN  5.0
1    B  NaN  NaN  2.0
2  ...  NaN  NaN  NaN
3    A    C    D  7.0
4    B    C    D  7.0
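If the merge step is hard to follow, the same reshape-then-aggregate idea can be sketched with `stack`, `map` and a group-by on the row label. This is an alternative illustration rather than the answer's exact method; the names `lookup`, `long` and `looked_up` are made up for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({
    0: ["A", "B", "...", "A", "B"],
    1: [None, None, None, "C", "C"],
    2: [None, None, None, "D", "D"],
})
df2 = pd.DataFrame({0: ["A", "B", "C", "D"], 1: [5, 2, 7, 4]})

# Series mapping each letter to its value.
lookup = df2.set_index(0)[1]

# Long form: one entry per (row label, column label) cell.
long = df1.stack()

# Letters become values; anything not in the lookup becomes NaN.
looked_up = long.map(lookup)

# Maximum per original row; assignment aligns on the row labels.
df1[3] = looked_up.groupby(level=0).max()
print(df1)
```

Rows with no matching letter, such as the `...` row, end up with NaN in column 3.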

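This gives an error: MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False

I added an example that demonstrates how it works. Based on your example, it should merge automatically on the common column.

I need to keep the letter IDs in the table.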
