Pandas Python: How to merge two dataframes whose values are not unique


I have two dataframes:

import pandas as pd
a = pd.DataFrame( { 'port':[1,1,0,1,0], 'cd':[1,2,3,2,1], 'date':["2014-02-26","2014-02-25","2014-02-26","2014-02-26","2014-02-25"] } )
b = pd.DataFrame( { 'port':[0,1,0,1,0], 'fac':[2,1,2,2,3], 'date': ["2014-02-25","2014-02-25","2014-02-26","2014-02-26","2014-02-27"] } )
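What I need to do is take each (date, port) pair, say port 0 and date 2014-02-25, look up the fac value for that pair in b, and fill it into a new column in a. The output should look like this: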

port cd date         fac 
1    1  "2014-02-26" 2
1    2  "2014-02-25" 1
... (so on) ...

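I tried merging the frames on date and port, but got an error. I think this is because the dataframes are different sizes, so I don't think a merge will work.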
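Differing lengths alone should not make merge fail: it matches rows by key, not by position. A minimal sketch, using a hypothetical b_small (b without its 2014-02-27 row) so that the two frames have different lengths, plus pandas' indicator=True option to show which rows of a found a match:

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# Hypothetical: drop one row so b_small (4 rows) is shorter than a (5 rows)
b_small = b[b['date'] != "2014-02-27"]

# A left merge keeps every row of a; indicator=True adds a '_merge' column
# recording whether each row matched ('both') or not ('left_only').
merged = a.merge(b_small, on=['port', 'date'], how='left', indicator=True)
print(len(a), len(b_small), len(merged))
print(merged['_merge'].value_counts())
```

If some rows show 'left_only', those (port, date) pairs simply do not exist in the right-hand frame.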

If you want to merge these two dataframes, you should use merge on both the port and date columns:

Output:

   port  cd        date  fac
0     1   1  2014-02-26    2
1     1   2  2014-02-26    2
2     1   2  2014-02-25    1
3     0   3  2014-02-26    2
4     0   1  2014-02-25    2
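If the goal is exactly the layout asked for in the question (every row of a, in a's original order, plus fac), a left merge is one option: how='left' keeps all rows of a and leaves fac as NaN where b has no matching pair, whereas the default inner merge drops such rows. A sketch, assuming each (port, date) pair appears at most once in b:

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

cols = ['port', 'date']
# how='left': keep every row of a in its original order; unmatched rows get NaN in fac
out = a.merge(b, on=cols, how='left')
print(out)
```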
I think you need:
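When b repeats a (port, date) pair, a plain merge returns more rows than a has. One way to get exactly one fac per row of a is to deduplicate b on the key columns before a left merge. A minimal sketch, assuming the first matching fac in b should win; b_dup is a hypothetical variant of b with one pair listed twice:

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
# Hypothetical variant of b in which the pair (port 1, 2014-02-26) appears twice
b_dup = pd.DataFrame({'port': [0, 1, 0, 1, 0, 1], 'fac': [2, 1, 2, 2, 3, 9],
                      'date': ["2014-02-25", "2014-02-25", "2014-02-26",
                               "2014-02-26", "2014-02-27", "2014-02-26"]})
cols = ['port', 'date']

# Keep only the first row for each (port, date) pair ...
b_unique = b_dup.drop_duplicates(subset=cols, keep='first')
# ... then left-merge; validate='many_to_one' raises a MergeError if the
# right-hand keys are still not unique, instead of silently adding rows.
out = a.merge(b_unique, on=cols, how='left', validate='many_to_one')
print(out)
```

Which duplicate to keep (first, last, or an aggregate such as the mean) is a choice that depends on what fac means in your data.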

But if you need all combinations of the duplicate pairs:

cols = ['port','date']
df1 = a.merge(b, on=cols)
print (df1)
   port  cd        date  fac
0     1   1  2014-02-26    2
1     1   2  2014-02-26    2
2     1   2  2014-02-25    1
3     0   3  2014-02-26    2
4     0   1  2014-02-25    2
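With the sample data, b has no repeated (port, date) pair, so the merge above yields one row per row of a. When both frames repeat a key, the inner merge produces every combination of the matching rows. A sketch using a hypothetical b_dup in which (port 1, 2014-02-26) appears twice; a also contains that pair twice, so it contributes 2 × 2 = 4 rows:

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
# Hypothetical variant of b with the pair (port 1, 2014-02-26) listed twice
b_dup = pd.DataFrame({'port': [0, 1, 0, 1, 0, 1], 'fac': [2, 1, 2, 2, 3, 9],
                      'date': ["2014-02-25", "2014-02-25", "2014-02-26",
                               "2014-02-26", "2014-02-27", "2014-02-26"]})

cols = ['port', 'date']
df_all = a.merge(b_dup, on=cols)  # inner merge: all combinations of matching rows
print(len(a), len(df_all))  # 5 7
```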

I suggest creating a new column in dataframe A and filling it with numpy.vectorize.

Set an index on dataframe B so its rows can be accessed by 'date' and 'port':
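A sketch of that step: set_index with two columns builds a MultiIndex, so a single fac value can be read with .loc using a (date, port) tuple plus the column label, and a tuple can be tested for membership in the index before looking it up:

```python
import pandas as pd

B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# MultiIndex with levels (date, port); fac stays as the only column
C = B.set_index(['date', 'port'])

print(C.loc[("2014-02-25", 0), 'fac'])  # 2
print(("2014-02-27", 1) in C.index)     # False: B has no such pair
```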

Then create a function to apply to each row of dataframe A:
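As an alternative to calling a Python function once per row, the same lookup can be done in one step by reindexing B's fac column with A's (date, port) pairs. A sketch, assuming each (date, port) pair occurs at most once in B (reindex requires a unique index); pairs missing from B come back as NaN:

```python
import pandas as pd

A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

C = B.set_index(['date', 'port'])
# Build one (date, port) key per row of A, in the same level order as C's index
keys = pd.MultiIndex.from_arrays([A['date'], A['port']])
# reindex returns fac values aligned to those keys; to_numpy() drops the index
A['fac'] = C['fac'].reindex(keys).to_numpy()
print(A)
```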

Here is the output:

   cd        date  port  fac
0   1  2014-02-26     1    2
1   2  2014-02-25     1    1
2   3  2014-02-26     0    2
3   2  2014-02-26     1    2
4   1  2014-02-25     0    2
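Full code for this approach:

import pandas as pd
import numpy as np

A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1], 'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3], 'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# Index B by (date, port) so each fac value can be looked up by that pair
C = B.set_index(['date', 'port'])

def get_fac(date, port):
    """Return the fac value for (date, port), or NaN if B has no such pair."""
    try:
        return C.loc[(date, port), 'fac']
    except KeyError:
        return np.nan

# np.vectorize calls get_fac once per row (a convenience loop, not true vectorization).
# otypes=[object] stops it from inferring an integer output type from the first
# result, which would fail if a later lookup returned NaN.
A['fac'] = np.vectorize(get_fac, otypes=[object])(A['date'], A['port'])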