Pandas Python: how to merge two dataframes with non-unique values
I have two dataframes:
import pandas as pd
a = pd.DataFrame( { 'port':[1,1,0,1,0], 'cd':[1,2,3,2,1], 'date':["2014-02-26","2014-02-25","2014-02-26","2014-02-26","2014-02-25"] } )
b = pd.DataFrame( { 'port':[0,1,0,1,0], 'fac':[2,1,2,2,3], 'date': ["2014-02-25","2014-02-25","2014-02-26","2014-02-26","2014-02-27"] } )
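What I need to do is take each (date, port) pair, say port 0 and date 2014-02-25, look up the matching fac value in b, and put it into a new column in a. The output should look like this: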
port cd date fac
1 1 "2014-02-26" 2
1 2 "2014-02-25" 1
... (and so on) ...
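I tried merging the frames on date and port, but I got an error. I suspect this is because the two dataframes have different sizes, so I don't think a merge will work.
If you want to merge these two dataframes, you should use:
Output: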
port cd date fac
0 1 1 2014-02-26 2
1 1 2 2014-02-26 2
2 1 2 2014-02-25 1
3 0 3 2014-02-26 2
4 0 1 2014-02-25 2
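The frames do not need to be the same size for a merge. Below is a minimal sketch that assumes two things: every row of a should be kept, and (port, date) should identify at most one row of b. With how='left', all of a's rows are kept in their original order, and a pair missing from b would get NaN in fac. With validate='many_to_one', pandas raises MergeError if b contains a repeated (port, date) pair. The extra row added to b is hypothetical and exists only to trigger that check.

```python
import pandas as pd
from pandas.errors import MergeError

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})
cols = ['port', 'date']

# a may repeat a key (it has (1, "2014-02-26") twice); b must not,
# otherwise validate='many_to_one' raises MergeError
result = a.merge(b, on=cols, how='left', validate='many_to_one')
print(result)

# Hypothetical second row for (0, "2014-02-25") breaks the many-to-one guarantee
b_dup = pd.concat([b, pd.DataFrame({'port': [0], 'fac': [9], 'date': ["2014-02-25"]})],
                  ignore_index=True)
try:
    a.merge(b_dup, on=cols, how='left', validate='many_to_one')
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

The left join keeps rows in a's order. That differs from the grouped row order shown above, which came from an inner merge in an older pandas version.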
I think you need:
But if you need all combinations of the duplicate pairs:
cols = ['port','date']
df1 = a.merge(b, on=cols)
print (df1)
port cd date fac
0 1 1 2014-02-26 2
1 1 2 2014-02-26 2
2 1 2 2014-02-25 1
3 0 3 2014-02-26 2
4 0 1 2014-02-25 2
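In the sample data, b has no repeated (port, date) pair, so "all combinations" makes no visible difference here. The sketch below makes the difference visible by appending one hypothetical row to b, so that the pair (0, 2014-02-25) appears twice. A plain merge like the one above then returns one row per matching combination. Dropping duplicate keys from b first keeps a single fac per pair. The keep='first' choice is an assumption; which duplicate to keep depends on the data.

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})
cols = ['port', 'date']

# Hypothetical extra row: (0, "2014-02-25") now appears twice in b, with fac 2 and 5
b_dup = pd.concat([b, pd.DataFrame({'port': [0], 'fac': [5], 'date': ["2014-02-25"]})],
                  ignore_index=True)

# The a-row with key (0, "2014-02-25") pairs with BOTH b-rows, so 5 rows become 6
all_pairs = a.merge(b_dup, on=cols)
print(len(all_pairs))  # prints 6

# Keeping only the first b-row per key gives one match per a-row again
one_each = a.merge(b_dup.drop_duplicates(subset=cols, keep='first'), on=cols)
print(len(one_each))  # prints 5
```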
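I suggest creating a new column in dataframe A and filling it with numpy.vectorize. First, set an index on dataframe B so that it can be accessed by 'date' and 'port'. Then, create a function that is applied to each row of dataframe A: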
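import pandas as pd
import numpy as np
A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1], 'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3], 'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})
# Index B by (date, port) so each fac can be looked up by that pair
C = B.set_index(['date', 'port']).sort_index()
def get_fac(date, port):
    # Return the fac for this (date, port) pair, or '' if B has no such pair
    try:
        return C.loc[(date, port), 'fac']
    except KeyError:
        return ''
# otypes=[object] lets found values and '' coexist; without it, np.vectorize
# infers the output type from the first call and can fail on a later ''
A['fac'] = np.vectorize(get_fac, otypes=[object])(A['date'], A['port'])
print(A)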
cd date port fac
0 1 2014-02-26 1 2
1 2 2014-02-25 1 1
2 3 2014-02-26 0 2
3 2 2014-02-26 1 2
4 1 2014-02-25 0 2
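np.vectorize is essentially a Python-level loop, so on larger frames a fully vectorized lookup may be preferable. The sketch below assumes, as in this data, that each (date, port) pair is unique in B. It indexes B's fac column by both keys and reindexes it with A's key pairs in one call. Pairs missing from B come back as NaN rather than '', and the column becomes float if any are missing.

```python
import pandas as pd

A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# fac values indexed by the (date, port) pair; reindex needs these pairs to be unique
fac_by_key = B.set_index(['date', 'port'])['fac']

# Build the same kind of key for every row of A (repeats are fine here)
keys = pd.MultiIndex.from_arrays([A['date'], A['port']])

# Look every key up at once; to_numpy() avoids aligning on the MultiIndex when assigning
A['fac'] = fac_by_key.reindex(keys).to_numpy()
print(A)
```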