Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/341.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python Pandas Merge(pd.Merge)如何设置索引和连接_Python_Pandas - Fatal编程技术网

Python Pandas Merge(pd.Merge)如何设置索引和连接

Python Pandas Merge(pd.Merge)如何设置索引和连接,python,pandas,Python,Pandas,我有两个数据帧:dfLeft和dfRight,以日期作为索引 左: cusip factorL date 2012-01-03 XXXX 4.5 2012-01-03 YYYY 6.2 .... 2012-01-04 XXXX 4.7 2012-01-04 YYYY 6.1 .... 右: idc__id factorR date 2012-01-03 XX

我有两个数据帧:dfLeft和dfRight,以日期作为索引

左:

            cusip    factorL
date  
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
....
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
....
右:

            idc__id    factorR
date  
2012-01-03    XXXX      5.0
2012-01-03    YYYY      6.0
....
2012-01-04    XXXX      5.1
2012-01-04    YYYY      6.2
两者的形状接近
(121900,3)

我尝试了以下合并:

test = pd.merge(dfLeft, dfRight, left_index=True, right_index=True, left_on='cusip', right_on='idc__id', how = 'inner')
这使得测试的形状为
(60643500,6)

对这里的问题有什么建议吗?我希望它能够根据日期和cusip/idc_id进行合并。注意:在本例中,cusip是对齐的,但实际上可能不是这样

谢谢

预期产量 测试:


重置索引,然后在多个(列)键上合并:

然后可以将“日期”重置为索引:

dfMerged.set_index('date', inplace=True)
这里有一个例子:

raw1 = '''
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
'''

raw2 = '''
2012-01-03    XYXX      45.
2012-01-03    YYYY      62.
2012-01-04    XXXX      -47.
2012-01-05    YYYY      61.
'''

import pandas as pd
from StringIO import StringIO


df1 = pd.read_table(StringIO(raw1), header=None,
                    delim_whitespace=True, parse_dates=[0], skiprows=1)
df2 = pd.read_table(StringIO(raw2), header=None,
                    delim_whitespace=True, parse_dates=[0], skiprows=1)

df1.columns = ['date', 'cusip', 'factorL']
df2.columns = ['date', 'idc__id', 'factorL']

print pd.merge(df1, df2,
         left_on=['date', 'cusip'],
         right_on=['date', 'idc__id'],
         how='inner')

                  date cusip  factorL_x idc__id  factorL_y
0  2012-01-03 00:00:00  YYYY        6.2    YYYY         62
1  2012-01-04 00:00:00  XXXX        4.7    XXXX        -47

您可以将
'cuspin'
'idc\u id'
作为索引添加到您面前的数据帧中(以下是它在前几行上的工作方式):


这也很好,因为您不必创建数据帧的副本。
raw1 = '''
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
'''

raw2 = '''
2012-01-03    XYXX      45.
2012-01-03    YYYY      62.
2012-01-04    XXXX      -47.
2012-01-05    YYYY      61.
'''

import pandas as pd
from StringIO import StringIO


df1 = pd.read_table(StringIO(raw1), header=None,
                    delim_whitespace=True, parse_dates=[0], skiprows=1)
df2 = pd.read_table(StringIO(raw2), header=None,
                    delim_whitespace=True, parse_dates=[0], skiprows=1)

df1.columns = ['date', 'cusip', 'factorL']
df2.columns = ['date', 'idc__id', 'factorL']

print pd.merge(df1, df2,
         left_on=['date', 'cusip'],
         right_on=['date', 'idc__id'],
         how='inner')
                  date cusip  factorL_x idc__id  factorL_y
0  2012-01-03 00:00:00  YYYY        6.2    YYYY         62
1  2012-01-04 00:00:00  XXXX        4.7    XXXX        -47
In [10]: dfL
Out[10]: 
           cuspin  factorL
date                      
2012-01-03   XXXX      4.5
2012-01-03   YYYY      6.2

In [11]: dfL1 = dfLeft.set_index('cuspin', append=True)

In [12]: dfR1 = dfRight.set_index('idc_id', append=True)

In [13]: dfL1
Out[13]: 
                   factorL
date       cuspin         
2012-01-03 XXXX        4.5
           YYYY        6.2

In [14]: dfL1.join(dfR1)
Out[14]: 
                   factorL  factorR
date       cuspin                  
2012-01-03 XXXX        4.5        5
           YYYY        6.2        6