Python: join DataFrame values by nearest index

python pandas dataframe

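Is there a fast, clean way to join values by nearest index? I need to do this for large DataFrames, and the hacks and workarounds I have tried are all too slow to be of much use.

Suppose I have two DataFrames, df and df2. I want to join the values of df2 into df, matching on the nearest index.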

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(4, 6)), 
                index=[1,1.55,3.33,9.88], 
                columns=[1,2.66,4.66,8.33,11.11,12])

df2 = pd.DataFrame(np.random.randint(0,100,size=(2, 3)), 
                index=[1.51,3.31], 
                columns=[2.64,4.65,8.31])

In [23]: df
Out[23]:

         1.00   2.66   4.66   8.33   11.11  12.00
1.00     98     40     28     36     49     92
1.55     52     51     61     64     28     98
3.33     66     33     91     21     24     79
9.88     30     21     13     62     89     22

In [24]: df2
Out[24]:

      2.64  4.65  11.12
1.51   999   999   999
3.31   999   999   999

# The result should look like the following:

         1.00   2.66   4.66   8.33   11.11  12.00
1.00     98     40     28     36     49     92
1.55     52     999    999    55     999    98
3.33     66     999    999    67     999    79
9.88     30     21     13     62     89     22

Setup
Because the OP's DataFrames are inconsistent

df = pd.DataFrame(
    1,
    index=[1,1.55,3.33,9.88],
    columns=[1,2.66,4.66,8.33,11.11,12])

df2 = pd.DataFrame(
    999,
    index=[1.51,3.31],
    columns=[2.64,4.65,8.31])

print(df)

      1.00   2.66   4.66   8.33   11.11  12.00
1.00      1      1      1      1      1      1
1.55      1      1      1      1      1      1
3.33      1      1      1      1      1      1
9.88      1      1      1      1      1      1

print(df2)

      2.64  4.65  8.31
1.51   999   999   999
3.31   999   999   999

I don't have time to explain the trickery

I would rather have done this

df2.stack().reindex_like(df.stack(), **kw)

But I got:

NotImplementedError: method='nearest' not implemented yet for MultiIndex; see GitHub issue 9365

At least it will be available at some point in the future.
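Until then, one way to avoid the MultiIndex is to reindex each axis separately. The following is only a minimal sketch on the setup above, not necessarily the approach the answer had in mind. The contents of kw, in particular the tolerance of 0.3, and the names aligned and result are assumptions chosen for illustration; method='nearest' also requires df2's index and columns to be monotonic.

```python
import pandas as pd

# Same setup as above.
df = pd.DataFrame(
    1,
    index=[1, 1.55, 3.33, 9.88],
    columns=[1, 2.66, 4.66, 8.33, 11.11, 12])

df2 = pd.DataFrame(
    999,
    index=[1.51, 3.31],
    columns=[2.64, 4.65, 8.31])

# Assumed keyword arguments: match the nearest label, but only when it lies
# within 0.3 of the target label. The tolerance is a hypothetical value.
kw = dict(method='nearest', tolerance=0.3)

# Align df2 to df's labels one axis at a time; labels of df that have no
# neighbour in df2 within the tolerance become NaN.
aligned = df2.reindex(index=df.index, **kw).reindex(columns=df.columns, **kw)

# Take the aligned df2 values where they exist and fall back to df elsewhere.
result = aligned.combine_first(df)

print(result)
```

The tolerance is what keeps distant labels such as 9.88 or 12.00 from being overwritten; without it, every label in df would pick up its nearest neighbour in df2, however far away.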