Python pandas: For all elements of sorted Series B, find the index of the closest element in sorted Series A

python pandas algorithm

I have a DataFrame with two columns of sorted integers:

      A        B
0     17       15
1     18       18
2     19       20
3     20       21
4     22       21
5     23       27

For every element of B, I want to find the index of the closest matching element of A:

      A        B       closest_match_idx
0     17       15      0
1     18       18      1
2     19       20      3
3     20       21      3
4     22       21      3
5     23       27      5

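I know I can do

df['closest_match_idx'] = df.B.map(lambda x: (df.A - x).abs().idxmin())

but that is an O(N**2) solution to a problem that is clearly O(N). Other than rolling my own index-lookup function, I haven't been able to find a better solution, yet this feels like a problem with an existing solution. Any ideas?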

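For context, what I ultimately want is, for each element of B, the closest matching element of A, up to a maximum absolute difference (otherwise just use the value from B).

I should also note that in my example A and B have the same length, but I would like a solution that generalizes to sequences of different lengths.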
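
As an illustration of the kind of hand-rolled lookup mentioned above, here is a minimal sketch built on NumPy's `searchsorted`. The helper name `nearest_index` is hypothetical, and the sketch assumes both arrays are sorted, `a` has at least two elements, and the dtype is signed so the subtractions cannot wrap around. It runs in O(M log N) rather than strictly linear time, returns positions (which equal the index labels only for a default RangeIndex), and does not require A and B to have the same length.

```python
import numpy as np


def nearest_index(a, b):
    """For each value in sorted b, return the position of the nearest value in sorted a.

    Hypothetical helper; ties are resolved in favour of the smaller position.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Position of the first element of a that is >= each element of b.
    right = np.searchsorted(a, b, side="left")
    # Clip so that both neighbours (right - 1 and right) are valid positions in a.
    right = np.clip(right, 1, len(a) - 1)
    left = right - 1
    # Keep whichever neighbour is closer; on a tie keep the left one.
    return np.where(b - a[left] <= a[right] - b, left, right)


A = np.array([17, 18, 19, 20, 22, 23])
B = np.array([15, 18, 20, 21, 21, 27])

closest_idx = nearest_index(A, B)
print(closest_idx)
```

On the sample data this reproduces the `closest_match_idx` column from the table above.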

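You can use `pd.merge_asof`. It requires both frames to be sorted on their merge keys. A benefit is that it supports the `tolerance` argument, which lets you accept only matches within a caliper.

I will leave the additional `'A_match'` column in, but you can drop it if you don't need it.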
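
A minimal sketch of the same idea for inputs of different lengths, assuming A and B live in two separate Series. The two extra values appended to B and the names `left` and `right` are made up for illustration.

```python
import pandas as pd

A = pd.Series([17, 18, 19, 20, 22, 23], name="A")
B = pd.Series([15, 18, 20, 21, 21, 27, 30, 31], name="B")  # longer than A on purpose

# Expose A's original index as a regular column so it survives the merge.
right = A.rename_axis("closest_idx").reset_index()  # columns: closest_idx, A
left = B.to_frame()

# Both sides must be sorted on their merge keys.
res = pd.merge_asof(left.sort_values("B"), right.sort_values("A"),
                    left_on="B", right_on="A",
                    direction="nearest")
print(res)
```

Each row of `res` carries the B value, the index of the nearest A value in `closest_idx`, and the matched A value itself.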

Setting a tolerance of |distance| <= 1 (`tolerance=1`) leaves rows with no close enough match as NaN.

The `limit` parameter is actually `tolerance`.

I believe this is perfect, thank you!

Desired result from the question (closest match, absolute difference, and output value):

      A        B       closest_match_idx   match_diff    output
0     17       15      0                   2             17
1     18       18      1                   0             18
2     19       20      3                   1             20
3     20       21      3                   1             20
4     22       21      3                   1             20
5     23       27      5                   4             23

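import pandas as pd

df = pd.DataFrame({'A': [17, 18, 19, 20, 22, 23],
                   'B': [15, 18, 20, 21, 21, 27]})

# Right-hand frame: column A plus its original index, exposed as 'closest_idx'
res = pd.merge_asof(df.sort_values('B'),
                    df.rename_axis(index='closest_idx').reset_index().drop(columns='B').sort_values('A'),
                    left_on='B', right_on='A',
                    direction='nearest',
                    suffixes=['', '_match'])

print(res)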

    A   B  closest_idx  A_match
0  17  15            0       17
1  18  18            1       18
2  19  20            3       20
3  20  21            3       20
4  22  21            3       20
5  23  27            5       23

# Same merge, but only accept matches with |A - B| <= 1
res = pd.merge_asof(df.sort_values('B'),
                    df.rename_axis(index='closest_idx').reset_index().drop(columns='B').sort_values('A'),
                    left_on='B', right_on='A',
                    direction='nearest',
                    suffixes=['', '_match'],
                    tolerance=1)

#    A   B  closest_idx  A_match
#0  17  15          NaN      NaN
#1  18  18          1.0     18.0
#2  19  20          3.0     20.0
#3  20  21          3.0     20.0
#4  22  21          3.0     20.0
#5  23  27          NaN      NaN
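
To reach the final column described in the question (the matched A value when it lies within the tolerance, otherwise the value from B), one possible follow-up is to fill the unmatched rows from B. This is only a sketch, assuming a tolerance of 1 is what is wanted; the column name `output` is borrowed from the question's table, and `right` is a made-up variable name.

```python
import pandas as pd

df = pd.DataFrame({"A": [17, 18, 19, 20, 22, 23],
                   "B": [15, 18, 20, 21, 21, 27]})

# Right-hand frame: A values plus their original index as 'closest_idx'.
right = df[["A"]].rename_axis(index="closest_idx").reset_index()

res = pd.merge_asof(df.sort_values("B"), right.sort_values("A"),
                    left_on="B", right_on="A",
                    direction="nearest",
                    suffixes=("", "_match"),
                    tolerance=1)

# Use the matched A value where one was found, otherwise fall back to B.
res["output"] = res["A_match"].fillna(res["B"]).astype(int)
print(res)
```

With this tolerance the rows for B = 15 and B = 27 have no match, so their `output` is simply the B value.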