Python: compare two lists to find equal or approximate matches without N^2 iterations

python performance list

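I have two lists:

list1 = [101, 110, 136]
list2 = [101.04, 264.5, 379.9, 466.4, 629.6, 724.4, 799.8, 914.3]

I want to iterate over list1 and compare each of its elements with the elements of list2. If a number in the second list is an exact or approximate match for an element of list1, output that match.

Note: I want to strictly avoid N^2 iteration, because I want this to be as efficient as possible.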

Have you thought about what "approximate" means?

>>> list1 = [101, 110, 136]
>>> list2 = [101.04, 264.5, 379.9, 466.4, 629.6, 724.4, 799.8, 914.3]
>>> set(int(x) for x in list1) & set(int(x) for x in list2)
set([101])

Simple, but if list2 were [100.96, 264.5, 379.9, ...] you would not get a match.

Once you define "approximate", you can start thinking about a solution properly.
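To make the point concrete, here is a minimal sketch contrasting int() truncation with an explicit tolerance. The helper names int_matches and is_approx and the tolerance of 0.5 are assumptions for illustration; the question never says what "approximate" should mean.

```python
def int_matches(a, b):
    # Truncation-based matching, equivalent to the set expression above.
    return {int(x) for x in a} & {int(x) for x in b}


def is_approx(x, y, tol=0.5):
    # Hypothetical definition: absolute difference within tol.
    return abs(x - y) <= tol


list1 = [101, 110, 136]
list2 = [100.96, 264.5, 379.9]

print(int_matches(list1, list2))  # empty: int(100.96) is 100, not 101
print(is_approx(101, 100.96))     # True: the values differ by about 0.04
```

With a definition like is_approx in hand, the remaining work is finding candidate pairs without testing every combination.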

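If the lists are pre-sorted, it would help to say so. This problem turned out to be a bit tricky. The code below should work for any data set and margin value, but I have not tested it extensively.

The key to avoiding O(N^2) performance is to sort the data. That allows two index values to be used, so you can walk the second list at a different rate from the first and still make valid comparisons.

The code below prints every match in list2 for each item in list1, so some values may be printed more than once. Because of that rescanning, performance is somewhat worse than O(n), but it improves with a smaller margin. (A large margin is chosen here to exaggerate the effect of setting it high or low.)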

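list1 = [101, 110, 136, 380]
list2 = [101.04, 110.009, 264.5, 379.9, 466.4, 69, 629.6, 724.4, 799.8, 914.3]
# make sure the lists are sorted
list1.sort()
list2.sort()
# set the margin differently as needed
margin = 100
idx = 0
for i in list1:
    # The source is cut off partway through this loop condition; the comparison
    # against margin and everything after it are reconstructed from the
    # description above, not copied from the author's code.
    while idx < len(list2) and i > list2[idx] and not abs(i - list2[idx]) <= margin:
        idx += 1  # skip list2 values too far below i to match it
    j = idx
    while j < len(list2) and abs(i - list2[j]) <= margin:
        print("%s ~ %s" % (i, list2[j]))
        j += 1

This should give O(n log n) time (because of the two sorts), with a user-specified tolerance epsilon. It is loosely based on the merge step of mergesort: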
#!/usr/local/cpython-3.3/bin/python

import pprint

def approximate_matches(list1, list2, epsilon = 0.5):
    len_list1 = len(list1)
    len_list2 = len(list2)

    list1_index = 0
    list2_index = 0

    while list1_index < len_list1 and list2_index < len_list2:
        list1_element = list1[list1_index]
        list2_element = list2[list2_index]

        difference = abs(list1_element - list2_element)

        if difference < epsilon:
            yield (list1_element, list2_element)
            list1_index += 1
            list2_index += 1
        elif list1_element < list2_element:
            list1_index += 1
        elif list2_element < list1_element:
            list2_index += 1
        else:
            raise AssertionError('Unexpected else taken')


def main():
    list1 = [101.0, 110.0, 136.0, 379.6, 800.0, 900.0]
    list2 = [101.04, 264.5, 379.9, 466.4, 629.6, 724.4, 799.8, 914.3]

    list1.sort()
    list2.sort()

    pprint.pprint(list(approximate_matches(list1, list2)))

main()

HTH
PS: Note that if one number in list1 matches two numbers in list2 (or vice versa), this code will report only one match.
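If every match is wanted rather than just one, a binary search per element is one option. The following is a minimal sketch, not part of the answer above: the name all_matches, the inclusive tolerance, and the extra sample value 100.9 are assumptions for illustration. It uses the standard bisect module to find the window of list2 values within epsilon of each list1 value.

```python
from bisect import bisect_left, bisect_right


def all_matches(list1, list2, epsilon=0.5):
    """Yield every (a, b) pair with b in [a - epsilon, a + epsilon]."""
    sorted2 = sorted(list2)  # O(m log m), done once
    for a in list1:
        # Binary-search both ends of the tolerance window: O(log m) each.
        lo = bisect_left(sorted2, a - epsilon)
        hi = bisect_right(sorted2, a + epsilon)
        for b in sorted2[lo:hi]:
            yield (a, b)


# 100.9 is a hypothetical extra value so that 101 has two matches.
pairs = list(all_matches([101, 110, 136], [100.9, 101.04, 264.5, 379.9]))
print(pairs)  # [(101, 100.9), (101, 101.04)]
```

The cost is roughly O((n + m) log m) plus the number of pairs reported, so it stays well below N^2 unless the tolerance is wide enough that most pairs match.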
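It looks like you want some code, but can you show what you have tried?
Can you explain what counts as an approximate match?
"Do this task for me" questions tend not to be well received. Can you show your attempt? Simply rephrasing your post as a question rather than a command may also improve your chances.
Are the lists sorted when you get them?
Maybe using the set.intersection method would be better than &?
@downvoter: this poorly specified question seems to have been abandoned, yet you did not downvote it. Why downvote my answer, which is correct for some value of "approximate"? Can you do better?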