Python 3.x: What is the most efficient way in Python 3 to find the indices of common tuples in two very large lists of tuples?

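I have two large lists of tuples, "neg" (length ~40K) and "All" (length ~2M). Proxies for both can be downloaded from the following link.

I want to search for the tuples of "neg" in "All" and return the matching indices in "All". I tried the following solution on a fairly powerful PC (specs below), and it took 788.1487 seconds. Moreover, it did not preserve the correct order.

Actually, the following code does the job in 202.6451 seconds. Can it be made faster?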

```python
def findTupleIndices(smallList, bigList):
    comList = sorted(set(smallList) & set(bigList), key=smallList.index)
    idx = [bigList.index(x) for x in comList]
    return idx
```
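To pin down what this function actually returns (useful when checking a faster replacement against it), here is a run on a tiny input. `neg` and `All` below are made-up stand-ins, not the real data. Note that both `sorted(..., key=smallList.index)` and `bigList.index(x)` perform a linear scan per common tuple, which is where the time goes on lists of this size.

```python
def findTupleIndices(smallList, bigList):
    # sorted() calls smallList.index once per common tuple, and the list
    # comprehension calls bigList.index once per common tuple; both are
    # linear scans, so the cost grows with len(common) * list length.
    comList = sorted(set(smallList) & set(bigList), key=smallList.index)
    return [bigList.index(x) for x in comList]

# Hypothetical toy data: (3, 4) is repeated in neg, (1, 2) is repeated in All.
neg = [(3, 4), (9, 9), (1, 2), (3, 4)]
All = [(1, 2), (5, 6), (3, 4), (7, 8), (1, 2)]

# Each common tuple is reported once, in order of first appearance in neg,
# with its FIRST index in All.
print(findTupleIndices(neg, All))  # → [2, 0]
```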
PC specs:

Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz, 32 GB RAM DDR4-2133 MHz

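Use a temporary hash table (dict) in which the tuples of the "big list" are the keys and their indices are the values. Each lookup is then O(1) on average instead of a linear `list.index` scan.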
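One caveat when swapping in a dict: a plain `dict((t, i) for i, t in enumerate(bigList))` keeps the *last* index of a repeated tuple, while `list.index` returns the *first*, and the question's set intersection reports each common tuple only once. If those edge cases matter, the following is a sketch that should reproduce the question's output exactly. The name `find_tuple_indices_exact` is hypothetical, and all tuples are assumed to be hashable.

```python
def find_tuple_indices_exact(small_list, big_list):
    """Same result as the question's findTupleIndices, using dict lookups.

    Assumes every tuple is hashable (true for tuples of numbers/strings).
    """
    # Map each tuple in big_list to its FIRST index, matching list.index().
    first_pos = {}
    for i, t in enumerate(big_list):
        first_pos.setdefault(t, i)

    result = []
    seen = set()  # report each common tuple once, like the set intersection
    for t in small_list:  # small_list order, like key=small_list.index
        if t in first_pos and t not in seen:
            seen.add(t)
            result.append(first_pos[t])
    return result


neg = [(3, 4), (9, 9), (1, 2), (3, 4)]
All = [(1, 2), (5, 6), (3, 4), (7, 8), (1, 2)]
print(find_tuple_indices_exact(neg, All))  # → [2, 0]
```

Building `first_pos` is a single pass over the big list, and the second loop is a single pass over the small list, so the whole function is roughly linear in the total input size.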

Initial approach stats:

```python
from timeit import timeit

def findTupleIndices(sub_lst, search_lst):
    comList = sorted(set(sub_lst) & set(search_lst), key=sub_lst.index)
    idx = [search_lst.index(x) for x in comList]
    return idx

# sub_lst, search_lst are lists of tuples extracted from `ftp://ftp.lrz.de/transfer/List_Intersect/`

print(timeit('findTupleIndices(sub_lst, search_lst)', 'from __main__ import findTupleIndices, sub_lst, search_lst', number=1000))
```
New approach stats:

```python
from timeit import timeit

def find_tuple_indices(sub_lst, search_lst):
    # Build the lookup table once: tuple -> index in search_lst.
    # (If a tuple occurs more than once, the last index wins.)
    pos_dict = {t: i for i, t in enumerate(search_lst)}
    # One O(1) average-case dict lookup per tuple in sub_lst.
    return [pos_dict[t] for t in sub_lst if t in pos_dict]

# sub_lst, search_lst are lists of tuples extracted from `ftp://ftp.lrz.de/transfer/List_Intersect/`

print(timeit('find_tuple_indices(sub_lst, search_lst)', 'from __main__ import find_tuple_indices, sub_lst, search_lst', number=1000))
```
Output (total seconds for the 1000 calls; initial approach first, new approach second):

191.43023270001868
1.4070011030125897
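The FTP data above may not be reachable, so the comparison can be reproduced on synthetic data. The following is a minimal sketch that assumes all tuples are distinct (in that case the two functions should return identical lists). The sizes, the seed and the helper `make_data` are hypothetical choices, and no particular timings are implied.

```python
import random
import timeit

def findTupleIndices(sub_lst, search_lst):
    # Question's approach: linear list.index scans.
    comList = sorted(set(sub_lst) & set(search_lst), key=sub_lst.index)
    return [search_lst.index(x) for x in comList]

def find_tuple_indices(sub_lst, search_lst):
    # Answer's approach: one dict built in a single pass over search_lst.
    pos_dict = {t: i for i, t in enumerate(search_lst)}
    return [pos_dict[t] for t in sub_lst if t in pos_dict]

def make_data(n_big=20_000, n_small=400, n_hits=300, seed=0):
    """Hypothetical stand-in for "All" and "neg"; every tuple is distinct."""
    rng = random.Random(seed)
    search_lst = [(i, -i) for i in range(n_big)]
    rng.shuffle(search_lst)
    hits = rng.sample(search_lst, n_hits)  # tuples present in search_lst
    # First element >= n_big, so these never occur in search_lst.
    misses = [(n_big + j, 0) for j in range(n_small - n_hits)]
    sub_lst = hits + misses
    rng.shuffle(sub_lst)
    return sub_lst, search_lst

if __name__ == "__main__":
    sub_lst, search_lst = make_data()
    # With no duplicate tuples, both functions should agree exactly.
    assert findTupleIndices(sub_lst, search_lst) == find_tuple_indices(sub_lst, search_lst)
    for fn in (findTupleIndices, find_tuple_indices):
        t = timeit.timeit(lambda: fn(sub_lst, search_lst), number=3)
        print(f"{fn.__name__}: {t / 3:.4f} s per call")
```

Scaling `n_big` and `n_hits` up toward the sizes in the question should make the gap between the two functions grow, since only the first one does a linear scan per common tuple.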

Wow! I'm switching to Python.