Python: Filter two lists by comparing all items with each other using numpy or tabular

python arrays numpy

I have two lists of tuples, where the tuples within each list are unique. The lists have the following format:

[('col1', 'col2', 'col3', 'col4'), ...]

I use a nested loop to find the members of both lists that have the same values for the given columns, col2 and col3:

temp1 = set([])
temp2 = set([])
for item1 in list1:
    for item2 in list2:
        if item1['col2'] == item2['col2'] and \
            item1['col3'] == item2['col3']:
            temp1.add(item1)
            temp2.add(item2)

This works. However, it takes many minutes to complete when the lists contain tens of thousands of items.

Using tabular, I can filter list1 against the col2 and col3 values of a single item from list2, like this:

list1 = tb.tabular(records=[...], names=['col1','col2','col3','col4'])
...

for (col1, col2, col3, col4) in list2:
    list1[(list1['col2'] == col2) & (list1['col3'] == col3)]

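This is obviously "doing it wrong", and it is much slower than the first approach.

How can I use numpy or tabular to efficiently check the items of one list of tuples against all items of another?

Thanks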

Try this:

temp1 = set([])
temp2 = set([])

dict1 = dict()
dict2 = dict()

for key, value in zip([tuple(l[1:3]) for l in list1], list1):
    dict1.setdefault(key, list()).append(value)

for key, value in zip([tuple(l[1:3]) for l in list2], list2):
    dict2.setdefault(key, list()).append(value)

for key in dict1:
    if key in dict2:
        temp1.update(dict1[key])
        temp2.update(dict2[key])

It's a dirty one, but it should work.

"How can I use numpy or tabular to efficiently check the items of one list of tuples against all items of another?"

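Well, I have no experience with tabular and very little with numpy, so I can't give you a precise "canned" solution. But I think I can point you in the right direction. If list1 has length X and list2 has length Y, you are performing X*Y checks... whereas you only need X+Y checks.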
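One way to read the X+Y remark, as a minimal sketch: collect the (col2, col3) keys of each list in a single pass, intersect the two key sets, and then keep the rows whose key is shared. The helper names `key` and `filter_shared` are made up for illustration, and the rows are assumed to be plain 4-tuples rather than tabular records.

```python
def key(row):
    # Hypothetical helper: the (col2, col3) pair of a 4-tuple row.
    return row[1], row[2]


def filter_shared(list1, list2):
    # One linear pass per list to collect the keys.
    keys1 = {key(row) for row in list1}
    keys2 = {key(row) for row in list2}
    shared = keys1 & keys2
    # One more linear pass per list to keep rows whose key occurs in both.
    temp1 = {row for row in list1 if key(row) in shared}
    temp2 = {row for row in list2 if key(row) in shared}
    return temp1, temp2


list1 = [(0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12)]
list2 = [(0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12)]
temp1, temp2 = filter_shared(list1, list2)
```

Each set lookup takes roughly constant time on average, so the total work grows with X+Y rather than X*Y.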

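You should do something like the following (I'll assume these are lists of regular Python tuples rather than tabular records; I'm sure you can make the necessary adjustments):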

I would create a subclass of tuple that has special __eq__ and __hash__ methods:
>>> class SpecialTuple(tuple):
...     def __eq__(self, t):
...             return self[1] == t[1] and self[2] == t[2]
...     def __hash__(self):
...             return hash((self[1], self[2]))
... 

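It compares the elements at indices 1 and 2 (col2 and col3 in the question's naming) and states that two tuples are equal provided those columns are the same.
Then, filter simply by using set intersection on these special tuples: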
>>> list1 = [ (0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12) ]
>>> list2 = [ (0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12) ]
>>> set(map(SpecialTuple, list1)) & set(map(SpecialTuple, list2))
set([(42, 2, 3, 12)])

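I don't know how fast it is. Let me know. :)

Works great, thanks. For testing I used two lists of 10000 tuples, each tuple holding 4 random integers. The nested loop took 59.042543888 seconds, while yours took 0.13046002388 seconds.

You're right, I completely missed the point of the comparison :) Thanks.

The hash makes the set function overwrite(?) existing items in the set, which leads to a smaller result :)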
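The last comment can be reproduced with a small sketch. Because SpecialTuple treats rows with the same values at indices 1 and 2 as equal, a set keeps only one row per key, so other rows that share the key are dropped (as far as I can tell, the row added first is the one kept rather than being overwritten). The sample rows below are made up for illustration. Grouping rows in a dict of lists, as in the dict-based answer, keeps all of them.

```python
from collections import defaultdict


class SpecialTuple(tuple):
    # Equality and hash look only at indices 1 and 2 (col2 and col3).
    def __eq__(self, other):
        return self[1] == other[1] and self[2] == other[2]

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash((self[1], self[2]))


# Illustrative data: the first two rows share the key (1, 2).
rows = [(0, 1, 2, 0), (5, 1, 2, 7), (1, 2, 3, 12)]

# The set collapses rows with equal keys into a single entry.
collapsed = set(map(SpecialTuple, rows))

# A dict of lists keeps every row under its key.
groups = defaultdict(list)
for row in rows:
    groups[row[1], row[2]].append(row)
```

Here `collapsed` holds two entries even though there are three rows, while `groups[(1, 2)]` still contains both rows with that key.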