Efficient way to compare sublists across lists of lists in Python?

python algorithm

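Sorry for the poorly worded title; it explains my failed Google searches, if it doesn't excuse them. Hopefully the explanation is clearer from here on.

I have two lists of lists of lines, and I want to check whether two lines from the lists match according to some criteria. The only way I know is a double for loop, which is slow. Sorting the lists first is not an option, because the lists don't line up exactly.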
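For reference, the double-loop approach described above might look like this minimal sketch. It assumes the matching criterion the asker gives later in the comments (the date at index 2 of a first-list row equals index 6 of a second-list row, and indices 3:5 equal indices 2:4); the names same and naive_matches are hypothetical. Every pair of rows is compared, so the cost grows as len(list_a) * len(list_b).

```python
def same(a_row, b_row):
    # Criterion from the comments: date a_row[2] vs b_row[6],
    # and the two capitalized names a_row[3:5] vs b_row[2:4].
    return a_row[2] == b_row[6] and a_row[3:5] == b_row[2:4]


def naive_matches(list_a, list_b):
    # O(n * m): every row of list_a is tested against every row of list_b.
    pairs = []
    for a_row in list_a:
        for b_row in list_b:
            if same(a_row, b_row):
                pairs.append((a_row, b_row))
    return pairs


a = [['xyz', 'xyz', '08/03/2003', 'Fuietg', 'Erzhgg', '2'],
     ['xyz', 'xyz', '23/04/2011', 'Gcatia', 'Cecfoz', '0']]
b = [['xyz', 'xyz', 'Fuietg', 'Erzhgg', 'xyz', 'xyz', '08/03/2003']]
print(len(naive_matches(a, b)))  # 1
```

With about 10,000 rows per list this is on the order of 10^8 comparisons, which is why the answers below replace the inner loop with a lookup.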

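Edit:

OK, here is what the two lists look like. They are about 10,000 sublists long; I've tried my best to mimic their characteristics and their relationship to each other: they have different lengths, and most sublists have their matching date and 2 capitalized elements in the other list, though not all of those below do.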

['xyz', 'xyz', '12/11/2006', 'Zatgxg', 'Fuietg', '3'],  
['xyz', 'xyz', '23/04/2011', 'Gcatia', 'Cecfoz', '0'],  
['xyz', 'xyz', '08/03/2003', 'Fuietg', 'Erzhgg', '2'],  
['xyz', 'xyz', '07/05/2006', 'Aapoaa', 'Fuietg', '1'],  
['xyz', 'xyz', '15/05/2004', 'Bfaext', 'Eghege', '1'],   
['xyz', 'xyz', '05/02/2006', 'Gtoadr', 'Udpfdf', '1'],  
['xyz', 'xyz', '11/09/2004', 'Racdgo', 'Zchxgx', '0'],  
['xyz', 'xyz', '03/04/2011', 'Zdcfii', 'Rhiiog', '1'],  
['xyz', 'xyz', '07/04/2007', 'Dabzgi', 'Gpeiot', '4'],  
['xyz', 'xyz', '16/03/2008', 'Dohbur', 'Oucegh', '2']
##################

['xyz', 'xyz', 'Dohbur', 'Oucegh', 'xyz', 'xyz', '16/03/2008'],  
['xyz', 'xyz', 'Dabzgi', 'Gpeiot', 'xyz', 'xyz', '07/04/2007'],  
['xyz', 'xyz', 'Fuietg', 'Erzhgg', 'xyz', 'xyz', '08/03/2003'],  
['xyz', 'xyz', 'Udioac', 'Gceabb', 'xyz', 'xyz', '21/02/2004'],  
['xyz', 'xyz', 'Bfaext', 'Eghege', 'xyz', 'xyz', '15/05/2004'],  
['xyz', 'xyz', 'Racdgo', 'Zchxgx', 'xyz', 'xyz', '11/09/2004'],  
['xyz', 'xyz', 'Gtoadr', 'Udpfdf', 'xyz', 'xyz', '05/02/2006'],  
['xyz', 'xyz', 'Aapoaa', 'Fuietg', 'xyz', 'xyz', '07/05/2006']

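From your question, I infer that you have two lists of the same size that are sorted in some way that cannot be changed.

You want to know whether the items in list A exist in list B.

You can sort while keeping a reference to the original indices, which lets you search much faster using bisection.

See:

numpy.argsort does the same thing.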
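The sort-then-bisect idea can be sketched with the standard bisect module. This is a minimal illustration under assumptions, not the answerer's code: the key fields follow the criterion from the comments (date plus the two names), and key_a, key_b, sorted_index and find_in_b are hypothetical names. Sorting (key, original_index) pairs keeps a reference to each row's original position, so the input lists themselves are never reordered.

```python
import bisect


def key_a(row):
    # First list (assumed layout): date at index 2, names at 3 and 4.
    return (row[2], row[3], row[4])


def key_b(row):
    # Second list (assumed layout): date at index 6, names at 2 and 3.
    return (row[6], row[2], row[3])


def sorted_index(rows, key):
    # Sort (key, original_index) pairs; the input list itself is not reordered.
    return sorted((key(row), i) for i, row in enumerate(rows))


def find_in_b(a_row, b_index):
    # Return the original indices of all second-list rows whose key equals a_row's.
    k = key_a(a_row)
    # (k,) sorts just before every (k, i) pair, so bisect_left lands on the first match.
    pos = bisect.bisect_left(b_index, (k,))
    hits = []
    while pos < len(b_index) and b_index[pos][0] == k:
        hits.append(b_index[pos][1])
        pos += 1
    return hits


a = [['xyz', 'xyz', '07/05/2006', 'Aapoaa', 'Fuietg', '1'],
     ['xyz', 'xyz', '23/04/2011', 'Gcatia', 'Cecfoz', '0']]
b = [['xyz', 'xyz', 'Dohbur', 'Oucegh', 'xyz', 'xyz', '16/03/2008'],
     ['xyz', 'xyz', 'Aapoaa', 'Fuietg', 'xyz', 'xyz', '07/05/2006']]
b_index = sorted_index(b, key_b)
print([find_in_b(row, b_index) for row in a])  # [[1], []]
```

Sorting costs O(m log m) and each lookup O(log m), so the total is O((n + m) log m) rather than O(n * m).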

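Edit: from your comments, a simple solution presents itself.

I'd point out that this is very efficient: inserting into a set and testing membership in it are O(1) operations on average, which makes the whole operation O(n+m).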
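A minimal sketch of a set-based solution along these lines (not necessarily the code the answerer had in mind) could look as follows. It assumes the key fields from the criterion in the comments; key_a, key_b and matching_rows are hypothetical names. Keys are tuples because sets only accept hashable elements.

```python
def key_a(row):
    # Compared fields in the first list (assumed): date, name 1, name 2.
    return (row[2], row[3], row[4])


def key_b(row):
    # The same fields as laid out in the second list (assumed).
    return (row[6], row[2], row[3])


def matching_rows(list_a, list_b):
    # One pass over list_b: m set insertions, O(1) each on average.
    seen = {key_b(row) for row in list_b}
    # One pass over list_a: n membership tests, O(1) each on average.
    return [row for row in list_a if key_a(row) in seen]


a = [['xyz', 'xyz', '08/03/2003', 'Fuietg', 'Erzhgg', '2'],
     ['xyz', 'xyz', '23/04/2011', 'Gcatia', 'Cecfoz', '0']]
b = [['xyz', 'xyz', 'Fuietg', 'Erzhgg', 'xyz', 'xyz', '08/03/2003'],
     ['xyz', 'xyz', 'Udioac', 'Gceabb', 'xyz', 'xyz', '21/02/2004']]
print(matching_rows(a, b))  # [['xyz', 'xyz', '08/03/2003', 'Fuietg', 'Erzhgg', '2']]
```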

In this example the lists happen to be sorted, but that's just a coincidence.

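More edits, based on the example: if you only need the matching elements, here is a simple revision of the previous version:

# key = (date, name 1, name 2), following the criterion given in the comments;
# tuples are used because list slices are unhashable and cannot go into a set
list(set((x[2], x[3], x[4]) for x in list1).intersection((x[6], x[2], x[3]) for x in list2))

The comprehensions reduce each row of list1 and list2 to the compared fields, in the same order, so equal tuples indicate matching rows. If you need the full rows from both lists, it gets a bit more complicated.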
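One way to get the full rows from both lists is sketched below, assuming the key fields from the criterion in the comments; pair_rows is a hypothetical name. Each key of the second list maps to every row that carries it, and each row of the first list is then looked up once. The result is a "list C" of matched (a_row, b_row) pairs, as the asker describes in the comments.

```python
from collections import defaultdict


def pair_rows(list_a, list_b):
    # Index list_b by key; a key can map to several rows (duplicates are kept).
    by_key = defaultdict(list)
    for row in list_b:
        by_key[(row[6], row[2], row[3])].append(row)
    # Emit one (a_row, b_row) pair for every b_row sharing a_row's key.
    # .get() avoids inserting empty lists for keys that have no match.
    list_c = []
    for row in list_a:
        for match in by_key.get((row[2], row[3], row[4]), []):
            list_c.append((row, match))
    return list_c


a = [['xyz', 'xyz', '15/05/2004', 'Bfaext', 'Eghege', '1']]
b = [['xyz', 'xyz', 'Bfaext', 'Eghege', 'xyz', 'xyz', '15/05/2004'],
     ['xyz', 'xyz', 'Gtoadr', 'Udpfdf', 'xyz', 'xyz', '05/02/2006']]
list_c = pair_rows(a, b)
print(len(list_c))  # 1
```

This stays O(n + m) on average, plus the size of the output when one key matches many rows.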

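Create a hash function whose values are equal whenever two rows satisfy your matching criteria.

Then insert the rows of one list into a hash table keyed by that hash, and look up the rows of the other list using the same hash.

For example, if you need to match on columns 1 and 2, create the hash from columns 1 and 2 together.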
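In Python the "hash function" can simply be a function that collects the relevant columns into a tuple, since tuples of strings are hashable; operator.itemgetter does exactly that. The sketch below assumes 0-based column indices, and build_table and lookup are hypothetical names. Rows with equal values in the chosen columns produce equal keys and therefore land in the same dict bucket.

```python
from collections import defaultdict
from operator import itemgetter


def build_table(rows, cols):
    # itemgetter(2, 3)(row) returns the tuple (row[2], row[3]), which is hashable.
    # (With a single column it returns the bare value, which is also hashable.)
    key = itemgetter(*cols)
    table = defaultdict(list)
    for i, row in enumerate(rows):
        table[key(row)].append(i)
    return table


def lookup(table, row, cols):
    # Probe with a key built the same way from the other list's columns;
    # returns the indices of matching rows (empty list if none).
    return table.get(itemgetter(*cols)(row), [])


b = [['xyz', 'xyz', 'Dohbur', 'Oucegh', 'xyz', 'xyz', '16/03/2008'],
     ['xyz', 'xyz', 'Dabzgi', 'Gpeiot', 'xyz', 'xyz', '07/04/2007']]
table = build_table(b, (2, 3))       # the two names in the second list
row_a = ['xyz', 'xyz', '07/04/2007', 'Dabzgi', 'Gpeiot', '4']
print(lookup(table, row_a, (3, 4)))  # [1]
```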

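You need to normalize the rows in each matrix; then you can match them using a set intersection of the normalized values. The anorm and bnorm functions are defined to produce, from a row of either list, a tuple that is equal for matching rows. If several rows match, the code handles that as well.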

>>> alist = [['xyz', 'xyz', '12/11/2006', 'Zatgxg', 'Fuietg', '3'],  
['xyz', 'xyz', '23/04/2011', 'Gcatia', 'Cecfoz', '0'],  
['xyz', 'xyz', '08/03/2003', 'Fuietg', 'Erzhgg', '2'],  
['xyz', 'xyz', '07/05/2006', 'Aapoaa', 'Fuietg', '1'],  
['xyz', 'xyz', '15/05/2004', 'Bfaext', 'Eghege', '1'],   
['xyz', 'xyz', '05/02/2006', 'Gtoadr', 'Udpfdf', '1'],  
['xyz', 'xyz', '11/09/2004', 'Racdgo', 'Zchxgx', '0'],  
['xyz', 'xyz', '03/04/2011', 'Zdcfii', 'Rhiiog', '1'],  
['xyz', 'xyz', '07/04/2007', 'Dabzgi', 'Gpeiot', '4'],  
['xyz', 'xyz', '16/03/2008', 'Dohbur', 'Oucegh', '2']]
>>> blist = [['xyz', 'xyz', 'Dohbur', 'Oucegh', 'xyz', 'xyz', '16/03/2008'],  
['xyz', 'xyz', 'Dabzgi', 'Gpeiot', 'xyz', 'xyz', '07/04/2007'],  
['xyz', 'xyz', 'Fuietg', 'Erzhgg', 'xyz', 'xyz', '08/03/2003'],  
['xyz', 'xyz', 'Udioac', 'Gceabb', 'xyz', 'xyz', '21/02/2004'],  
['xyz', 'xyz', 'Bfaext', 'Eghege', 'xyz', 'xyz', '15/05/2004'],  
['xyz', 'xyz', 'Racdgo', 'Zchxgx', 'xyz', 'xyz', '11/09/2004'],  
['xyz', 'xyz', 'Gtoadr', 'Udpfdf', 'xyz', 'xyz', '05/02/2006'],  
['xyz', 'xyz', 'Aapoaa', 'Fuietg', 'xyz', 'xyz', '07/05/2006']]
>>> from collections import defaultdict
>>> def anorm(line): return (line[0], line[2], line[4].upper())

>>> def bnorm(line): return (line[1], line[6], line[3].upper())

>>> def fitting(a, b, an, bn):
    anormalized, bnormalized = defaultdict(list), defaultdict(list)
    for i, line in enumerate(a):
        anormalized[an(line)].append(i)
    for i, line in enumerate(b):
        bnormalized[bn(line)].append(i)
    common = set(anormalized).intersection(set(bnormalized))
    for norm in common:
        print('lines at indices %r of a and %r of b are common'
              % (anormalized[norm], bnormalized[norm]))


>>> fitting(alist, blist, anorm, bnorm)
lines at indices [3] of a and [7] of b are common
lines at indices [5] of a and [6] of b are common
lines at indices [2] of a and [2] of b are common
lines at indices [9] of a and [0] of b are common
lines at indices [8] of a and [1] of b are common
lines at indices [6] of a and [5] of b are common
lines at indices [4] of a and [4] of b are common
>>>

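Just to verify: can the elements of these lists be compared for equality, but not sorted?

@sharth They can be sorted; I just don't think that's a solution.

@jamylak The data? It's actually a simple data structure: both lists refer to the same fills or events as sublists, each containing only strings. It's just that one list is missing some things, and the other is missing its own part...

What are the "some criteria"?

@Janne Karila def same(li1_sublist, li2_sublist): return li1_sublist[2] == li2_sublist[6] and li1_sublist[3:5] == li2_sublist[2:4]

The lists have different lengths, and order doesn't matter either. I want to know whether an item in list A is similar to any item in list B, meaning whether they share specific elements, and in that case do something with both items: add them to a list C or whatever.

I don't understand "sorting the lists first is not an option".

Bad phrasing. I didn't see sorting as part of a solution, but it isn't ruled out as an option in itself.

But that solution is for flat lists, since lists are unhashable? Still, I'll try converting all the sublists to tuples and see how it goes.

I don't understand your comment. Maybe you could post an example of two such lists?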
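On the hashability question in the comments: a list cannot be a set element or a dict key, but a tuple of strings can, so converting each sublist (or just the compared slice) with tuple() is enough to use the set-based answers. A small illustration:

```python
row = ['xyz', 'xyz', '08/03/2003', 'Fuietg', 'Erzhgg', '2']

# A list slice is itself a list, so it cannot go into a set.
try:
    bad = {row[3:5]}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# The same slice converted with tuple() is a valid set element.
keys = {tuple(row[3:5])}
print(list_is_hashable, ('Fuietg', 'Erzhgg') in keys)  # False True
```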
