Python: comparing two lists that contain strings and sublists


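I'm stuck in a situation where I have to compare the entries of a list of lists, where each sublist contains two strings and a sublist. I want to compare each sublist with the ones that follow it, and record their first strings (the IDs) together with the members that match in the third item (the sublist). That sounds a bit confusing, so here is an example. I have the following list: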

node = [['1001', '2008-01-06T02:12:13Z', ['']], 
        ['1002', '2008-01-06T02:13:55Z', ['']],  
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']], 
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']], 
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']], 
        ['1006', '2008-01-06T02:13:30Z', ['']], 
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]
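The first item of each sublist is the ID, the second is a timestamp, and the third (a sublist) contains the members. I want to compare the members, and if two sublists share any members, I want to store those members together with the two IDs in a new list, like this: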

output_list = [['1003', '1004', ['Lion', 'Leopard', 'Panda']],
               ['1003', '1005', ['Lion', 'Panda']],
               ['1004', '1005', ['Lion', 'Panda', 'Tiger']],
               ['1004', '1007', ['Tiger']],
               ['1005', '1007', ['Cheetah', 'Goat', 'Tiger']]]

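I can't work out how to do this with a double for loop or any other method. Can anyone help? Sorry that I couldn't come up with any attempt at the code.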

You could compute an MD5 hash of each list and compare the hashes, like a checksum:

import hashlib
import bencode

node_md5hash = hashlib.md5(bencode.bencode(node)).hexdigest()
output_list_md5hash = hashlib.md5(bencode.bencode(output_list)).hexdigest()

This gives you an MD5 hash for node and for output_list; if the hashes are the same, the two lists hold the same values.

You will need to import the hashlib and bencode libraries (you may need to pip install bencode).

If the order within the matched lists matters, this is the simplest approach:

>>> out = []
>>> for ii, elem in enumerate(node[:-1]):
...     for jj in range(ii + 1, len(node)):
...         common = [subelem for subelem in elem[-1] if subelem in node[jj][-1]]
...         if len(common) > 0 and common != ['']:
...             out.append([elem[0], node[jj][0], common])
...
>>> for elem in out:
...     print(elem)
...
['1003', '1004', ['Lion', 'Leopard', 'Panda']]
['1003', '1005', ['Lion', 'Panda']]
['1004', '1005', ['Lion', 'Panda', 'Tiger']]
['1004', '1007', ['Tiger']]
['1005', '1007', ['Cheetah', 'Goat', 'Tiger']]
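If order does not matter and the lists are large, compute a set intersection instead, replacing the first line inside the double loop like this: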

common = list(set(elem[-1]).intersection(set(node[jj][-1])))
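The two variants can also be combined. The following is a minimal Python 3 sketch that keeps the member order of the first sublist while using a set for the membership test. The function name `common_members` is hypothetical, and skipping the `''` placeholder is an assumption based on the sample data.

```python
def common_members(node):
    """Return [id_a, id_b, shared] for every pair of entries that share members."""
    out = []
    for ii, elem in enumerate(node[:-1]):
        for other in node[ii + 1:]:
            # Set lookup is fast; '' is treated as "no members" (assumption).
            lookup = set(other[-1]) - {''}
            # Iterating over elem[-1] preserves the order of the first sublist.
            shared = [member for member in elem[-1] if member in lookup]
            if shared:
                out.append([elem[0], other[0], shared])
    return out


node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

for row in common_members(node):
    print(row)
```

For the sample data this should print the same five rows as the double loop above, in the same order.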

Here is another approach:

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]
# Use this list for result
result = []

def city_exists(city, cities):
    """ Just a helper to verify if city already used """
    for c in cities:
        if c[1] == city:
            return True
    return False

# And finally, iterate and add to the resulting list
for item in node:
    for city in item[2]:
        if not city_exists(city, result):
            result.append([item[0], city])

# Print out the result
print(result)
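Note that the loop above records, for each member, the first ID it appears under; it does not build the pairwise matches shown in the question. If that first-seen mapping is what is wanted, a dict avoids rescanning the result list for every member. This is only a sketch: `first_seen` is a hypothetical name, and it relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
def first_seen(node):
    """Return [id, member] pairs, keeping only the first ID each member appears under."""
    seen = {}
    for item in node:
        for member in item[2]:
            # setdefault stores the ID only if the member has not been seen yet.
            seen.setdefault(member, item[0])
    return [[first_id, member] for member, first_id in seen.items()]


node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

print(first_seen(node))
```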

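It looks like what you are looking for is itertools.combinations, which ships with Python:

import itertools

di = {i[0]: set(i[2]) for i in node}
outputlist = []
for i, j in itertools.combinations(di.keys(), 2):
    common = list(di[i].intersection(di[j]))
    # skip pairs with nothing in common, or with only the '' placeholder in common
    if common and common != ['']:
        outputlist.append([i, j, common])

You can even skip the di step and go straight to the combinations:

outputlist = []
for i, j in itertools.combinations(node, 2):
    common = list(set(i[2]).intersection(set(j[2])))
    # skip pairs with nothing in common, or with only the '' placeholder in common
    if common and common != ['']:
        outputlist.append([i[0], j[0], common])

Also, I would suggest storing the animals as sets, and representing the empty entries as real empty Python lists rather than [''].

Edit: if you insist on lists, you are better off using

common = list(filter(lambda x: x in i[2], j[2]))  # list() is needed in Python 3, where filter returns an iterator

because the type conversions are somewhat inefficient.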
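As a sketch of the suggestion to store the members as sets, the lists can be converted once, up front, with the `['']` placeholder becoming an empty set, so the pair loop needs no special case. The name `members_by_id` is illustrative only, and the shared members are sorted here purely to make the result reproducible.

```python
import itertools

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

# Convert each member list to a set once; the [''] placeholder becomes an empty set.
# Iteration order follows insertion order (guaranteed for dicts from Python 3.7).
members_by_id = {entry[0]: set(entry[2]) - {''} for entry in node}

outputlist = []
for a, b in itertools.combinations(members_by_id, 2):
    common = members_by_id[a] & members_by_id[b]
    if common:  # an empty set is falsy, so no '' check is needed
        outputlist.append([a, b, sorted(common)])

print(outputlist)
```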

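I think the best way to solve your problem is to use combinations from the itertools module and convert the lists to sets to take their intersection, as in the following example: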

from itertools import combinations

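def compare(node, grouping=2):
    for elm1, elm2 in combinations(node, grouping):
        # members shared by both entries
        condition = set(elm1[-1]) & set(elm2[-1])
        # skip pairs with nothing in common, or with only the '' placeholder in common
        if condition and condition != {''}:
            yield [elm1[0], elm2[0], list(condition)]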

node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

final = list(compare(node))
print(final)
Output (the order of members within each match may vary, because sets are unordered):

[['1003', '1004', ['Lion', 'Leopard', 'Panda']],
 ['1003', '1005', ['Lion', 'Panda']],
 ['1004', '1005', ['Lion', 'Tiger', 'Panda']],
 ['1004', '1007', ['Tiger']],
 ['1005', '1007', ['Goat', 'Tiger', 'Cheetah']]]
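Note that `grouping` values other than 2 would fail in the function above, because the loop unpacks exactly two elements. The following is a hedged sketch of one way to generalize it to groups of any size, using `set.intersection` over the whole group. `compare_groups` is a hypothetical name, and sorting the shared members is an added choice to make the result reproducible.

```python
from itertools import combinations


def compare_groups(node, grouping=2):
    """Yield [id_1, ..., id_k, shared] for every group of k entries with common members."""
    for group in combinations(node, grouping):
        # Intersect the member sets of all entries in the group; drop the '' placeholder.
        shared = set.intersection(*(set(elm[-1]) for elm in group)) - {''}
        if shared:
            yield [elm[0] for elm in group] + [sorted(shared)]


node = [['1001', '2008-01-06T02:12:13Z', ['']],
        ['1002', '2008-01-06T02:13:55Z', ['']],
        ['1003', '2008-01-06T02:13:00Z', ['Lion', 'Rhinoceros', 'Leopard', 'Panda']],
        ['1004', '2008-01-06T02:15:20Z', ['Lion', 'Leopard', 'Eagle', 'Panda', 'Tiger']],
        ['1005', '2008-01-06T02:15:48Z', ['Lion', 'Panda', 'Cheetah', 'Goat', 'Tiger']],
        ['1006', '2008-01-06T02:13:30Z', ['']],
        ['1007', '2008-01-06T02:13:38Z', ['Cheetah', 'Tiger', 'Goat']]]

pairs = list(compare_groups(node))        # same pairs as compare(node)
triples = list(compare_groups(node, 3))   # members shared by three entries at once
print(triples)
```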


Break your code down to its core, the part where the comparison happens; that is where your confusion comes from. Why don't 1001 and 1002 match? Or 1001 and 1006? Or 1002 and 1006? All of these have one match: the empty string.

I prefer this solution to the element-by-element one, but you can do it without md5. Python has built-in hashing for immutable types. You can just use hash(tuple(...)). If you want the list comparison to be independent of order, use hash(tuple(sorted(...))). I should add that the built-in hash function is only guaranteed to give consistent values within a running kernel session. If you need to save the hash, for example to disk, to compare it later, then md5 is a better choice.
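To illustrate the difference described in these comments, here is a sketch that assumes a flat list of strings (nested lists, like the entries of `node`, would first need converting to tuples, since lists are not hashable). `stable_digest` is a hypothetical helper; it serializes with `json.dumps` from the standard library rather than with `bencode`.

```python
import hashlib
import json

a = ['Lion', 'Panda', 'Tiger']
b = ['Tiger', 'Lion', 'Panda']

# Built-in hash: usable for comparisons within one interpreter session only,
# because string hashes are randomized per process by default.
same_in_order = hash(tuple(a)) == hash(tuple(b))                   # order-sensitive
same_any_order = hash(tuple(sorted(a))) == hash(tuple(sorted(b)))  # order-independent


def stable_digest(items):
    """MD5 of a canonical (sorted) serialization; reproducible across runs."""
    payload = json.dumps(sorted(items)).encode('utf-8')
    return hashlib.md5(payload).hexdigest()


print(same_any_order)
print(stable_digest(a) == stable_digest(b))
```

Equal hashes suggest, but do not prove, equal contents, so a direct comparison such as `sorted(a) == sorted(b)` is the safer check when both lists are available in memory.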