Efficiently searching a list of lists of strings for a list of strings in Python
I have a list of lists of strings and a list of strings. For example:
L1=[["cat","dog","apple"],["orange","green","red"]]
L2=["cat","red"]
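If L1[i] contains any item from L2, I need to emit pairs (they are used to create edges in a graph).
In my example, I need the pairs ("cat", "dog"), ("cat", "apple"), ("red", "orange"), ("red", "green").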
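What approach should I use to make this as efficient as possible? (My list L1 is large.)
I would suggest converting them all to sets and using set operations (intersection) to work out which terms from L2 appear in each L1 item. You can then use set subtraction to get the items that need to be paired with each match.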
edges = []
L2set = set(L2)
for L1item in L1:
    L1set = set(L1item)
    # Items of this sublist that also appear in L2
    items_in_L1item = L1set & L2set
    for item in items_in_L1item:
        # Pair the matched item with every other item in the sublist
        items_to_pair = L1set - {item}
        edges.extend((item, i) for i in items_to_pair)
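To keep this code efficient even when L1 and L2 are large, make it produce a generator instead of building a huge list of tuples. If you are working in Python 3, just use [...].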
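The code is easy to understand, it is almost plain English! First, loop over each list together with its corresponding element, then ask whether that element is in the list, and if it is, print every pair except the pair (x, x).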
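A minimal sketch of what that description amounts to, assuming "each list and its corresponding element" means pairing L1[i] with L2[i] via zip. The name zip_pairs is hypothetical, and writing it as a generator function is a choice made for this sketch; in Python 3, zip is itself lazy, so no intermediate list of tuples is built.

```python
def zip_pairs(L1, L2):
    """Lazily yield (x, y) for each L2[i] found in L1[i], skipping (x, x)."""
    for sublist, x in zip(L1, L2):  # zip is lazy in Python 3
        if x in sublist:
            for y in sublist:
                if y != x:
                    yield (x, y)


L1 = [["cat", "dog", "apple"], ["orange", "green", "red"]]
L2 = ["cat", "red"]
print(list(zip_pairs(L1, L2)))
```

Note that this only checks L2[i] against L1[i], so it relies on the two lists lining up position by position.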
Output:
[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
Assuming there can be more than one "control" item in an L1 sublist, I would use [...] and [...]. For example:
>>> L1 = [["cat","dog","apple"],
... ["orange","green","red"],
... ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
('dog', 'cat'),
('orange', 'red'),
('green', 'red'),
('hand', 'red'),
('hand', 'cat')]
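One way a generate_edges function with this behaviour could be written is sketched below. It is an assumption, not the answer's original definition: it uses a set for membership tests and itertools.product to pair every non-control item of a sublist with every control item found in it. The parameter names sublists and control are hypothetical.

```python
from itertools import product


def generate_edges(sublists, control):
    """Return (other item, control item) pairs for every sublist."""
    control_set = set(control)  # O(1) membership tests
    edges = []
    for sub in sublists:
        controls = [x for x in sub if x in control_set]
        others = [x for x in sub if x not in control_set]
        # Every non-control item is paired with every control item present
        edges.extend(product(others, controls))
    return edges


L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]
print(generate_edges(L1, L2))
```

This yields the same six edges as the session above, though possibly in a different order. Control items that share a sublist (here "cat" and "red") are not paired with each other.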
If L1 is very large, you may want to consider using bisect. It requires you to flatten and sort L1 first. You could do something like this:
from bisect import bisect_left, bisect_right
from itertools import chain

L1 = [["cat", "dog", "apple"], ["orange", "green", "red", "apple"]]
L2 = ["apple", "cat", "red"]

# M1[k] is the index of the sublist that the k-th flattened string came from
M1 = [[i] * len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))

# Sort the flattened strings and reorder M1 the same way
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]

for item in L2:
    # Binary search for the run of entries equal to item
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)
    print(item, M1[s:e])
# apple [0, 1]
# cat [0]
# red [1]
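The bisect code above stops at reporting which sublists contain each L2 item. A possible way to go from there to the edges the question asks for is sketched below; edges_via_bisect is a hypothetical name, and sorting (string, sublist index) tuples is a simplification chosen for this sketch rather than the answer's index-permutation approach.

```python
from bisect import bisect_left, bisect_right


def edges_via_bisect(L1, L2):
    """Build (matched item, other item) edges using binary search."""
    # Flatten L1 into (string, sublist index) pairs, sorted by string
    flat = sorted((s, i) for i, sub in enumerate(L1) for s in sub)
    keys = [s for s, _ in flat]
    owners = [i for _, i in flat]
    edges = []
    for item in L2:
        lo = bisect_left(keys, item)
        hi = bisect_right(keys, item)
        # owners[lo:hi] lists every sublist that contains item
        for idx in owners[lo:hi]:
            edges.extend((item, other) for other in L1[idx] if other != item)
    return edges


L1 = [["cat", "dog", "apple"], ["orange", "green", "red"]]
L2 = ["cat", "red"]
print(edges_via_bisect(L1, L2))
```

The sort is paid once up front, after which each lookup costs O(log n) in the total number of strings. Whether that beats the set-based loop depends on the data, so it is worth measuring on the real L1.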
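Have you tried the direct approach (which may be less efficient)?
Your code does not work for the general case the OP is looking for, because any element of L2 may appear in any list in L1, and the number of lists in L1 may differ from the number of elements in L2.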
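The objection about the general case can be made concrete with a small comparison. Both functions below are hypothetical sketches: zip_pairs assumes the criticised code paired L1[i] with L2[i] by position, and all_pairs is one direct way to check every L2 item against every sublist.

```python
def zip_pairs(L1, L2):
    # Position-wise pairing: only checks L2[i] against L1[i]
    return [(x, y) for sub, x in zip(L1, L2) if x in sub
            for y in sub if y != x]


def all_pairs(L1, L2):
    # General case: any L2 item may appear in any sublist
    wanted = set(L2)
    return [(x, y) for sub in L1 for x in sub if x in wanted
            for y in sub if y != x]


L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]
print(len(zip_pairs(L1, L2)))  # third sublist is never examined
print(len(all_pairs(L1, L2)))
```

With three sublists and two L2 items, zip stops after the second sublist, so every edge from ["hand", "cat", "red"] is lost in the position-wise version.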