Python: Check for partial string matches between two lists
I have two lists, as follows:
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
I want to check each item in c against each item in isl so that I get all the partial string matches.
The output I need looks like this:
out = ["john", "query 989877", "tamm"]
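As you can see, the output should include partial matches too ("query 989877" is only part of "query 989877 forcast").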
I tried the following:
out = []
for word in c:
    for w in isl:
        if word.lower() in w.lower():
            out.append(word)
But this only gives me:
out = ["John", "Tamm"]
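The first attempt tests whether an entire item of c is a substring of some isl entry. "query 989877 forcast" never is, because "forcast" does not occur anywhere in isl, so only the items that match in full survive. The check below makes that visible (the dict name `whole_item_found` is just for illustration):

```python
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

# For each item of c: does the *whole* item occur inside at least one isl entry?
whole_item_found = {word: any(word.lower() in w.lower() for w in isl) for word in c}
print(whole_item_found)
# {'John': True, 'query 989877 forcast': False, 'Tamm': True}
```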
I also tried this:
print([word for word in c if word.lower() in (e.lower() for e in isl)])
But this only outputs "John".
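The second attempt fails for a different reason: `x in <generator>` compares `x` with each element using `==`, so an item only matches when some isl entry is exactly equal to it. Only "john" qualifies; "tamm" is not equal to "tamm ju". A substring test needs `in` applied to each string, for example through `any()`:

```python
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

# `x in <generator>` checks element equality (==), not substrings
result = [word for word in c if word.lower() in (e.lower() for e in isl)]
print(result)  # ['John']: only 'john' equals a whole isl entry

# `x in <string>` is a substring test, which is what a partial match needs
print('tamm' in ['tamm ju'])                   # False (element equality)
print(any('tamm' in s for s in ['tamm ju']))   # True (substring)
```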
How can I get what I want?
Something like this, perhaps:
def get_sub_strings(s):
    words = s.split()
    for i in range(1, len(words) + 1):  # iterate i in reverse to get the longest runs first
        for n in range(0, len(words) + 1 - i):
            yield ' '.join(words[n:n + i])
out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        for s in isl:
            if sub in s.lower():
                out.append(sub)

print(out)
# ['john', 'query', '989877', 'query 989877', 'tamm']
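To see what the generator feeds into the matching loop, here is its output for the one multi-word item in c (written with Python 3's `range`). A phrase of n words has n(n + 1)/2 contiguous word runs, so the number of candidates grows quadratically with phrase length, which is harmless for short items like these:

```python
def get_sub_strings(s):
    """Yield every contiguous run of words in s, shortest runs first."""
    words = s.split()
    for i in range(1, len(words) + 1):          # i = run length
        for n in range(0, len(words) + 1 - i):  # n = start index of the run
            yield ' '.join(words[n:n + i])

subs = list(get_sub_strings('query 989877 forcast'))
print(subs)
# ['query', '989877', 'forcast', 'query 989877', '989877 forcast', 'query 989877 forcast']
print(len(subs))  # 6, which equals 3 * (3 + 1) // 2
```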
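If you only want to store the longest match, generate the substrings in reverse order (longest first) and break out of the loop as soon as one of them is found in isl: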
def get_sub_strings(s):
    words = s.split()
    for i in range(len(words), 0, -1):  # longest runs first
        for n in range(0, len(words) + 1 - i):
            yield ' '.join(words[n:n + i])
out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        if any(sub in s.lower() for s in isl):
            out.append(sub)
            break  # keep only the longest match for this item
print(out)
# ['john', 'query 989877', 'tamm']
Well, I came up with this! It's a very hacky approach, and I don't like it myself, but it gives me the output I need:
Step 1:
c1 = []
for r in c:
    c1.append(r.split())
# c1 == [['John'], ['query', '989877', 'forcast'], ['Tamm']]
Step 2:
p = []
for w in isl:
    for word in c1:
        for w1 in word:
            if w1.lower() in w.lower():
                p.append(w1)
# p == ['query', '989877', 'John', 'Tamm']
Step 3:
out = []
for word in c:
    t = []
    for i in p:
        if i in word:
            t.append(i)
    out.append(t)
# out == [['John'], ['query', '989877'], ['Tamm']]
Step 4:
out_final = []
for i in out:
    out_final.append(" ".join(i))
# out_final == ['John', 'query 989877', 'Tamm']
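The four steps can be condensed into a single pass. This is a sketch, not the poster's code: the helper name `matched_words` is hypothetical. It keeps each word of a c item that occurs (case-insensitively) in some isl entry, in the item's original word order, and re-joins them. Unlike Step 3, it never tests words against other items of c, and an item with no matching words becomes an empty string:

```python
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

def matched_words(c, isl):
    """Hypothetical helper: for each item of c, keep only the words that occur
    (case-insensitively) inside at least one isl entry, joined with spaces."""
    isl_lower = [s.lower() for s in isl]
    out = []
    for item in c:
        kept = [w for w in item.split() if any(w.lower() in s for s in isl_lower)]
        out.append(' '.join(kept))
    return out

print(matched_words(c, isl))
# ['John', 'query 989877', 'Tamm']
```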
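Does it have to be "query 989877", or can it be "query", "989877"?
Yes... I want all the matches (partial and full).
This is actually great! Is there a way to remove "query" and "989877" from the out list? Ideally they shouldn't appear in the output. The reason I insist on this is that I need to count all the elements in the out list later, and keeping the output as you've shown would give the wrong answer.
@user1452759 Check my second solution.
Thank you so much! This is perfect!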
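On the counting concern raised in the comments: the first solution returns the shorter runs "query" and "989877" alongside "query 989877", which inflates any tally, while the longest-match solution does not. A small sketch with `collections.Counter`, using the two output lists shown earlier as literal data and assuming "count" means the number of matches or how often each one occurs:

```python
from collections import Counter

all_matches = ['john', 'query', '989877', 'query 989877', 'tamm']  # first solution
longest_only = ['john', 'query 989877', 'tamm']                     # second solution

print(len(all_matches), len(longest_only))  # 5 3

# Per-match tallies; every match occurs once in the longest-only list
counts = Counter(longest_only)
print(counts['query 989877'])  # 1
```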