Python: removing similar elements from a list of strings
This is my first time asking a question here and I'm very new to this, so I'll do my best. I have a list of phrases and I want to remove all the similar ones, for example:
array = ["A very long string saying some things",
"Another long string saying some things",
"extremely large string saying some things",
"something different",
"this is a test"]
I want this result:
array2 = ["A very long string saying some things",
"something different",
"this is a test"]
This is what I have:
for i in range(len(array)):
    swich=True
    for j in range(len(array2)):
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == True):
            swich=False
            pass
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == False):
            array2.pop(j)
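But it gives me a list IndexError.

fuzz.ratio compares two strings and returns a value between 0 and 100; the higher the value, the more similar the strings.

What I'm trying to do is compare the lists element by element: the first time it finds two similar strings, it flips the switch and moves on, and from that point every further similar match is popped from array2. I'm completely open to any suggestions.

The error you get is caused by modifying the list while you are iterating over it. (Never add, remove, or replace elements of an iterable you are currently iterating over!) range(len(array2)) is evaluated once, when the list has length N, but after array2.pop(j) the length is N-1, not N. When the loop later tries to access the Nth element, you get an IndexError because the list is now shorter.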
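The failure described above can be reproduced without any fuzzy matching. The following is only a minimal sketch: the list of integers and the helper name remove_twos_unsafely are made up for illustration and are not part of the question's code.

```python
# Minimal reproduction, using plain integers instead of fuzzy string matching.
numbers = [1, 2, 2, 3]


def remove_twos_unsafely(items):
    """Pop while looping over a range that was computed before any removal."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for j in range(len(items)):  # the range is fixed at the original length
        if items[j] == 2:
            items.pop(j)  # the list shrinks, but the range does not
    return items


try:
    remove_twos_unsafely(numbers)
    error_raised = False
except IndexError:
    # Raised once j reaches an index that no longer exists
    error_raised = True

# Safe alternative: build a new list instead of mutating during iteration.
kept = [n for n in numbers if n != 2]
print(error_raised, kept)
```

Building a new list, as in the last step, avoids the problem because the list being iterated is never changed.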
A quick sketch of another approach:
# Assumption: fuzz comes from the fuzzywuzzy package; the question does not show its import
from fuzzywuzzy import fuzz

original = ["A very long string saying some things", "Another long string saying some things", "extremely large string saying some things", "something different", "this is a test"]
filtered = list()
for original_string in original:
    include = True
    # Skip this string if it is too similar to one that was already kept
    for filtered_string in filtered:
        if fuzz.ratio(original_string, filtered_string) >= 80:
            include = False
            break
    if include:
        filtered.append(original_string)
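For readers without the fuzzy-matching package installed, the same keep-or-skip idea can be tried with the standard library alone. This is a sketch under assumptions: similarity and drop_similar are hypothetical helper names, and difflib.SequenceMatcher is swapped in for fuzz.ratio, so its scores (and therefore a suitable threshold) may differ from the library used in the question.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical stand-in for fuzz.ratio: a 0-100 score computed by difflib."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def drop_similar(strings, threshold=80):
    """Keep each string only if it is not too similar to one already kept."""
    kept = []
    for candidate in strings:
        # The input list is only read, never modified, so iteration stays safe
        if all(similarity(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept


phrases = [
    "A very long string saying some things",
    "Another long string saying some things",
    "extremely large string saying some things",
    "something different",
    "this is a test",
]
result = drop_similar(phrases)
print(result)
```

With this sample list and the difflib-based score, the sketch keeps the first long phrase and the two unrelated ones. The threshold is a parameter because a cut-off of 80 is a judgment call that depends on the scoring function.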
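Note the loop of the form for string in array: it is more "pythonic" and needs no integer index variable or range.

How about using another library to condense the code and reduce the number of loops?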
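import difflib

def remove_similar_words(word_list):
    # Note: this modifies word_list in place
    for elem in word_list:
        # Up to three matches, best first; elem itself is always among them
        first_pass = difflib.get_close_matches(elem, word_list)
        if len(first_pass) > 1:
            # Drop the weakest of the close matches, then rescan the shortened list
            word_list.remove(first_pass[-1])
            # Return right away so the mutated list is not iterated any further
            return remove_similar_words(word_list)
    return word_list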
l = ["A very long string saying some things", "Another long string saying some things", "extremely large string saying some things", "something different", "this is a test"]
remove_similar_words(l)
['A very long string saying some things',
'something different',
'this is a test']
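The function above depends on how difflib.get_close_matches behaves: by default it returns at most n=3 candidates whose similarity is at least cutoff=0.6, ordered from best to worst. A short sketch of that behaviour follows; the first call is the example from the difflib documentation, and the variable names are only illustrative.

```python
import difflib

# Example from the difflib documentation: results are ordered best match first
matches = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(matches)  # ['apple', 'ape']

# n limits how many matches are returned
best_only = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"], n=1)

phrases = [
    "A very long string saying some things",
    "Another long string saying some things",
    "extremely large string saying some things",
    "something different",
    "this is a test",
]
# A string searched for in its own list matches itself with a perfect score,
# so more than one result means at least one near-duplicate exists.
close = difflib.get_close_matches(phrases[0], phrases)
print(close)
```

This is why the function checks len(first_pass) > 1 and removes first_pass[-1]: the last entry is the weakest of the close matches, never the element itself.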
Please post the exact error traceback... which list raises the IndexError?