Python: removing similar elements from a list of strings

python arrays string list

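This is my first time asking a question here and I'm quite new to this, so I'll do my best. I have a list of phrases and I want to remove all the ones that are similar to each other, for example: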

array = ["A very long string saying some things", 
         "Another long string saying some things", 
         "extremely large string saying some things", 
         "something different", 
         "this is a test"]

I want this result:

array2 = ["A very long string saying some things", 
          "something different", 
          "this is a test"]

I have this:

for i in range(len(array)):
    swich=True
    for j in range(len(array2)):
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == True):
            swich=False
            pass
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == False):
            array2.pop(j)

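But it gives me a list IndexError.

fuzz.ratio compares two strings and returns a value between 0 and 100; the higher the value, the more similar the strings. What I'm trying to do is compare the list element by element: the first time it finds two similar strings it flips the switch and passes, and from then on every further similar match pops that element from array2. I'm completely open to any suggestions.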
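
The error you get is caused by modifying the list while you are iterating over it. (Never add or remove elements of an iterable you are currently iterating over!) range(len(array2)) is evaluated once, when the list has length N, but after array2.pop(j) the length is N-1, not N. When the loop later tries to access the N-th element, you get an IndexError because the list is now shorter.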
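The failure can be reproduced without any fuzzy-matching library. Here is a minimal sketch using a made-up list of numbers; the name items and its values are purely illustrative and do not come from the question:

```python
# Illustrative data only; this is not the asker's list.
items = [1, 2, 3, 4]

error = None
try:
    # range(len(items)) is evaluated once, so it yields 0, 1, 2, 3
    # even after the list has shrunk.
    for i in range(len(items)):
        if items[i] % 2 == 0:
            items.pop(i)  # the list gets shorter while the loop goes on
except IndexError as exc:
    error = exc

print(items)                  # [1, 3]
print(type(error).__name__)   # IndexError
```

The loop removes two elements and then still asks for index 3, which no longer exists.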
A quick guess at another approach:
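# Assumption: fuzz comes from the fuzzywuzzy package (now published as
# thefuzz); the question does not show its import.
from fuzzywuzzy import fuzz

original = ["A very long string saying some things", "Another long string saying some things", "extremely large string saying some things", "something different", "this is a test"]

# Build a new list instead of removing items from the one being looped over.
filtered = []

for original_string in original:
    include = True
    for filtered_string in filtered:
        # Skip this string if it is similar to one that was already kept.
        if fuzz.ratio(original_string, filtered_string) >= 80:
            include = False
            break
    if include:
        filtered.append(original_string)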

Note the "for string in array" style of loop: it is more "Pythonic" and needs no integer index variable or range.
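The same keep-the-first-occurrence idea can also be written with any() in place of the flag variable. The sketch below is a variant under stated assumptions, not the answer's code: the names dedupe_similar and similarity are hypothetical, and similarity uses difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio so the example runs without extra packages. The two scorers are not guaranteed to give identical scores, so the threshold of 80 may need tuning.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical stand-in for fuzz.ratio: a score from 0 to 100."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def dedupe_similar(strings, threshold=80, score=similarity):
    """Keep a string only if it is not similar to one already kept."""
    kept = []
    for candidate in strings:
        # any() stops at the first sufficiently similar kept string.
        if not any(score(candidate, other) >= threshold for other in kept):
            kept.append(candidate)
    return kept


print(dedupe_similar(["hello world", "hello world!", "goodbye"]))
# ['hello world', 'goodbye']
```

If fuzz is importable in your environment, passing score=fuzz.ratio should bring back the scoring used in the question.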

How about using another library to condense the code and reduce the number of loops?
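import difflib

def remove_similar_words(word_list):
    """Remove near-duplicates from word_list in place and return it."""
    for elem in word_list:
        # Matches come back best first; elem itself is normally among them.
        first_pass = difflib.get_close_matches(elem, word_list)
        if len(first_pass) > 1:
            # Drop the weakest match, then rescan the shortened list from
            # the start rather than continuing over a list that changed.
            word_list.remove(first_pass[-1])
            return remove_similar_words(word_list)
    return word_list


phrases = ["A very long string saying some things", "Another long string saying some things", "extremely large string saying some things", "something different", "this is a test"]

remove_similar_words(phrases)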

['A very long string saying some things',
 'something different',
 'this is a test']
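
It may help to know what get_close_matches does with its defaults. Per the standard library documentation, the signature is get_close_matches(word, possibilities, n=3, cutoff=0.6): it returns at most n candidates that score at least cutoff, ordered from most to least similar. So first_pass[-1] is the weakest of the returned matches, and the default cutoff of 0.6 is looser than the threshold of 80 used in the question. A small sketch, with a word list borrowed from the difflib documentation:

```python
import difflib

words = ["ape", "apple", "peach", "puppy"]

# Defaults (n=3, cutoff=0.6): best match first.
default_matches = difflib.get_close_matches("appel", words)
print(default_matches)  # ['apple', 'ape']

# Only the single best match.
best_match = difflib.get_close_matches("appel", words, n=1)
print(best_match)  # ['apple']

# A much stricter cutoff leaves no match for this word.
strict_matches = difflib.get_close_matches("appel", words, cutoff=0.9)
print(strict_matches)  # []
```

Passing a higher cutoff to get_close_matches inside the function above is one way to make the difflib version less aggressive.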

Please post the exact error traceback… which list raises the IndexError?