Python: remove strings that are substrings of other strings in a list, without changing the list's original order?


python string list


I have a list:

the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?']

I want to remove 'Donald Trump has' from the list, because it is a substring of another list element.

This is the important part: I want to do this without changing the original order of the list.

My function (below) messes up the original order of the list, because it first sorts the list items by length.

def substr_sieve(list_of_strings):  
    dups_removed = list_of_strings[:]
    for i in xrange(len(list_of_strings)):
        list_of_strings.sort(key = lambda s: len(s))
        j=0
        j=i+1
        while j <= len(list_of_strings)-1:
            if list_of_strings[i] in list_of_strings[j]:
                try:
                    dups_removed.remove(list_of_strings[i])
                except:
                    pass
            j+=1
    return dups_removed
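In the function above, `dups_removed` is copied before anything is sorted, so the returned list already keeps the original order. What gets reordered is the caller's list, because `list_of_strings.sort(...)` sorts it in place. Below is a minimal Python 3 sketch (using `range` instead of `xrange`) of the fix suggested in the comments: sort a separate copy instead. The `break`, which makes each position remove at most one copy, is an added change and not part of the original function:

```python
def substr_sieve(list_of_strings):
    # One copy for the result (original order) and a separate copy sorted by
    # length; the caller's list is never sorted or modified.
    dups_removed = list_of_strings[:]
    by_length = sorted(list_of_strings, key=len)
    for i, shorter in enumerate(by_length):
        # A shorter string can't contain a longer one, so each string is only
        # compared with the strings after it in length order.
        for longer in by_length[i + 1:]:
            if shorter in longer:
                dups_removed.remove(shorter)  # removes one copy (the first)
                break
    return dups_removed


the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?']
print(substr_sieve(the_list))  # ['Donald Trump has small fingers', 'What is going on?']
print(the_list)                # still in the original order
```

For exact duplicates such as `['Donald', 'Donald']`, this sketch keeps one copy.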

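You can reduce the items recursively.
Algorithm:
Loop by popping each item off the list and deciding whether it needs to be kept.
Call the same function recursively with the reduced list.
The base condition is whether the list still has at least one item (or two?).
Efficiency: it may not be the most efficient. I think some divide-and-conquer approach would be more suitable.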
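the_list = ['Donald Trump has', 'Donald Trump has small fingers',
            'What is going on?']

final_list = []

def remove_or_append(items):
    # Pop the first item and keep it only if it is not contained in any item
    # still to be processed or in any item that has already been kept.
    if len(items):
        first_value = items.pop(0)
        found = False
        for each in items:
            if first_value in each:
                found = True
                break
        for each in final_list:
            if first_value in each:
                found = True
                break
        if not found:
            final_list.append(first_value)
        # Recurse on the shorter list; the base case is an empty list.
        remove_or_append(items)

# Pass a copy, because remove_or_append() pops items from its argument.
remove_or_append(list(the_list))

print(final_list)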
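The recursion adds one stack frame per element, so lists of roughly a thousand elements or more run into Python's default recursion limit (see `sys.getrecursionlimit()`) and raise `RecursionError`. The function also pops items from the list passed to it and relies on a global `final_list`. Below is a minimal iterative sketch of the same test, where each item is compared with the items after it and with the items already kept. The name `remove_or_append_iter` is hypothetical:

```python
def remove_or_append_iter(strings):
    """Iterative version of the pop-and-recurse approach; `strings` is not modified."""
    final_list = []
    for i, value in enumerate(strings):
        # Same test as the recursive version: is `value` contained in a later
        # item, or in an item that has already been kept?
        found = (any(value in each for each in strings[i + 1:])
                 or any(value in each for each in final_list))
        if not found:
            final_list.append(value)
    return final_list


the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?']
print(remove_or_append_iter(the_list))  # ['Donald Trump has small fingers', 'What is going on?']
```

The trade-off is the same quadratic number of `in` checks, but no stack growth.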

A slightly different version:
def substring_of_anything_else(item, items):
    # item is an (index, string) pair; skip the element at the same index
    for idx, each in enumerate(items):
        if idx == item[0]:
            continue
        if item[1] in each:
            return True
    # Only return False after every other element has been checked
    return False

final_list = [item for idx, item in enumerate(the_list)
              if not substring_of_anything_else((idx, item), the_list)]
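This version only skips the element at the *same index*, so identical strings count as substrings of each other and every copy of a repeated string is dropped. The following self-contained sketch shows this with a compact `any()` form of the same check. The wrapper name `remove_substrings_by_index` is hypothetical:

```python
def substring_of_anything_else(item, items):
    idx, value = item
    # Compare against every element except the one at the same index.
    return any(value in other
               for other_idx, other in enumerate(items) if other_idx != idx)


def remove_substrings_by_index(strings):  # hypothetical wrapper
    return [value for idx, value in enumerate(strings)
            if not substring_of_anything_else((idx, value), strings)]


print(remove_substrings_by_index(['Donald Trump has', 'Donald Trump has small fingers',
                                  'What is going on?']))
# ['Donald Trump has small fingers', 'What is going on?']
print(remove_substrings_by_index(['Donald', 'Donald']))
# []  (each copy is "contained" in the other, so both go)
```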

You can do this without sorting:
the_list = ['Donald Trump has', "I've heard Donald Trump has small fingers",
            'What is going on?']

def winnow(a_list):
    keep = set()
    for item in a_list:
        if not any(item in other for other in a_list if item != other):
            keep.add(item)
    return [ item for item in a_list if item in keep ]

winnow(the_list)

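In general, sorting might allow fewer comparisons, but that seems highly dependent on the data and is probably premature optimization.
A simple solution
But first, let's add 'Donald Trump', 'Donald' and 'Trump' at the end to make it a better test case: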
>>> forbidden_text = "\nX08y6\n" # choose a text that will hardly appear in any sensible string
>>> the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?',
        'Donald Trump', 'Donald', 'Trump']
>>> new_list = [item for item in the_list if forbidden_text.join(the_list).count(item) == 1]
>>> new_list
['Donald Trump has small fingers', 'What is going on?']

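Logic:
Join all list elements into a single string: forbidden_text.join(the_list)
Check whether each item of the list occurs exactly once in that string. If it occurs more than once, it is a substring of another element (or a duplicate): .count(item) == 1

str.count(sub[, start[, end]]): Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.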
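The one-liner rebuilds `forbidden_text.join(the_list)` once for every item. The sketch below builds the joined string once instead. `remove_substrings_by_count` is a hypothetical name, and the separator is the one chosen in the answer. It assumes no item contains or overlaps the separator:

```python
FORBIDDEN_TEXT = "\nX08y6\n"  # separator from the answer; assumed absent from the data


def remove_substrings_by_count(strings, sep=FORBIDDEN_TEXT):
    # Join once: an item that also occurs inside another item (or is repeated)
    # is counted more than once in the combined text.
    joined = sep.join(strings)
    return [item for item in strings if joined.count(item) == 1]


the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?',
            'Donald Trump', 'Donald', 'Trump']
print(remove_substrings_by_count(the_list))
# ['Donald Trump has small fingers', 'What is going on?']
```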

Use forbidden_text instead of '' (the empty string) to handle cases like:
>>> the_list = ['DonaldTrump', 'DonaldTrump', 'Donald', 'Trump']
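With `''` as the joiner, two neighbouring elements can combine into a match that is not inside any single element. A small illustration with a hypothetical list (not from the answer), where `keep_unique` is also a hypothetical helper:

```python
def keep_unique(strings, sep):
    # Keep items that occur exactly once in the joined text.
    joined = sep.join(strings)
    return [item for item in strings if joined.count(item) == 1]


items = ['Donald', 'Trump', 'DonaldTrump']  # hypothetical example

# '' joins to 'DonaldTrumpDonaldTrump': 'Donald' + 'Trump' forms a second,
# spurious 'DonaldTrump', so even the longest string is dropped.
print(keep_unique(items, ''))           # []
print(keep_unique(items, "\nX08y6\n"))  # ['DonaldTrump']
```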


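As Nishant correctly pointed out, the code above does not work for the_list = ['Donald', 'Donald'].

Using set(the_list) instead of the_list solves this problem:

>>> new_list = [item for item in the_list if forbidden_text.join(set(the_list)).count(item) == 1]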
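With `set(the_list)`, each distinct string appears only once in the joined text, so an exact repeat no longer counts against itself. Because the comprehension still iterates over `the_list`, all copies of a surviving string are kept. The sketch below hoists the join out of the loop (the function name is hypothetical). It also shows one caveat of the separator trick: an item that happens to occur inside the separator is always over-counted:

```python
FORBIDDEN_TEXT = "\nX08y6\n"  # separator from the answer


def remove_substrings_dedup(strings, sep=FORBIDDEN_TEXT):
    # Join the distinct strings once; with a separator that no item contains,
    # the iteration order of the set doesn't affect the counts.
    joined = sep.join(set(strings))
    # Iterate over the original list to keep its order (and any repeats).
    return [item for item in strings if joined.count(item) == 1]


print(remove_substrings_dedup(['Donald', 'Donald']))  # ['Donald', 'Donald']
# Caveat: 'X' also occurs in the separator, so it is dropped even though it is
# not a substring of 'abc'.
print(remove_substrings_dedup(['X', 'abc']))          # ['abc']
```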
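Why not make a second copy of the list to sort, then?
@jornsharpe I have already made a copy and removed the duplicates from it. I don't think I understood what you meant. How would that help?
I think you need to use the divide-and-conquer paradigm.
@ToussaintLouverture Yes, but you are still sorting the original. So make another copy to sort.
Seems similar. Does this method work in this case? the_list = ['Donald Trump', 'Donald', 'Trump']
['Donald', 'Donald'] is another one. But yes, this is a nice alternative way of thinking about it. I think making it a set first and then doing this would work.
There is a problem with a data set like ['Donald', 'Donald'].
It works (some other answers don't) for this data set: the_list = ['Donald Trump', 'Donald', 'Trump', 'Donald', 'Donald', 'Donald Trump']. Or is this the expected output?
The OP did not specify the expected output for this case. My answer keeps all copies of the longest string. This can be fixed by removing each item from keep once it has been added to the result; a (slightly ugly) one-line version is: return [(keep.remove(item), item)[1] for item in a_list if item in keep]
Then it doesn't fail. ['Donald Trump', 'Donald Trump'] is the output.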
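The one-liner from the last comments can be folded into `winnow`. The sketch below uses a plain loop with the same effect: each surviving string is discarded from `keep` once it has been emitted, so only its first copy is returned. The one-liner works because, for each element, `if item in keep` is evaluated before the `keep.remove(item)` side effect runs. The loop is a readability choice, not the commenter's code:

```python
def winnow(a_list):
    # Keep strings that are not contained in any *different* string.
    keep = set()
    for item in a_list:
        if not any(item in other for other in a_list if item != other):
            keep.add(item)
    # Same effect as [(keep.remove(item), item)[1] for item in a_list if item in keep]:
    # drop each survivor from `keep` once emitted, so later copies are skipped.
    result = []
    for item in a_list:
        if item in keep:
            keep.remove(item)
            result.append(item)
    return result


the_list = ['Donald Trump', 'Donald', 'Trump', 'Donald', 'Donald', 'Donald Trump']
print(winnow(the_list))  # ['Donald Trump']
```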