Python: reducing a list based on element substrings

I'm looking for the most efficient way to reduce a given list based on substrings that are already present in the list.

For example,

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

would be reduced to:

mylist = ['abcd','qrs']

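because both 'abcd' and 'qrs' are the smallest substrings of other elements in that list. I can do this in about 30 lines of code, but I suspect there is a clever one-liner.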

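One solution is to iterate over all the strings, split them into groups according to their next character, and then apply the function recursively to each group.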

def reduce_substrings(strings):
    return list(_reduce_substrings(map(iter, strings)))

def _reduce_substrings(strings):
    # A dictionary of characters to a list of strings that begin with that character
    nexts = {}
    for string in strings:
        try:
            nexts.setdefault(next(string), []).append(string)
        except StopIteration:
            # Reached the end of this string. It is the only shortest substring.
            yield ''
            return
    for next_char, next_strings in nexts.items():
        for next_substrings in _reduce_substrings(next_strings):
            yield next_char + next_substrings

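This splits the strings into a dictionary keyed by their next character, then looks for the shortest string within each of the dictionary's lists. As soon as one string in a group runs out of characters, it is the shortest prefix for that group.

Of course, because of the recursive nature of this function, a one-liner would not be as efficient.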
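For comparison, here is a minimal non-recursive sketch that fits in a single comprehension. It assumes, as confirmed in the comments, that "substring" always means "prefix". The name `reduce_by_prefix_set` is hypothetical and not part of the original answer. A string is kept only if none of its proper prefixes is itself in the input.

```python
def reduce_by_prefix_set(strings):
    """Keep only strings that have no proper prefix elsewhere in the input."""
    present = set(strings)
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return [s for s in dict.fromkeys(strings)
            if not any(s[:i] in present for i in range(len(s)))]

mylist = ['abcd', 'abcde', 'abcdef', 'qrs', 'qrst', 'qrstu']
print(reduce_by_prefix_set(mylist))  # ['abcd', 'qrs']
```

Each set lookup is cheap, so the work is roughly proportional to the total number of prefixes checked rather than to the square of the list length.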

Try this:

import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
    if re.match("^abcd$",i):
        new_list.append(i)
    elif re.match("^qrs$",i):
        new_list.append(i)
print(new_list)
#['abcd', 'qrs']

This seems to work (but I suppose it is not very efficient).

Tests:

>>> reduce_prefixes(['abcd', 'abcde', 'abcdef',
...                  'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>> reduce_prefixes(['abcd', 'abcde', 'abcdef',
...                  'qrs', 'qrst', 'qrstu',
...                  'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']
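The definition of `reduce_prefixes` does not appear with these tests. The following is only a sketch that reproduces the output shown, assuming a sort-then-scan approach (the comments mention presorting, but the body here is a guess, not the answerer's code):

```python
def reduce_prefixes(strings):
    """Return the strings that have no other input string as a prefix."""
    result = []
    # After sorting, every string comes after the strings it extends,
    # and everything between a prefix and its extension also starts with
    # that prefix, so only the most recently kept string needs checking.
    for s in sorted(strings):
        if not result or not s.startswith(result[-1]):
            result.append(s)
    return result
```

Sorting dominates the cost, and the scan afterwards makes a single `startswith` check per string.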

Probably not the most efficient, but at least the shortest:

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

outlist = []
for l in mylist:
    if any(o.startswith(l) for o in outlist):
        # l is a prefix of some elements in outlist, so it replaces them
        outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
    if not any(l.startswith(o) for o in outlist):
        # l has no prefix in outlist yet, so it becomes a prefix candidate
        outlist.append(l)

print(outlist)

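At a high level it is simple: build a radix tree, then take the root's direct children (the ones that represent actual elements; a node is just the greatest common prefix of its descendants). In practice, you need to find a suitable radix tree implementation. [...] may help you get started.

Can you provide more complex examples for testing? Should the substrings always be prefixes?

Yes, they should always be prefixes.

This assumes the values in the list are known. The values will be unknown, and the resulting list must not contain an item that has another item as a substring.

Thanks. Presorting the strings is a clever trick, and it will probably make things considerably faster than my simple solution.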
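The tree idea from the comments can be sketched with a plain, uncompressed trie built from nested dictionaries. This is a simplification, not a true radix tree, and the name `reduce_with_trie` and the `END` sentinel are hypothetical. The walk stops descending as soon as it reaches a node where an input string ends, which yields the shortest prefixes.

```python
def reduce_with_trie(strings):
    """Collect the shortest stored strings using a nested-dict trie."""
    END = object()  # sentinel key marking "an input string ends at this node"
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = True

    result = []
    # Depth-first walk that stops descending at the first end marker.
    stack = [(root, '')]
    while stack:
        node, prefix = stack.pop()
        if END in node:
            result.append(prefix)
            continue
        for ch, child in node.items():
            stack.append((child, prefix + ch))
    return sorted(result)

print(reduce_with_trie(['abcd', 'abcde', 'abcdef', 'qrs', 'qrst', 'qrstu']))
# ['abcd', 'qrs']
```

The result is sorted only to make the output order deterministic; drop the `sorted` call if order does not matter.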
