Common substring in a list of strings in Python

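I ran into trouble while trying to solve a problem: given some strings and their lengths, you need to find their common substring. My code looks like this; it loops over the list and then over each word in the list: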

num_of_cases = int(input())
for i in range(1, num_of_cases+1):
    if __name__ == '__main__':
        len_of_str = list(map(int, input().split()))
        len_of_virus = int(input())

    strings = []
    def string(strings, len_of_str):
        len_of_list = len(len_of_str)
        for i in range(1, len_of_list+1):
            strings.append(input())
    
    lst_of_subs = []
    virus_index = []
    def substr(strings, len_of_virus):
        for word in strings:
             for i in range(len(len_of_str)):
                  leng = word[i:len_of_virus]
                  lst_of_subs.append(leng)
                  virus_index.append(i)

    print(string(strings, len_of_str))
    print(substr(strings, len_of_virus))

Given the strings ananasso, associazione, tassonomia, massone, it prints the following:

['anan', 'nan', 'an', 'n', 'asso', 'sso', 'so', 'o', 'tass', 'ass', 'ss', 's', 'mass', 'ass', 'ss', 's']

It seems the end index is not increasing, even though I wrote this at the end of the loop:

len_of_virus += 1

Sample input:

1
8 12 10 7
4
ananasso
associazione
tassonomia
massone

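where the first line is the number of cases, the second line gives the lengths of the strings, the third line is the length of the virus (the common substring), and the remaining lines are the given strings I should loop over.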

Expected output:

Case #1: 4 0 1 1

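where the four numbers are the starting indexes of the common substring in each string. (I don't think the printing code matters for this particular problem.)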

What should I do? Please help.

Apart from defining functions in odd places and calling them only for their side effects, which is discouraged, the problem is here:

for i in range(len(len_of_str)):
    leng = word[i:len_of_virus]

`i` increases on every iteration, but `len_of_virus` stays the same, so you are effectively doing this:

word[0:4] #when len_of_virus=4
word[1:4]
word[2:4]
word[3:4]
...

which is why you get

'anan', 'nan', 'an', 'n',

from the first word, "ananasso", and likewise for the other words.

>>> word="ananasso"
>>> len_of_virus = 4
>>> for i in range(len(word)):
        word[i:len_of_virus]

    
'anan'
'nan'
'an'
'n'
''
''
''
''
>>>

You can fix that by shifting the upper end by `i`, but that leaves the same problem at the other end:

>>> for i in range(len(word)):
    word[i:len_of_virus+i]

    
'anan'
'nana'
'anas'
'nass'
'asso'
'sso'
'so'
'o'
>>>

So a simple adjustment to the range solves the problem:

>>> for i in range(len(word)-len_of_virus+1):
        word[i:len_of_virus+i]

    
'anan'
'nana'
'anas'
'nass'
'asso'
>>>

Now that the substring part is done, the rest is easy too:

>>> def substring(text,size):
        return [text[i:i+size] for i in range(len(text)-size+1)]

>>> def find_common(lst_text,size):
        subs = [set(substring(x,size)) for x in lst_text]
        return set.intersection(*subs)

>>> test="""ananasso
associazione
tassonomia
massone""".split()
>>> find_common(test,4)
{'asso'}
>>>

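To find the part common to all strings in the list we can use a set: first we put all the substrings of each word into a set, and finally we intersect all of those sets.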
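A comment further down mentions an error saying that intersection needs an argument. One plausible cause is that `set.intersection(*subs)` raises a `TypeError` when `subs` is empty, for example when the list of strings was never filled. The following is a minimal sketch of a guarded variant, assuming an empty input should simply yield an empty set; the name `find_common_safe` is made up for illustration and is not part of the original answer.

```python
def substring(text, size):
    """All substrings of `text` that are exactly `size` characters long."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def find_common_safe(lst_text, size):
    """Substrings of length `size` shared by every string in `lst_text`."""
    subs = [set(substring(x, size)) for x in lst_text]
    if not subs:
        # Assumption: no strings means no common substring.
        # set.intersection(*[]) would raise TypeError here.
        return set()
    return set.intersection(*subs)


words = ["ananasso", "associazione", "tassonomia", "massone"]
print(find_common_safe(words, 4))  # {'asso'}
print(find_common_safe([], 4))     # set()
```

Returning an empty set keeps the return type the same in both cases, so the caller can test the result with a plain `if`.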

What remains is just printing it the way you like:

>>> virus = find_common(test,4).pop()
>>> print("case 1:",*[x.index(virus) for x in test])
case 1: 4 0 1 1
>>>
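For the input format shown in the question, the pieces above could be combined roughly as follows. This is only a sketch: the function `solve` is a hypothetical helper, it takes the whole input as one string instead of calling `input()`, and it assumes every case really contains at least one common substring of the requested length (if there are several, `min` picks the alphabetically smallest one).

```python
def substring(text, size):
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def find_common(lst_text, size):
    subs = [set(substring(x, size)) for x in lst_text]
    return set.intersection(*subs)


def solve(raw_input_text):
    """Parse all cases and return one formatted output line per case."""
    lines = iter(raw_input_text.strip().splitlines())
    num_of_cases = int(next(lines))
    output = []
    for case in range(1, num_of_cases + 1):
        lengths = [int(n) for n in next(lines).split()]  # one length per string
        size = int(next(lines))                          # length of the virus
        strings = [next(lines).strip() for _ in lengths]
        # Assumption: at least one common substring exists, else min() fails.
        virus = min(find_common(strings, size))
        indexes = [s.index(virus) for s in strings]
        output.append(f"Case #{case}: " + " ".join(map(str, indexes)))
    return output


sample = """1
8 12 10 7
4
ananasso
associazione
tassonomia
massone
"""
print("\n".join(solve(sample)))  # Case #1: 4 0 1 1
```

Keeping the functions at module level and returning values, rather than appending to outer lists, also avoids the side effects mentioned at the start of this answer.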

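First extract all substrings of the given size from the shortest string. Then pick the first substring that is present in all strings. Finally, output the position of this common substring in each string: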

def commonSubs(strings,size):
    base = min(strings,key=len) # shortest string
    subs = [base[i:i+size] for i in range(len(base)-size+1)] # all substrings
    cs = next(ss for ss in subs if all(ss in s for s in strings)) # first common
    return [s.index(cs) for s in strings] # indexes of common substring

Output:

S = ["ananasso", "associazione", "tassonomia", "massone"]
print(commonSubs(S,4))
[4, 0, 1, 1]
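Note that `next(...)` without a default raises `StopIteration` when no substring is shared by all strings. If that case can occur in your input, a variant along these lines may help. It is a sketch only: the name `common_subs_or_none` and the choice to return `None` are assumptions, and an empty `strings` list would still make `min` fail.

```python
def common_subs_or_none(strings, size):
    """Indexes of the first common substring of length `size`, or None."""
    base = min(strings, key=len)  # shortest string
    subs = (base[i:i + size] for i in range(len(base) - size + 1))
    # The second argument of next() is returned when nothing matches
    cs = next((ss for ss in subs if all(ss in s for s in strings)), None)
    if cs is None:
        return None
    return [s.index(cs) for s in strings]


S = ["ananasso", "associazione", "tassonomia", "massone"]
print(common_subs_or_none(S, 4))               # [4, 0, 1, 1]
print(common_subs_or_none(["abc", "xyz"], 2))  # None
```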

You can also use a recursive approach:

def commonSubs(strings,size,i=0):
    sub = strings[0][i:i+size]
    if all(sub in s for s in strings):
        return [s.index(sub) for s in strings]
    return commonSubs(strings,size,i+1)
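As written, the recursion has no stopping condition of its own. Once `i` moves past the last full-length window the slice gets shorter, and the empty string is contained in every string, so an input with no common substring would still return indexes (for `["abc", "xyz"]` and size 2 it returns `[0, 0]`). A bounded variant might look like the sketch below; the name `common_subs_rec` and returning `None` on failure are assumptions made for illustration.

```python
def common_subs_rec(strings, size, i=0):
    """Recursive search that stops after the last full-length window."""
    if i + size > len(strings[0]):
        return None  # no window of length `size` is left to try
    sub = strings[0][i:i + size]
    if all(sub in s for s in strings):
        return [s.index(sub) for s in strings]
    return common_subs_rec(strings, size, i + 1)


S = ["ananasso", "associazione", "tassonomia", "massone"]
print(common_subs_rec(S, 4))               # [4, 0, 1, 1]
print(common_subs_rec(["abc", "xyz"], 2))  # None
```

For very long first strings the recursion depth grows with the string length, so the iterative versions above are the safer choice there.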

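I decided not to include the function headers; does that affect the output? The input is:

8 12 10 7 4 ananasso associazione tassonomia massone

and it should print the common substring, which is "asso", and its starting index in each string of the list (the first 4 numbers are the lengths of the strings in the list, the 5th number is the length of the virus, i.e. the common substring, and the four strings are the ones I have to loop over).

I edited the post and added the function headers.

The virus comes from the problem statement, which says there is a virus (a substring) in every string, with the same length in each of them. My task is to find the virus (the substring common to every string) and print its starting index in each string.

OK, thanks. I found it (page 5).

Yes, I think so, although the link I found

So basically I should replace the for loop with the function (substring)? Also, it prints an error: intersection needs an argument.

Basically yes. The idea is that you take what I present here and adapt it to fix your code... I don't know why you got the error, because I don't know what you did in the end.

OK, thanks, I finally figured it out :-)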