Find all common contiguous substrings of two strings in Python


I have two strings and I want to find all the common words. For example,

s1 = 'Today is a good day, it is a good idea to have a walk.'

s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
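Consider matching s1 against s2.

'Today is' matches 'today is', but 'Today is a' does not match any run of characters in s2. Therefore 'today is' is one of the common contiguous substrings. Similarly, we have 'a good day', 'is', 'a good', and 'have a walk'. So the common phrases are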

common = ['today is', 'a good day', 'is', 'a good', 'have a walk']
Can we do this with regular expressions?

Thank you very much.

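I modified a few lines and added a few. The modification: by default, return answer = "NULL" if no substring is found.

The addition: keep searching until "NULL" is returned, and store each result in a list.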

def longestSubstringFinder(string1, string2):
    answer = "NULL"
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
    return answer


mylist = []

def call():
    s1 = 'Today is a good day, it is a good idea to have a walk.'

    s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
    s1 =  s1.lower()
    s2 = s2.lower()
    x = longestSubstringFinder(s2, s1)
    while x != "NULL":  # "NULL" is the finder's nothing-found default
        print(x)
        mylist.append(x)
        s2 = s2.replace(x, ' ')  # blank out the match so it is not found again
        x = longestSubstringFinder(s2, s1)

call()
print ('[%s]' % ','.join(map(str, mylist)))
Difference from the expected output

common = ['today is', 'a good day', 'is', 'a good', 'have a walk']
Your expectation of a second 'is' is wrong: as you can see, s2 contains only one 'is'.
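The "take the longest match, blank it out, repeat" idea above can also be sketched at the word level with the standard library's difflib. This is only an illustration under assumptions, not the answer's own code: the helper name common_runs is hypothetical, words are split with the regex \w+, and only the matched occurrence in the second string is blanked (the answer's str.replace blanks every occurrence).

```python
import re
from difflib import SequenceMatcher


def common_runs(a, b):
    """Greedily collect common word runs, longest first (hypothetical helper)."""
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    found = []
    while True:
        matcher = SequenceMatcher(None, wa, wb, autojunk=False)
        m = matcher.find_longest_match(0, len(wa), 0, len(wb))
        if m.size == 0:  # nothing in common any more
            break
        found.append(" ".join(wa[m.a:m.a + m.size]))
        # Replace the matched words in wb with a sentinel that can never
        # equal a real word, so the same occurrence is not matched again.
        wb[m.b:m.b + m.size] = [None]
    return found


s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
print(common_runs(s1, s2))
```

Because each match is removed from the second string before the next search, a word that occurs only once in s2 (such as 'is') can be reported at most once, which is the point made about the expected output.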


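Are you looking for common words or common phrases? Are you trying to avoid double-counting matches, since a phrase such as 'a good day' can be broken down into 'a good' and then evaluated again? Your criteria need to be strict: for example, 'today' in s1 and 'yesterday' in s2 have 'day' in common.

Thank you, Hariom Singh, you are right.

The program does not work for the mentioned input: s1 = 'Today is a good day, it is a good idea to have a walk.', s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'

@Poonam It works fine. Did you execute the call() function?

@Poonam Yes, I suppose repeating the same case is not allowed. For example, once I have 'today is a good day' as the longest string, 'good day' should not be repeated. But as far as the question goes, your logic works fine.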
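The 'today' / 'yesterday' remark above comes down to whether the strings are compared character by character or word by word. A minimal sketch of the difference, assuming a regex (\w+) is used only to split the text into words and difflib does the comparison; the helper names words and longest_common are hypothetical:

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Split text into lowercase words using a regular expression."""
    return re.findall(r"\w+", text.lower())


def longest_common(a, b):
    """Longest common contiguous slice of two sequences (strings or lists)."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


# Character level: the two words share the substring 'day'.
print(longest_common("today", "yesterday"))
# Word level: 'today' and 'yesterday' are different tokens, so nothing matches.
print(longest_common(words("today"), words("yesterday")))
```

In this sketch the regular expression handles tokenization only; the search for common runs is done by ordinary sequence comparison.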
import string
s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
z = []
strip_punct = str.maketrans('', '', string.punctuation)  # Python 3 form of translate(None, ...)
s1 = s1.translate(strip_punct)  # remove punctuation
s2 = s2.translate(strip_punct)
print(s1)
print(s2)
sw1 = s1.lower().split()  # split into words
sw2 = s2.lower().split()
print(sw1, sw2)
i = 0
while i < len(sw1):  # two loops to detect common strings; while is used so that i can be changed inside the loop
    x = 0   # number of words in the current run
    r = ""  # buffer holding the current run
    d = i   # position in sw1 where this pass started
    for j in range(len(sw2)):
        # the bounds check avoids an IndexError when a run reaches the end of sw1
        if i < len(sw1) and sw1[i] == sw2[j]:
            r = r + ' ' + sw2[j]  # words are equal: keep adding to the buffer
            x += 1
            i += 1
        else:
            if x > 0:  # mismatch: if the buffer holds a run, add it to the result (z)
                z.append(r)
                i = d
                r = ""
                x = 0
    if x > 0:  # end case of the loop above
        z.append(r)
        r = ""
        i = d
        x = 0
    i += 1
print(list(set(z)))

#O(n^3)
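For comparison, all maximal common word runs can be collected in one pass over an n-by-m table, where n and m are the word counts of the two strings. The following is a sketch of the classic longest-common-substring table applied to words, not part of either answer; the function name common_word_runs is hypothetical and words are again split with \w+.

```python
import re


def common_word_runs(a, b):
    """Return every maximal run of words that occurs in both strings."""
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    n, m = len(wa), len(wb)
    # run[i][j] = length of the common run ending at wa[i-1] and wb[j-1]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    found = set()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if wa[i - 1] == wb[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                # Record the run only if it cannot be extended to the right;
                # run[i][j] already covers the full extension to the left.
                if i == n or j == m or wa[i] != wb[j]:
                    found.add(" ".join(wa[i - run[i][j]:i]))
    return sorted(found)


s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
print(common_word_runs(s1, s2))
```

Unlike the greedy approach, this keeps overlapping runs such as 'a good day' and 'a good', because each aligned pair of positions is examined independently rather than being removed after a match.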