Python regex: matching a substring of the text against a substring of the pattern

python regex string-matching fuzzy-search

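I'm in a somewhat counterintuitive situation and would like some advice. I'm mostly just doing string matching, using extracted strings as the patterns for regular expressions. In general, fuzzy regex search works well for me, but every so often I run into the following case.

Say I've extracted the following pattern from some data (I'm using the Python regex package)

Now I need it to match a string that could look like either of these two, although mostly like the first: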

string = 'quick brown fox jumps over the lazy'
string2 = 'and then a quick brown fox jumps onto the cat'

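Because of the leading and trailing characters, I obviously won't get a match if I try something like what I've been doing so far, which currently looks like this: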

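# Assumes `import regex as re`: the {e<=2} fuzzy syntax (at most 2 errors)
# comes from the third-party regex package, not the standard-library re module.
if re.search("(" + pattern + "){e<=2}", string):
    print(True)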

Ouf, this took me quite a while to work out (I'm not a Python developer), but this should do it:
import re

sentence = "the quick brown fox jumps over the lazy dog"
string = 'quick brown fox jumps over the lazy'
string2 = 'and then a quick brown fox jumps onto the cat'

# Alternation of every word (with its trailing whitespace): "the |quick |...|dog"
words = re.sub(r'(\w+\s*)', r'\1|', sentence).rstrip("|")

# Either consume one character that does not start a word of the sentence,
# or consume a run of the sentence's words, each in its own optional group.
pattern = "(?:(?!" + words + ").|" + re.sub(
    r'(\w+\s*)',
    r'(\1){0,1}',
    sentence
) + ")+"


def count_matched_words(text):
    # Count how many of the per-word capture groups took part in the match.
    results = re.match(pattern, text)
    if results is None:
        return 0
    return sum(1 for group in results.groups() if group)


count1 = count_matched_words(string)
count2 = count_matched_words(string2)

message = ('The following string: "' + string + '" matched ' + str(count1)
           + ' time(s) and the following string: "' + string2 + '" matched '
           + str(count2) + ' time(s).')
print(message)
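
For comparison with the regex above, here is a minimal sketch of a different technique: finding the longest run of consecutive words shared by the pattern and the text, using the standard-library difflib module. This is not part of the answer's approach. The helper name longest_shared_run and the coverage figure are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def longest_shared_run(pattern, text):
    """Return the longest run of consecutive words found in both inputs."""
    pattern_words = pattern.split()
    text_words = text.split()
    # autojunk=False keeps difflib from discarding frequent words in long inputs.
    matcher = SequenceMatcher(None, pattern_words, text_words, autojunk=False)
    match = matcher.find_longest_match(0, len(pattern_words), 0, len(text_words))
    return pattern_words[match.a:match.a + match.size]


sentence = "the quick brown fox jumps over the lazy dog"
for candidate in (
    "quick brown fox jumps over the lazy",
    "and then a quick brown fox jumps onto the cat",
):
    run = longest_shared_run(sentence, candidate)
    # Share of the pattern's words covered by the longest common run.
    coverage = len(run) / len(sentence.split())
    print(f"{candidate!r}: {len(run)} words in a row ({coverage:.0%})")
```

Working on whole words rather than characters sidesteps the leading and trailing text in the candidate string, at the cost of no longer tolerating typos inside a word.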

Have you checked out nltk? It sounds like you want to compare stem frequencies (possibly weighted by overall word frequency) in the input strings and return the best match. I think nltk supports this.
What counts as a sufficient substring of the pattern? That is a value you usually have to work out yourself and use together with a Levenshtein distance function.
What about interleaved words, as in string='quick blah brown blah fox blah blog jump ...'?
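
The Levenshtein suggestion in the comments can be sketched as follows, assuming the distance is counted in whole words rather than characters. The function names and the max_edits threshold are hypothetical. As the comment says, what counts as "close enough" is a value you have to choose for your own data.

```python
def word_edit_distance(a_words, b_words):
    """Levenshtein distance counted in whole words (insert, delete, substitute)."""
    previous = list(range(len(b_words) + 1))
    for i, a_word in enumerate(a_words, start=1):
        current = [i]
        for j, b_word in enumerate(b_words, start=1):
            cost = 0 if a_word == b_word else 1
            current.append(min(
                previous[j] + 1,         # delete a_word
                current[j - 1] + 1,      # insert b_word
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def is_close_enough(pattern, text, max_edits=2):
    # max_edits is an assumed threshold, chosen here only for illustration.
    return word_edit_distance(pattern.split(), text.split()) <= max_edits


sentence = "the quick brown fox jumps over the lazy dog"
print(is_close_enough(sentence, "quick brown fox jumps over the lazy"))
print(is_close_enough(sentence, "and then a quick brown fox jumps onto the cat"))
```

Interleaved filler words each cost one edit under this measure, so a string with many of them needs a larger threshold or a different measure altogether.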