Python: how to check whether a similar substring occurs in a string, with a custom tolerance level
Tags: python, edit-distance
How can I check whether a substring occurs inside a string within a given edit-distance tolerance? For example:
str = 'Python is a multi-paradigm, dynamically typed, multipurpose programming language, designed to be quick (to learn, to use, and to understand), and to enforce a clean and uniform syntax.'
substr1 = 'ython'
substr2 = 'thon'
substr3 = 'cython'
edit_distance_tolerance = 1
substr_in_str(str, substr1, edit_distance_tolerance)
>> True
substr_in_str(str, substr2, edit_distance_tolerance)
>> False
substr_in_str(str, substr3, edit_distance_tolerance)
>> True
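What I tried:
I tried splitting the string into words, removing special characters, and comparing the words one by one, but the result was not very good in either speed or accuracy.
The answer is not as simple as you might think: it takes a fair amount of math, and the standard re (regex) library cannot solve this problem. I think the TRE library has already solved most of it.
Here is a recursive solution I came up with; I hope it is correct: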
def substr_in_str_word(string, substr, edit_distance_tolerance):
    # Match substr against the start of string, spending one unit of
    # tolerance per edit; leftover characters of string are ignored.
    if edit_distance_tolerance < 0:
        return False
    if len(substr) == 0:
        return True
    if len(string) == 0:
        return False
    if string[0] == substr[0]:
        return substr_in_str_word(string[1:], substr[1:], edit_distance_tolerance)
    # Mismatch: try a substitution, then skipping a character on either side.
    return (substr_in_str_word(string[1:], substr[1:], edit_distance_tolerance - 1) or
            substr_in_str_word(string[1:], substr, edit_distance_tolerance - 1) or
            substr_in_str_word(string, substr[1:], edit_distance_tolerance - 1))

def substr_in_str(string, substr, edit_distance_tolerance):
    # Check each space-separated word on its own.
    for word in string.split(' '):
        if substr_in_str_word(word, substr, edit_distance_tolerance):
            return True
    return False
Output:
True
False
True
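For comparison, the word-by-word idea from the question can also be written with an iterative (dynamic programming) Levenshtein distance, whose running time is proportional to the product of the two lengths. This is only a sketch under some assumptions: the names levenshtein and word_within_tolerance are made up for illustration, words are taken to be runs of \w characters (so punctuation is dropped), and the comparison is case-sensitive and applies to whole words rather than word prefixes.

```python
import re


def levenshtein(a, b):
    """Return the Levenshtein (edit) distance between strings a and b."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_within_tolerance(text, pattern, tolerance):
    """Return True if any word of text is within `tolerance` edits of pattern."""
    for word in re.findall(r"\w+", text):
        # A length gap larger than the tolerance can never be bridged.
        if abs(len(word) - len(pattern)) > tolerance:
            continue
        if levenshtein(word, pattern) <= tolerance:
            return True
    return False


text = ('Python is a multi-paradigm, dynamically typed, multipurpose '
        'programming language, designed to be quick (to learn, to use, and '
        'to understand), and to enforce a clean and uniform syntax.')

print(word_within_tolerance(text, 'ython', 1))   # True
print(word_within_tolerance(text, 'thon', 1))    # False
print(word_within_tolerance(text, 'cython', 1))  # True
```

The length check is a cheap filter that skips most words before any distance is computed. Note that 'thon' gives False here only because whole words are compared; a search over arbitrary substrings would find 'thon' inside 'Python' exactly.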