Java: Find all occurrences of a split (non-contiguous) substring in a string


java python regex algorithm


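I want to solve a very small problem. I need to find all occurrences of a substring in a string, where the substring does not have to appear as one contiguous piece.

Example: Input: adnndaend. I will try to find the substring "and".
Occurrences (matched letters shown in uppercase):
AdNnDaend
AdnNDaend
AdNndaenD
AdnNdaenD
AdnndaeND
adnndAeND
Output: 6

I tried to get the list of occurrences by using Python's re.findall: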

re.findall('^.*a.*n.*d.*$', 'adnndaend')
But it returns a list containing only one item, the whole string:

['adnndaend']

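So, can you tell me what is wrong with my regular expression, or show me a better solution? Ideally in Python or Java, since I am not very familiar with other languages.

Regex returns non-overlapping matches, which in your case is just a single match. So regex is out of the question. Instead, I came up with this little recursive function: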

def count(haystack, needle):
    result = 0
    pos = -1
    char = needle[0]  # we'll be searching the haystack for all occurrences of this character

    while True:
        # find the next occurrence
        pos = haystack.find(char, pos + 1)

        # if there are no more occurrences, we're done
        if pos == -1:
            return result

        # once we found the first character, recursively count the occurrences of
        # needle (without the first character) in what's left of haystack
        if len(needle) == 1:
            result += 1
        else:
            result += count(haystack[pos + 1:], needle[1:])
I haven't tested it extensively, but:

>>> print(count('adnndaend', 'and'))
6
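The recursive function above re-slices the haystack on every call and can take exponential time on long inputs. As a hedged alternative that does not come from any of the answers, here is a minimal dynamic-programming sketch, assuming only the count is needed. The name count_subsequences is hypothetical.

```python
def count_subsequences(haystack, needle):
    """Count the ways needle appears in haystack as a (non-contiguous) subsequence."""
    # ways[j] = number of ways to form needle[:j] from the characters seen so far.
    # The empty prefix can always be formed in exactly one way.
    ways = [1] + [0] * len(needle)

    for ch in haystack:
        # Walk right to left so the current character extends only
        # prefixes that were built from earlier characters.
        for j in range(len(needle), 0, -1):
            if needle[j - 1] == ch:
                ways[j] += ways[j - 1]

    return ways[len(needle)]


print(count_subsequences('adnndaend', 'and'))  # 6
```

This runs in time proportional to len(haystack) * len(needle). Unlike the recursive version, it returns 1 for an empty needle instead of raising an error.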

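public int findOccurrences(String str, String key) {
    int total = 0;
    for (int i = 0; i < str.length(); i++) {
        if (str.charAt(i) == key.charAt(0)) {
            if (key.length() > 1) {
                // continue after position i so the same character is not matched twice
                total += findOccurrences(str.substring(i + 1), key.substring(1));
            } else {
                total += 1;
            }
        }
    }
    return total;
}

@Test
public void yup() {
    System.out.println(findOccurrences("adnndaend", "and"));
}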
Output = 6

You can use itertools as follows:

import itertools

pattern = "and"
print(len([''.join(i) for i in itertools.combinations('adnndaend', len(pattern)) if ''.join(i) == pattern]))
Output:

6

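The idea is to use itertools.combinations to generate all combinations of the character sequence and match them against your pattern; the resulting list will contain only the matching items.

You can get all the combinations using only the occurrences of a, n and d: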

from itertools import combinations

def sub_s(st, word):
    # keep only the characters that can take part in a match
    all_s = (x for x in st if x in word)
    return len([x for x in combinations(all_s, len(word)) if "".join(x) == word])
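Both itertools answers return only the count, while the question asks for a list of occurrences. The following is a minimal sketch of the same brute-force idea applied to index positions, assuming that tuples of indices are an acceptable way to describe each occurrence. The helper name occurrence_indices is hypothetical.

```python
from itertools import combinations


def occurrence_indices(haystack, needle):
    """Return every increasing index tuple whose characters spell needle."""
    return [
        idx
        for idx in combinations(range(len(haystack)), len(needle))
        # keep the tuple only if each chosen position holds the right character
        if all(haystack[i] == ch for i, ch in zip(idx, needle))
    ]


for idx in occurrence_indices('adnndaend', 'and'):
    print(idx)
```

Like the answers it is based on, this checks every combination of positions, so it is only practical for short strings.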

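re.findall only returns non-overlapping matches. For example, your first two occurrences will not both be returned, because the same a and d are found in both of them.
This doesn't work with regular expressions, because a regex gives you either a lazy (i.e. .*?) or a greedy (i.e. .*) response, and nothing in between unless you specifically ask for it (e.g. .{3}), which means you would have to try multiple variations of the same regex, which would be very inefficient.
Please use len(pattern) instead of 3.
Thanks for your answer, I have only one question: the second "join" causes an "invalid syntax" error. Is it just my problem, or is there a mistake in the code? I must admit that I don't see any error.
I have verified the code on my machine and it works fine. Something must have gone wrong when you copied it.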
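The point made in the comments about greedy and lazy matching can be checked directly. This is a small illustrative sketch, not part of the original thread; the variable names are arbitrary.

```python
import re

text = 'adnndaend'

# Greedy quantifiers stretch as far as possible: one match spanning the whole string.
greedy = re.findall(r'a.*n.*d', text)

# Lazy quantifiers stop as early as possible: two short, non-overlapping matches.
lazy = re.findall(r'a.*?n.*?d', text)

print(greedy)  # ['adnndaend']
print(lazy)    # ['adnnd', 'aend']
```

Neither variant yields the six occurrences, because each character can belong to at most one match returned by re.findall.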