Finding multiple occurrences of a string within a string in Python


How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>> 
So the first occurrence of ll is at index 1, as expected. How do I find the next occurrence?

The same question applies to a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all the ll items and their indexes?

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
         print('ll found', m.start(), m.end())

ll found 1 3
ll found 10 12
ll found 16 18
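
If overlapping matches are wanted from the regex approach as well, one option is a zero-width lookahead. The following is a minimal sketch, not part of the answer above; the helper name regex_indexes and its overlapping flag are made up for illustration, and re.escape is used on the assumption that the search string should be treated as plain text rather than as a pattern.

```python
import re

def regex_indexes(sub, text, overlapping=False):
    """Return the start index of each match of the literal string sub in text."""
    pattern = re.escape(sub)  # treat sub as plain text, not as a regex
    if overlapping:
        # A lookahead consumes no characters, so matches may overlap.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]

print(regex_indexes("ll", "Allowed Hello Hollow"))        # [1, 10, 16]
print(regex_indexes("ll", "Alllowed", overlapping=True))  # [1, 2]
```
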
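Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
         index = text.find('ll', index)
         if index == -1:
             break
         print('ll found at', index)
         index += 2  # +2 because len('ll') == 2

ll found at 1
ll found at 10
ll found at 16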

The same approach also works for lists and other sequences.
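
For a list, the equivalent of repeated str.find is repeated list.index, which also accepts a start position but raises ValueError instead of returning -1. A rough sketch, assuming that is what is meant here; the function name list_indexes is hypothetical:

```python
def list_indexes(items, value):
    """Yield every index at which value occurs in the list items."""
    start = 0
    while True:
        try:
            # list.index(value, start) searches from position start onwards
            start = items.index(value, start)
        except ValueError:
            return  # no further occurrence
        yield start
        start += 1  # continue just past the hit

x = ['ll', 'ok', 'll']
print(list(list_indexes(x, 'll')))  # [0, 2]
```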

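I think what you are looking for is str.count:

>>> "Allowed Hello Hollow".count('ll')
3

Hope this helps.
Note: this only counts non-overlapping occurrences. To get the index of every occurrence, including overlapping ones, compare each pair of adjacent characters: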

>>> for n,c in enumerate(text):
...   try:
...     if c+text[n+1] == "ll": print(n)
...   except IndexError: pass
...
1
10
16
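
To make the difference between str.count and a position-by-position scan concrete, here is a small sketch on the text with runs of three l characters used further down the thread. The variable names are only illustrative.

```python
text = "Alllowed Hello Holllow"

# str.count skips past each match, so "lll" contributes only one "ll".
non_overlapping = text.count("ll")

# Testing every start position also counts matches that overlap.
overlapping = sum(text.startswith("ll", i) for i in range(len(text)))

print(non_overlapping, overlapping)  # 3 5
```
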
For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print(idx, value)
0 ll
2 ll
If you want all the items in a list that contain 'll', you can also do that:

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print(idx, value)
   ...:
0 Allowed
1 Hello
3 Hollow

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]

This will list adjacent runs of "ll", which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]

FWIW, here are a couple of non-regex alternatives that I think are neater than the solutions above.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second uses str.find and checks for the sentinel of -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    return iter(next_index(len(sub)).__next__, -1)  # .next in Python 2
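
To apply either of these functions to a list, tuple, or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments) like this one: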

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)

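I'm brand new to programming in general and working through an online tutorial. I was asked to do this as well, but only using the methods I had learned so far (basically strings and loops). I'm not sure this adds any value here, and I know this isn't how you would do it, but I got it to work with this: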

needle = input()
haystack = input()
counter = 0
n=-1
for i in range (n+1,len(haystack)+1):
   for j in range(n+1,len(haystack)+1):
      n=-1
      if needle != haystack[i:j]:
         n = n+1
         continue
      if needle == haystack[i:j]:
         counter = counter + 1
print (counter)

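This version should be linear in the length of the string, and should be fine as long as the sequences are not too repetitive (in which case you can replace the recursion with a while loop).

bstpierre's list comprehension is a good solution for short sequences, but it looks to have quadratic complexity and never finished on a long text I was using.

findall_lc = lambda txt, substr: [n for n in range(len(txt))
                                   if txt.find(substr, n) == n]

For a random string of non-trivial length, the two functions give the same result: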

import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])

>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]
But the quadratic version is about 300 times slower:

%timeit find_all(s, 'th')
1000 loops, best of 3: 282 µs per loop

%timeit findall_lc(s, 'th')    
10 loops, best of 3: 92.3 ms per loop
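
The find_all function timed above is not shown in this thread, and %timeit only exists in IPython. As a rough way to repeat the comparison in plain Python, here is a sketch using the timeit module. find_all_linear is a hypothetical stand-in that simply steps through the string with str.find, not the original recursive function, and the shorter test string is an arbitrary choice to keep the run quick.

```python
import random
import string
import timeit

def find_all_linear(text, sub):
    """Hypothetical stand-in for find_all: one str.find call per match."""
    indexes = []
    pos = text.find(sub)
    while pos != -1:
        indexes.append(pos)
        pos = text.find(sub, pos + 1)  # resume one character past the last hit
    return indexes

def findall_lc(text, sub):
    """The list comprehension quoted above: one str.find call per character."""
    return [n for n in range(len(text)) if text.find(sub, n) == n]

random.seed(0)
s = "".join(random.choice(string.ascii_lowercase) for _ in range(20000))

# Both approaches should agree before their timings are compared.
assert find_all_linear(s, "th") == findall_lc(s, "th")

fast = timeit.timeit(lambda: find_all_linear(s, "th"), number=5)
slow = timeit.timeit(lambda: findall_lc(s, "th"), number=5)
print("linear: %.4fs  comprehension: %.4fs" % (fast, slow))
```

The absolute numbers depend on the machine and the Python version, so only the ratio between the two is meaningful.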

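This program counts the number of all substrings, even if they overlap, without using regex. But it is a naive implementation; for better results in the worst case, it is advisable to look at suffix trees, KMP, and other string-matching data structures and algorithms.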
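
Since KMP is mentioned, here is a compact sketch of a Knuth-Morris-Pratt search that reports every match, overlapping ones included. It is an illustration rather than code from the answer, and the name kmp_find_all is made up. In practice the built-in str.find is usually fast enough; the point of KMP is its guaranteed linear worst case.

```python
def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    # failure[i] is the length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going so overlapping matches are found
    return matches

print(kmp_find_all("Alllowed Hello Holllow", "ll"))  # [1, 2, 11, 17, 18]
```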

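Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index: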

def all_substring_indexes(string, substring, start=0, end=None):
    result = []
    new_start = start
    while True:
        try:
            index = string.index(substring, new_start, end)
        except ValueError:
            return result
        else:
            result.append(index)
            new_start = index + len(substring)

A simple iterative function that returns a list of the indices at which the substring occurs:

def allindices(string, sub):
    l = []
    i = string.find(sub)
    while i >= 0:
        l.append(i)
        i = string.find(sub, i + 1)
    return l
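
Whether overlapping matches are reported comes down to how far the search advances after each hit: one character (as above) or the length of the substring. A small sketch that makes the choice explicit; the name iter_indexes and the overlapping flag are assumptions for illustration.

```python
def iter_indexes(text, sub, overlapping=True):
    """Yield the start index of each occurrence of sub in text."""
    if not sub:
        raise ValueError("empty substring")
    # Step by 1 to allow overlaps, or by len(sub) to skip past each match.
    step = 1 if overlapping else len(sub)
    pos = text.find(sub)
    while pos != -1:
        yield pos
        pos = text.find(sub, pos + step)

print(list(iter_indexes("aaaa", "aa")))                     # [0, 1, 2]
print(list(iter_indexes("aaaa", "aa", overlapping=False)))  # [0, 2]
```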

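You can split on the key to get relative positions, then sum the consecutive numbers in the list while adding (length of the key * occurrence number) to get the string indexes you want: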

>>> key = 'll'
>>> text = "Allowed Hello Hollow"
>>> x = [len(i) for i in text.split(key)[:-1]]
>>> [sum(x[:i+1]) + i*len(key) for i in range(len(x))]
[1, 10, 16]
>>> 

Maybe not very Pythonic, but somewhat self-explanatory. It returns the positions of the word in the original string:

def retrieve_occurences(sequence, word, result, base_counter):
     indx = sequence.find(word)
     if indx == -1:
         return result
     result.append(indx + base_counter)
     base_counter += indx + len(word)
     return retrieve_occurences(sequence[indx + len(word):], word, result, base_counter)

I don't think there's any need to test the length of the text; just keep searching until there is nothing left to find. Like this:

    >>> text = 'Allowed Hello Hollow'
    >>> place = 0
    >>> while text.find('ll', place) != -1:
            print('ll found at', text.find('ll', place))
            place = text.find('ll', place) + 2


    ll found at 1
    ll found at 10
    ll found at 16

You can also do it with a conditional list comprehension, like this:

string1= "Allowed Hello Hollow"
string2= "ll"
print([num for num in range(len(string1)-len(string2)+1) if string1[num:num+len(string2)]==string2])
# [1, 10, 16]

This idea just occurred to me. Using a while loop with string slicing and string searching works, even for overlapping strings:

findin = "algorithm alma mater alison alternation alpines"
search = "al"
inx = 0
num_str = 0

while True:
    inx = findin.find(search)
    if inx == -1: #breaks before adding 1 to number of string
        break
    inx = inx + 1
    findin = findin[inx:] #to splice the 'unsearched' part of the string
    num_str = num_str + 1 #counts no. of string

if num_str != 0:
    print("There are ",num_str," ",search," in your string.")
else:
    print("There are no ",search," in your string.")
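
I'm an amateur at Python programming (at programming in any language, actually), and I'm not sure what other issues this might have, but I guess it works fine?

I guess lower() could also be used somewhere in there if needed.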
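
As a sketch of the lower() suggestion, both strings can be lowercased once before searching. The function name find_all_ci is hypothetical, and the code assumes text for which lowercasing does not change the string length (plain ASCII, for example), since otherwise the reported indexes would not line up with the original string.

```python
def find_all_ci(text, sub):
    """Case-insensitive search; returns overlapping start indexes."""
    lowered_text = text.lower()
    lowered_sub = sub.lower()
    indexes = []
    pos = lowered_text.find(lowered_sub)
    while pos != -1:
        indexes.append(pos)
        pos = lowered_text.find(lowered_sub, pos + 1)
    return indexes

print(find_all_ci("Algorithm alma mater ALISON", "al"))  # [0, 10, 21]
```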

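The function below finds all occurrences of a string inside another string, and reports the position at which each one is found.

You can call the function with the test cases in the table below. You can try words, spaces, and numbers all mixed together.

The function copes with runs of repeated characters. After each match it advances by the length of the search string, so the matches it reports do not overlap.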

|         theString          | aString |
| -------------------------- | ------- |
| "661444444423666455678966" |  "55"   |
| "661444444423666455678966" |  "44"   |
| "6123666455678966"         |  "666"  |
| "66123666455678966"        |  "66"   |

Calling examples:
1. print("Number of occurrences: ", find_all("123666455556785555966", "5555"))
   
   output:
           Found in position:  7
           Found in position:  14
           Number of occurrences:  2
   
2. print("Number of occurrences: ", find_all("Allowed Hello Hollow", "ll"))

   output:
          Found in position:  1
          Found in position:  10
          Found in position:  16
          Number of occurrences:  3

3. print("Number of occurrences: ", find_all("Aaa bbbcd$#@@abWebbrbbbbrr 123", "bbb"))

   output:
         Found in position:  4
         Found in position:  20
         Number of occurrences:  2
         

def find_all(theString, aString):
    count = 0
    i = len(aString)
    x = 0

    while x < len(theString) - (i-1): 
        if theString[x:x+i] == aString:        
            print("Found in position: ", x)
            x=x+i
            count=count+1
        else:
            x=x+1
    return count

This code might not be the shortest or most efficient, but it is simple and easy to understand:

def findall(f, s):
    l = []
    i = -1
    while True:
        i = s.find(f, i+1)
        if i == -1:
            return l
        l.append(s.find(f, i))

findall('test', 'test test test test')
# [0, 5, 10, 15]

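For the first version, checking a string:

def findall(text, sub):
    """Return all indices at which the substring occurs in text"""
    # Assumed completion: the source is cut off after "for index"
    return [
        index
        for index in range(len(text) - len(sub) + 1)
        if text.startswith(sub, index)
    ]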