
Find string between two substrings


How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method is like this:

>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis
However, this seems very inefficient and un-Pythonic. What is a better way to do something like this?

Forgot to mention: the string might not start and end with `start` and `end`. There may be more characters before and after them.

My approach would be:

s[len(start):-len(end)]
find index of start string in s => i
find index of end string in s => j

substring = substring(i+len(start) to j-1)
Gives:

123STRING
STRINGabc

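I think it should be noted that, depending on the behavior you need, you can mix `index` and `rindex` calls or go with one of the versions above (they are the equivalent of the regex `(.*)` and `(.*?)` groups).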
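To make the `index` / `rindex` distinction concrete, here is a minimal sketch. The function names `find_between` and `find_between_r` and the sample string are illustrative assumptions; the point is only that searching from the left and searching from the right pick different matches when the delimiters repeat.

```python
def find_between(s, first, last):
    """Text between the first `first` and the next `last`, or '' if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


def find_between_r(s, first, last):
    """Same idea, but anchored on the last occurrences, using rindex."""
    try:
        start = s.rindex(first) + len(first)
        end = s.rindex(last, start)
        return s[start:end]
    except ValueError:
        return ""


s = "123123STRINGabcabc"
print(find_between(s, "123", "abc"))    # 123STRING
print(find_between_r(s, "123", "abc"))  # STRINGabc
```

Returning an empty string for a missing delimiter is a design choice made here for brevity; raising or returning None may suit other callers better.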

Here is one way to do it:

_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print(result)
Another way, using regexp:

import re
print(re.findall(re.escape(start) + "(.*)" + re.escape(end), s)[0])

This is something I posted before:

# pick up a piece of string between separators
# function using partition, like partition, but drops the separators
def between(left, right, s):
    before, _, a = s.partition(left)
    a, _, after = a.partition(right)
    return before, a, after

s = "bla bla blaa data lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('(', ')', s))
print(between("'", "'", s))

""" Output:
('bla bla blaa data lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa data lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. `start` and `end` can now be changed as needed.

import re

s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'

result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
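One caveat with building the pattern from raw strings: if `start` or `end` contains regex metacharacters such as `(` or `*`, the pattern changes meaning. The sketch below is one way to guard against that, under the assumption that the delimiters should be matched literally. The helper name `between_regex` is hypothetical; `re.escape`, `re.search` and `re.DOTALL` are standard library.

```python
import re


def between_regex(s, start, end):
    """Return the shortest text between literal `start` and `end`, or None."""
    # re.escape makes the delimiters literal; (.*?) stops at the first `end`.
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, s, re.DOTALL)
    return match.group(1) if match else None


print(between_regex("asdf=5;iwantthis123jasd", "asdf=5;", "123jasd"))  # iwantthis
print(between_regex("total (5*2) and (3+4)", "(", ")"))                # 5*2
print(between_regex("no delimiters here", "(", ")"))                   # None
```

Using `(.*)` instead of `(.*?)` would make the match greedy, so it would run to the last occurrence of `end` rather than the first.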

To extract `STRING`, try:

myString = '123STRINGabc'
startString = '123'
endString = 'abc'

mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]
Gives:

iwantthis
Must show: here0, here1, here2


Regex is better, but it needs an extra import (`re`), and you may prefer to stick with plain Python string methods.

Just converting the OP's own solution into an answer:

def find_between(s, start, end):
  return (s.split(start))[1].split(end)[0]
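Note that the split-based version raises an IndexError when `start` is absent, and silently returns everything after `start` when `end` is absent. A minimal sketch of a stricter variant follows, assuming that a missing delimiter should yield None. The name `find_between_or_none` is hypothetical.

```python
def find_between_or_none(s, start, end):
    """Text between the first `start` and the following `end`, or None if either is missing."""
    _, sep_start, rest = s.partition(start)
    if not sep_start:          # partition returns an empty separator when not found
        return None
    result, sep_end, _ = rest.partition(end)
    if not sep_end:
        return None
    return result


print(find_between_or_none("asdf=5;iwantthis123jasd", "asdf=5;", "123jasd"))  # iwantthis
print(find_between_or_none("asdf=5;iwantthis", "asdf=5;", "123jasd"))         # None
```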

This is essentially cji's answer (Jul 30 '10 at 5:58). I changed the try/except structure so that it is clearer what caused the exception.

import sys

def find_between(inputStr, firstSubstr, lastSubstr):
    '''
    find between firstSubstr and lastSubstr in inputStr  STARTING FROM THE LEFT
        http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
            above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1, -1)
    try:
        start = inputStr.index(firstSubstr) + len(firstSubstr)
    except ValueError:
        # report which substring was missing, then carry on
        print('    ValueError: ', end=' ')
        print("firstSubstr=%s  -  " % (firstSubstr), end=' ')
        print(sys.exc_info()[1])

    try:
        end = inputStr.index(lastSubstr, start)
    except ValueError:
        print('    ValueError: ', end=' ')
        print("lastSubstr=%s  -  " % (lastSubstr), end=' ')
        print(sys.exc_info()[1])

    return inputStr[start:end]

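These solutions assume the start string and the final string are different. Assuming the whole file is read with readlines(), here is a solution I use for an entire file when the initial and final indicators are the same: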
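A minimal sketch of such a helper, assuming it should return the text between the first two occurrences of one flag string on a line, with surrounding whitespace stripped. The name `extractstring` and the `flag` parameter follow the usage shown in this answer; the body is an illustrative assumption and not necessarily the answerer's exact code.

```python
def extractstring(line, flag="$"):
    """Return the text between the first two occurrences of `flag`, or None."""
    first = line.find(flag)
    if first == -1:
        return None                      # no opening flag on this line
    begin = first + len(flag)
    second = line.find(flag, begin)
    if second == -1:
        return None                      # opening flag without a closing one
    return line[begin:second].strip()
```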

For example:

lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
    'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string=extractstring(line,flag='$')
    print(string)
Gives:

A NEWT?
I GOT BETTER!

You can simply use this code or copy the function below. It all fits neatly on one line.

def substring(whole, sub1, sub2):
    return whole[whole.index(sub1) : whole.index(sub2)]
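If you run the function as follows:

print(substring("5+(5*2)+2", "(", ")"))
you will be left with this output:

(5*2
rather than

5*2
If you want the ending delimiter included in the output, the code must look like this: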

return whole[whole.index(sub1) : whole.index(sub2) + 1]
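But if you do not want the delimiters in the output at all, the +1 must go on the first value instead (this assumes sub1 is a single character):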

return whole[whole.index(sub1) + 1 : whole.index(sub2)]
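The `+ 1` offset only fits one-character delimiters. A minimal sketch of a more general variant, assuming the delimiters may be any length and that `sub2` should be searched for only after `sub1`. The name `substring_between` is hypothetical.

```python
def substring_between(whole, sub1, sub2):
    """Text strictly between sub1 and the next sub2; raises ValueError if either is missing."""
    start = whole.index(sub1) + len(sub1)   # skip the whole opening delimiter
    end = whole.index(sub2, start)          # look for sub2 only after sub1
    return whole[start:end]


print(substring_between("5+(5*2)+2", "(", ")"))  # 5*2
print(substring_between("a<<b>>c", "<<", ">>"))  # b
```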
Results:

index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381

`re_find` is almost 20 times slower than `index_find` in this example.

This seems much more straightforward to me:

import re

s = 'asdf=5;iwantthis123jasd'
x= re.search('iwantthis',s)
print(s[x.start():x.end()])

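Parsing text with delimiters from different email platforms posed a larger version of this problem. They generally have a START and a STOP. The wildcard characters in the delimiters kept choking the regex. The problem with split is mentioned here and elsewhere: oops, the delimiter is gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code: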

nuke = '~~~'
start = '|*'
stop = '*|'
julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s',keep)
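A self-contained run of the same replace-then-split idea, as a sketch. The sample text in `text_in` is a made-up assumption, and `print` stands in for the logging call; the sentinel `nuke` must be a string that never occurs in the input.

```python
nuke = "~~~"      # sentinel that split() is allowed to consume
start = "|*"
stop = "*|"

# Hypothetical input using merge-tag style delimiters.
text_in = "Hello |*FNAME*|, your code is |*CODE*|."

# Put the sentinel before every start and after every stop, then split on it.
pieces = text_in.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke)
keep = [chunk for chunk in pieces if start in chunk and stop in chunk]
print(keep)   # ['|*FNAME*|', '|*CODE*|']

# Strip the delimiters if only the inner text is wanted.
inner = [chunk[len(start):-len(stop)] for chunk in keep]
print(inner)  # ['FNAME', 'CODE']
```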

Here is a function I wrote to return a list of the strings found between string1 and string2:

def GetListOfSubstrings(stringSubject,string1,string2):
    MyList = []
    intstart=0
    strlength=len(stringSubject)
    continueloop = 1

    while(intstart < strlength and continueloop == 1):
        intindex1=stringSubject.find(string1,intstart)
        if(intindex1 != -1): #The substring was found, lets proceed
            intindex1 = intindex1+len(string1)
            intindex2 = stringSubject.find(string2,intindex1)
            if(intindex2 != -1):
                subsequence=stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart=intindex2+len(string2)
            else:
                continueloop=0
        else:
            continueloop=0
    return MyList


#Usage Example
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y68")
for x in range(0, len(List)):
               print(List[x])
output:


mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","3")
for x in range(0, len(List)):
              print(List[x])
output:
    2
    2
    2
    2

mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y")
for x in range(0, len(List)):
               print(List[x])
output:
23
23o123pp123
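For comparison, the same list of non-overlapping matches can be sketched with `re.findall` and a non-greedy group. The helper name `list_of_substrings` is hypothetical, and the delimiters are assumed to be literal text, hence the `re.escape` calls.

```python
import re


def list_of_substrings(subject, string1, string2):
    """All non-overlapping pieces of `subject` found between string1 and string2."""
    pattern = re.escape(string1) + "(.*?)" + re.escape(string2)
    return re.findall(pattern, subject, re.DOTALL)


mystring = "s123y123o123pp123y6"
print(list_of_substrings(mystring, "1", "y68"))  # []
print(list_of_substrings(mystring, "1", "3"))    # ['2', '2', '2', '2']
print(list_of_substrings(mystring, "1", "y"))    # ['23', '23o123pp123']
```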
Building on Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):

And this is how it worked for me (Python script):


If you don't want to import anything, try the string method `.index()`:


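This is fine, assuming start and end are always at the start and end of the string. Otherwise, I would probably use a regex.
I gave the most Pythonic answer to the original question that I could think of. Testing with the `in` operator would probably be faster than a regexp.
He said he wanted a more Pythonic way, and this is decidedly less so. I don't know why this answer was picked; even the OP's own solution is better.
Agreed. I would use @Tim McNamara's solution, or his suggestion of something like `start+test+end in substring`.
Right, so it's less Pythonic, OK. Is it less efficient than regexps too? There is also @Prabhu's answer, which you would need to downvote, since it suggests the same solution.
+1 as well, for a more generic and reusable (by import) solution.
+1 since it works better than the other solutions when `end` is found more than once. But I agree that the OP's solution is simpler.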
from timeit import timeit
from re import search, DOTALL


def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]


def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)


def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]


# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='

assert index_find(string, start, end) \
       == partition_find(string, start, end) \
       == re_find(string, start, end)

print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: host
    ports:
      - 443:9999
    ulimits:
      nofile:test
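import re

# read the whole compose file, then capture the text between 'ui:' and the last '-' on that line
with open('docker-compose.yml', 'r') as f:
    lines = f.read()
result = re.search('ui:(.*)-', lines)
print(result.group(1))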


Result:
0.0.2
text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'

# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])