Python: delete all characters in a multiline string up to a given pattern


python regex string


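Using Python, I need to delete all characters in a multiline string up to the first occurrence of a given pattern. In Perl, this can be done with a regular expression like this: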

#remove all chars up to first occurrence of cat or dog or rat
$pattern = 'cat|dog|rat';
$pagetext =~ s/(.*?)($pattern)/$2/xms;

What is the best way to do this in Python?

>>> import re
>>> s = 'hello cat!'
>>> m = re.search('cat|dog|rat', s)
>>> s[m.start():]
'cat!'

Of course, a real solution also has to handle the case where there is no match (re.search then returns None).
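A minimal sketch of that guard, wrapping the search in a small function (the name drop_until is hypothetical, not from the answer). Because re.search only looks for the pattern itself, there is no .*? prefix involved, so multiline input needs no DOTALL flag here:

```python
import re

def drop_until(text, pattern):
    """Hypothetical helper: return text from the first match of pattern on."""
    m = re.search(pattern, text)
    # re.search returns None when nothing matches, so check before m.start()
    return text[m.start():] if m else text

print(drop_until('hello cat!', 'cat|dog|rat'))     # cat!
print(drop_until('hello world', 'cat|dog|rat'))    # hello world
print(drop_until('a\nb dog', 'cat|dog|rat'))       # dog
```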

Or, more clearly:

>>> import re
>>> s = 'hello cat!'
>>> p = 'cat|dog|rat'
>>> re.sub('.*?(?=%s)' % p, '', s, 1)
'cat!'

For multiline strings, use the re.DOTALL flag.
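To illustrate the difference on an assumed two-line sample string: without the flag, . does not match a newline, so only the part of the matching line before the pattern is removed; with flags=re.DOTALL, everything before the first match is removed.

```python
import re

s = 'line one\nline two cat!'
p = 'cat|dog|rat'

# Without DOTALL, '.' stops at '\n', so the first line survives
without_flag = re.sub('.*?(?=%s)' % p, '', s, count=1)
# With DOTALL, '.' also matches '\n', so everything before 'cat' is removed
with_flag = re.sub('.*?(?=%s)' % p, '', s, count=1, flags=re.DOTALL)

print(repr(without_flag))  # 'line one\ncat!'
print(repr(with_flag))     # 'cat!'
```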

Something like this will do what you need:

import re
text = '   sdfda  faf foo zing baz bar'
match = re.search('foo|bar', text)
if match:
    print(text[match.start():])  # prints: foo zing baz bar

A non-regex approach:

>>> s='hello cat!'
>>> pat=['cat','dog','rat']
>>> for n,i in enumerate(pat):
...     m=s.find(i)
...     if m != -1: print(s[m:])
...
cat!

You want to delete every character before the first occurrence of a pattern; in your example, the pattern is "cat", "dog" or "rat".

Code that does this with re:

re.sub("(?s).*?(cat|dog|rat)", "\\1", input_text, 1)

Or, if you are going to reuse this regex:

rex = re.compile("(?s).*?(cat|dog|rat)")
result = rex.sub("\\1", input_text, 1)
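A sketch of the reuse case, assuming a few made-up inputs: the pattern is compiled once and applied to each string, and a string that contains none of the words comes back unchanged.

```python
import re

# Compile once; (?s) lets '.' match newlines, *? keeps the prefix as short as possible
rex = re.compile(r"(?s).*?(cat|dog|rat)")

inputs = ["I have a dog and a cat", "first line\nthen a rat", "no animals here"]
# count=1: only the first match (the prefix plus the first animal) is replaced
results = [rex.sub(r"\1", t, count=1) for t in inputs]
print(results)  # ['dog and a cat', 'rat', 'no animals here']
```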

Note the non-greedy *?. The leading (?s) also lets the match run across newlines before the matched word.
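To see why the non-greedy quantifier matters, compare it with a greedy .* on the same example sentence: the greedy version backtracks only as far as the last animal, so it keeps the text from the last match instead of the first.

```python
import re

text = "I have a dog and a cat"

# Greedy .* grabs as much as possible, so the group ends up on the LAST animal
greedy = re.sub(r"(?s).*(cat|dog|rat)", r"\1", text, count=1)
# Non-greedy .*? stops as early as possible, at the FIRST animal
lazy = re.sub(r"(?s).*?(cat|dog|rat)", r"\1", text, count=1)

print(greedy)  # cat
print(lazy)    # dog and a cat
```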

Examples:

>>> input_text= "I have a dog and a cat"
>>> re.sub(".*?(cat|dog|rat)", "\\1", input_text, 1)
'dog and a cat'

>>> input_text= "I have no animals!"
>>> re.sub("(?s).*?(cat|dog|rat)", "\\1", input_text, 1)
'I have no animals!'

>>> input_text= "This is irrational"
>>> re.sub("(?s).*?(cat|dog|rat)", "\\1", input_text, 1)
'rational'

If you only want to match the whole words cat, dog and rat, you have to change the regex to:

>>> re.sub(r"(?s).*?\b(cat|dog|rat)\b", "\\1", input_text, 1)
'This is irrational'
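If the word list comes from elsewhere, one way to build the whole-word pattern is to escape each word with re.escape and wrap the alternation in \b. This is a sketch with a hypothetical helper name, and it assumes the words consist of word characters, since \b behaves differently next to punctuation.

```python
import re

def drop_until_whole_word(text, words):
    """Hypothetical helper: drop everything before the first whole-word match."""
    # re.escape protects regex metacharacters in the words;
    # \b...\b rejects matches inside longer words ("rat" in "irrational")
    pattern = r"(?s).*?\b(" + "|".join(map(re.escape, words)) + r")\b"
    return re.sub(pattern, r"\1", text, count=1)

animals = ["cat", "dog", "rat"]
print(drop_until_whole_word("This is irrational", animals))                  # This is irrational
print(drop_until_whole_word("This is irrational,\nsaid the rat", animals))   # rat
```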

Another option is to use a lookahead, as in the Perl substitution s/.*?(?=$pattern)//xs:

re.sub(r'(?s).*?(?=cat|dog|rat)', '', text, 1)
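The lookahead form and the capture-group form should give the same result: the lookahead consumes nothing, so the replacement is empty, while the capture group consumes the word and has to write it back with \1. A quick check on a few assumed sample strings (the helper names are hypothetical):

```python
import re

PATTERN = "cat|dog|rat"

def drop_lookahead(text):
    # (?=...) only checks that an animal follows; nothing after the prefix is consumed
    return re.sub(r"(?s).*?(?=%s)" % PATTERN, "", text, count=1)

def drop_capture(text):
    # The group consumes the animal, so it is put back with \1
    return re.sub(r"(?s).*?(%s)" % PATTERN, r"\1", text, count=1)

samples = ["hello cat!", "a\nb\nrat race", "This is irrational", "no match"]
for s in samples:
    assert drop_lookahead(s) == drop_capture(s)
    print(repr(drop_lookahead(s)))
```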

A non-regex approach:

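options = 'cat dog rat'.split()
# Position of every option that occurs in text (str.find returns -1 if absent)
indexes = [i for i in (text.find(option) for option in options) if i != -1]
if indexes:  # found: cut at the earliest occurrence, not the first option in the list
    text = text[min(indexes):]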

The non-regex approach is almost 5 times faster (for some inputs). The benchmark file drop_until_word.py is:

import re

def drop_re(text, options):
    return re.sub(r'(?s).*?(?='+'|'.join(map(re.escape, options))+')', '',
                  text, count=1)

def drop_re2(text, options):
    return re.sub(r'(?s).*?('+'|'.join(map(re.escape, options))+')', '\\1',
                  text, count=1)

def drop_search(text, options):
    m = re.search('|'.join(map(re.escape, options)), text)
    return text[m.start():] if m else text

def drop_find(text, options):
    indexes = [i for i in (text.find(option) for option in options) if i != -1]
    return text[min(indexes):] if indexes else text

with open('/usr/share/dict/words') as f:
    text = f.read()
options = 'cat dog rat'.split()

def test():
    assert drop_find(text, options) == drop_re(text, options) \
        == drop_re2(text, options) == drop_search(text, options)

    txt = 'dog before cat'
    r = txt
    for f in [drop_find, drop_re, drop_re2, drop_search]:
        assert r == f(txt, options), f.__name__


if __name__=="__main__":
    test()
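To rerun a comparison like this without depending on /usr/share/dict/words, one option is timeit with a synthetic input. The sketch below copies three of the functions above (passing count by keyword) and assumes a filler text whose first match sits near the end; the numbers it prints depend entirely on that assumed input and on the machine.

```python
import re
import timeit

def drop_re(text, options):
    return re.sub(r"(?s).*?(?=" + "|".join(map(re.escape, options)) + ")", "",
                  text, count=1)

def drop_search(text, options):
    m = re.search("|".join(map(re.escape, options)), text)
    return text[m.start():] if m else text

def drop_find(text, options):
    indexes = [i for i in (text.find(o) for o in options) if i != -1]
    return text[min(indexes):] if indexes else text

# Synthetic stand-in for the dictionary file (an assumption, not the original input):
# many filler lines, with the first option occurring only near the end
text = "abcdefghij\n" * 20000 + "dog before cat"
options = ["cat", "dog", "rat"]

for f in (drop_re, drop_search, drop_find):
    assert f(text, options) == "dog before cat"  # all three agree on the result
    seconds = timeit.timeit(lambda: f(text, options), number=10)
    print("%-12s %.4f s" % (f.__name__, seconds))
```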

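+1: for pointing out the inflexibility and matching limitations I had missed. Your use of "" and r"" is inconsistent (e.g. it could be r"\1").

@Ian Bicking: inconsistency is in the eye of the beholder. I almost always use the r"" notation for strings with more than one literal backslash; the exception is unicode regexes containing \N{...} character names.

To match across multiple lines use DOTALL: re.sub(r'', r'', st, flags=re.DOTALL), or prefix the regex with (?s); also, it will delete all characters up to the last occurrence of $pattern.

.* is greedy in Perl, just as it is in Python.

Thanks. Updated so there is no greedy match.

enumerate() is unnecessary here.

The re.search variant is much faster than the re.sub variants in this case, thanks. The non-re version doesn't really do what I'm after, though, i.e. delete all characters up to the first occurrence of cat, dog or rat. For example, if the string is "dog before cat", the re version correctly returns "dog before cat", while the find version returns only "cat".

@biffabacon: good catch. I've fixed drop_find() for the "dog before cat" case.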
