Python: how do I split a string on a positive lookahead and a positive lookbehind when there is no delimiter?
For example:
s = "Thisissometext andthisissometext"
I want to split the text between "is" and "some".
If I do this:
re.split("(?<=is)s(?=ome)", s)
--> ['Thisis', 'ometext andthisis', 'ometext']
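That consumes the "s" that should stay in the output. To split on the zero-width position itself, you need the newer regex module, which supports empty splits: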
import regex as re
s = "Thisissometext andthisissometext"
print(re.split(r"(?V1)(?<=is)(?=some)", s))
# ['Thisis', 'sometext andthisis', 'sometext']
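As a side note, the standard-library re module has accepted patterns that can match an empty string in re.split since Python 3.7, so on a recent interpreter the same lookaround pattern may work without the third-party regex package. A minimal sketch, assuming Python 3.7 or later:

```python
import re

s = "Thisissometext andthisissometext"

# Zero-width pattern: a position preceded by "is" and followed by "some".
# Assumption: Python 3.7+, where re.split also splits on empty matches.
parts = re.split(r"(?<=is)(?=some)", s)
print(parts)
# ['Thisis', 'sometext andthisis', 'sometext']
```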
Rather than split, here is a regular expression you can use with re.findall to do the job:
>>> import re
>>> s = "Thisissometext andthisissometext"
>>> print(re.findall(r'[\w\s]+?(?:is(?=some)|$)', s))
['Thisis', 'sometext andthisis', 'sometext']
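Regex breakdown:
[\w\s]+? : match one or more word or whitespace characters (non-greedy)
(?: : start of the non-capturing group
is : match the literal "is"
(?=some) : which must be followed by "some"
| : or
$ : the end of the string
) : end of the non-capturing group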
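The breakdown above can also be written directly into the pattern. The following is a small sketch using re.VERBOSE, which ignores whitespace in the pattern and allows inline comments; the names pattern and chunks are only illustrative:

```python
import re

s = "Thisissometext andthisissometext"

pattern = re.compile(
    r"""
    [\w\s]+?        # one or more word or whitespace characters, as few as possible
    (?:             # start of the non-capturing group
        is(?=some)  # literal "is", only when followed by "some"
        |           # or
        $           # the end of the string
    )               # end of the non-capturing group
    """,
    re.VERBOSE,
)

chunks = pattern.findall(s)
print(chunks)
# ['Thisis', 'sometext andthisis', 'sometext']
```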
A simple and fast approach, if you know a character that does not occur in the text ('@' here):
s.replace('issome','is@some').split('@')
# ['Thisis', 'sometext andthisis', 'sometext']
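Since this trick silently gives wrong results when the marker character already occurs in the text, it may be worth guarding against that. A possible sketch follows; split_between and its sentinel parameter are hypothetical names introduced here, not part of any library:

```python
def split_between(text, left, right, sentinel="\x00"):
    """Split text wherever `left` is immediately followed by `right`.

    Hypothetical helper: inserts a sentinel between the two parts and
    splits on it, refusing to run if the sentinel is already in the text.
    """
    if sentinel in text:
        raise ValueError("sentinel occurs in the text; choose another one")
    return text.replace(left + right, left + sentinel + right).split(sentinel)


s = "Thisissometext andthisissometext"
result = split_between(s, "is", "some")
print(result)
# ['Thisis', 'sometext andthisis', 'sometext']
```

The default sentinel is the NUL character, which is unlikely in ordinary text, but the check is still what makes the approach safe.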
Timings:
In [300]: %timeit s.replace('issome','is@some').split('@')
976 ns ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [301]: %timeit regex.split(r"(?V1)(?<=is)(?=some)", s)
7.36 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [302]: %timeit re.findall(r'[\w\s]+?(?:is(?=some)|$)', s)
4.28 µs ± 97.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
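Outside IPython, a comparable measurement could be made with the standard timeit module. This is only a rough sketch: it covers the two standard-library variants, the lambda wrappers add a little call overhead, and the absolute numbers will differ from machine to machine.

```python
import re
import timeit

s = "Thisissometext andthisissometext"

# The approaches to compare, wrapped so timeit can call them.
candidates = {
    "replace + split": lambda: s.replace("issome", "is@some").split("@"),
    "re.findall": lambda: re.findall(r"[\w\s]+?(?:is(?=some)|$)", s),
}

# Check that both approaches agree before timing them.
results = {name: fn() for name, fn in candidates.items()}

calls = 10_000
for name, fn in candidates.items():
    # Best of 3 repeats; report the per-call time in microseconds.
    best = min(timeit.repeat(fn, number=calls, repeat=3))
    print(f"{name}: {best / calls * 1e6:.2f} µs per call")
```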
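Hoi Jan, great solution! I had never heard of "(?V1)", wow.
@Reman: Glad to help. I added another alternative at the bottom of the answer.
Thanks for your solution. Very nice, but sometimes I need a regex to split my string. Also +1 for the timeit!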
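The alternative from the first answer passes the version as a flag instead of inline (with import regex as re, as above):
print(re.split(r"(?<=is)(?=some)", s, flags=re.VERSION1))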