Python regex: splitting on a pattern that contains an optional word

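I am trying to split a string on a particular phrase that may or may not contain a specific word. I am struggling to find the right syntax.

Here is the current version of the code: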

import re
from pprint import pprint

text = """Here is a list: Bob talked to Caleb, and Caleb talked to Derek, and Derek talked to Eric, and Eric talked to Fred, and Fred talked to Greg, and Greg talked to Henry, and Henry talked to Isaac, and Isaac talked to Jesse, and Jesse talked to Ken."""

pprint(re.split(r"(a?n?d? ?\w+ talked to)",text))
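In this example I want to split on "Bob talked to" or "and Caleb talked to", so the "and" should be included in the delimiter when it is present and simply left out when it is not.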

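This code produces a result that is almost correct.

The only slight error is that there is a space in front of "Bob", which is captured because of the " ?" in the regex. So I do not want each letter to be individually optional, as in "a?n?d? ?". I would rather have "(and )?".

Unfortunately, this: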

print(re.split(r"((and )?\w+ talked to)",text))
gives me:

['Here is a list: ',
 'Bob talked to',
 None,
 ' Caleb, ',
 'and Caleb talked to',
 'and ',
 ' Derek, ',
 'and Derek talked to',
 'and ',
 ' Eric, ',
 'and Eric talked to',
 'and ',
 ' Fred, ',
 'and Fred talked to',
 'and ',
 ' Greg, ',
 'and Greg talked to',
 'and ',
 ' Henry, ',
 'and Henry talked to',
 'and ',
 ' Isaac, ',
 'and Isaac talked to',
 'and ',
 ' Jesse, ',
 'and Jesse talked to',
 'and ',
 ' Ken.']
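Here the two groups are captured separately, so the optional "and " shows up as its own list item (or None). I could probably work with this, but it would be better if the delimiter came back as a single unit.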
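As a minimal sketch of why the extra items appear: re.split adds one list item per capturing group in the pattern for every match, and a group that did not take part in the match comes back as None. The shortened sample string below is an assumption made for brevity; it is not the text from the question.

```python
import re

# Shortened sample text (an assumption for illustration only).
text = "List: Bob talked to Caleb, and Caleb talked to Derek."

# Two capturing groups: the outer delimiter and the inner optional "and ".
nested = re.split(r"((and )?\w+ talked to)", text)
print(nested)

# Every third item, starting at index 2, is the inner group's value.
inner_values = nested[2::3]
print(inner_values)
```

The inner group contributes None for the first match (no "and " in front of "Bob") and 'and ' for the second, which is the same pattern of extra items seen in the output above.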

Another option might be:

pprint(re.split(r"([and ]?\w+ talked to)",text))
which gives:

['Here is a list:',
 ' Bob talked to',
 ' Caleb, and',
 ' Caleb talked to',
 ' Derek, and',
 ' Derek talked to',
 ' Eric, and',
 ' Eric talked to',
 ' Fred, and',
 ' Fred talked to',
 ' Greg, and',
 ' Greg talked to',
 ' Henry, and',
 ' Henry talked to',
 ' Isaac, and',
 ' Isaac talked to',
 ' Jesse, and',
 ' Jesse talked to',
 ' Ken.']

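In this case the "and" is not included even when it is present, because [and ] is a character class that matches a single character, not the word. So how can I make "and " optional as a whole unit? In other words, "and " should be either entirely in or entirely out, never partially matched.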

I think this is what you want:

((?:and )?\w+ talked to)

(?:and ) is a non-capturing group, so it takes part in the match but is not captured as a separate group.
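A minimal sketch of the suggested pattern in use, assuming a shortened version of the sample text (the shorter string is an assumption for brevity, not the text from the question):

```python
import re
from pprint import pprint

# Shortened sample text (an assumption for illustration only).
text = (
    "Here is a list: Bob talked to Caleb, and Caleb talked to Derek, "
    "and Derek talked to Eric."
)

# The outer group is capturing, so re.split keeps each delimiter in the result.
# The inner (?:and )? is non-capturing, so it adds no extra list item.
pattern = r"((?:and )?\w+ talked to)"

parts = re.split(pattern, text)
pprint(parts)
```

With this pattern each delimiter comes back as one item, with "and " included when it is present and no None entries or leading space.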

Perfect! Thank you very much. Non-capturing groups ring a bell now, in a way they did not when I first read about them!