PHP sentence boundaries including empty lines?
This is an extension of an earlier question on SO. I would like to know how to change the regex so that newlines are preserved. Here is sample code that splits some text into sentences, removes one sentence, and puts the text back together:
<?php
$re = '/# Split sentences on whitespace between them.
(?<= # Begin positive lookbehind.
[.!?] # Either an end of sentence punct,
| [.!?][\'"] # or end of sentence punct and quote.
) # End positive lookbehind.
(?<! # Begin negative lookbehind.
Mr\. # Skip either "Mr."
| Mrs\. # or "Mrs.",
| Ms\. # or "Ms.",
| Jr\. # or "Jr.",
| Dr\. # or "Dr.",
| Prof\. # or "Prof.",
| Sr\. # or "Sr.",
| T\.V\.A\. # or "T.V.A.",
# or... (you get the idea).
) # End negative lookbehind.
[\s+|^$] # Split on whitespace between sentences/empty lines.
/ix';
$text = <<<EOL
This is paragraph one. This is sentence one. Sentence two!
This is paragraph two. This is sentence three. Sentence four!
EOL;
echo "\nBefore: \n" . $text . "\n";
$sentences = preg_split($re, $text, -1);
$sentences[1] = " "; // remove 'sentence one'
// put text back together
$text = implode( $sentences );
echo "\nAfter: \n" . $text . "\n";
?>
I am trying to get the "After" text to be identical to the "Before" text, just with one sentence removed:
After:
This is paragraph one. Sentence two!
This is paragraph two. This is sentence three. Sentence four!
I am hoping this can be done with a tweak to the regex, but what am I missing?

The end of the pattern should be replaced with:
(?:\h+|^$) # Split on whitespace between sentences\/empty lines.
/mix';
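Note that [\s+|^$] matches whitespace (both horizontal and vertical, such as newlines) as well as the literal +, |, ^ and $ symbols, because it is a character class.
What you need is a group (preferably a non-capturing one here), not a character class. Inside a group, written as (...), the | works as an alternation operator.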
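The difference between the character class and the group can be checked with a small sketch like the one below. The variable names ($class, $group, $classHits, $groupHits) and the list of probe characters are only illustrative and are not part of the original code.

```php
<?php
// Character class: +, |, ^ and $ are literal members of the set.
$class = '/[\s+|^$]/';
// Non-capturing group: | separates two alternatives (horizontal whitespace, or an empty line).
$group = '/(?:\h+|^$)/m';

$classHits = [];
$groupHits = [];
foreach (['|', '+', '^', '$', ' ', "\n"] as $ch) {
    $classHits[] = preg_match($class, $ch); // 1 if the single character matches
    $groupHits[] = preg_match($group, $ch);
}

echo implode(',', $classHits), "\n"; // 1,1,1,1,1,1
echo implode(',', $groupHits), "\n"; // 0,0,0,0,1,1
```

The class accepts every probe character, including the punctuation that was meant as regex syntax. The group matches only the space (via \h+) and the lone newline (via ^$, which sees an empty line).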
Instead of \s, I suggest using \h, which matches only horizontal whitespace (no line breaks).
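A minimal sketch of the difference, using a made-up sample string ($sample, $withS and $withH are hypothetical names):

```php
<?php
// A space, a tab and a newline between three tokens.
$sample = "a \tb\nc";

$withS = preg_split('/\s+/', $sample); // \s also consumes the newline
$withH = preg_split('/\h+/', $sample); // \h leaves the newline in place

var_dump($withS); // ["a", "b", "c"]
var_dump($withH); // ["a", "b\nc"]
```

Because \h never consumes the newline, the line break stays inside one of the pieces and survives when the pieces are joined again.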
Without the /m (multiline) modifier, ^$ matches only an empty string. That is why I added the /m modifier to the options.
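The effect of /m on ^$ can be seen with a short sketch. The sample text and variable names here are assumptions made for illustration.

```php
<?php
// Two lines of text separated by one empty line.
$text = "first\n\nsecond";

$without = preg_match_all('/^$/', $text);  // ^ and $ anchor to the whole subject only
$with    = preg_match_all('/^$/m', $text); // ^ and $ also anchor at line breaks

echo $without, "\n"; // 0
echo $with, "\n";    // 1 (the empty line)
```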
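Note that I had to escape the / in the last comment. An unescaped / ends the pattern early, and PHP then issues a warning that the regex is invalid. Alternatively, use a different regex delimiter.

There seems to be a problem in this regex: [\s+|^$] matches whitespace and the +, |, ^ and $ symbols. Replace it with (?:\h+|^$), and I think that's it.

I think you can drop the + after \s, or use \s{1} if you really need it to match exactly one, because \s+ eats up the other whitespace. Basically you want array("stuff", "\n", "stuff"), but I can't be sure without testing, and it's too complex to run in my head.

Thanks. This almost works, but there is one quirk: the preg_split regex joins two of the sentences together. Any ideas? Thanks also for the explanation; I wasn't familiar with it.

What if you add PREG_SPLIT_DELIM_CAPTURE, use a capturing group, (\h+|^$), and blank out the element at index 2?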