Regex for sentences, but excluding websites


I am looking for a regex that splits text into sentences without also breaking on website addresses.

My regex is:

(\(?[^\.]+[\.!\?]\)?)

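For the sample text

This is a paragraph of text. It is very interesting. Yet for a test website like google.com.xyz it's broken up.

I want three sentences:

This is a paragraph of text

It is very interesting

Yet for a test website like google.com.xyz it's broken up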

However, the last sentence is split into three parts:

Yet for a test website like google

com

xyz it's broken up
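
The split can be reproduced with a short Python check. This is only a sketch: it assumes the pattern in question reads `(\(?[^\.]+[\.!\?]\)?)` when written in plain ASCII, and the variable names are illustrative.

```python
import re

# Assumption: ASCII form of the pattern from the question.
original = re.compile(r'(\(?[^\.]+[\.!\?]\)?)')

text = ("This is a paragraph of text. It is very interesting. "
        "Yet for a test website like google.com.xyz it's broken up.")

# [^\.]+ stops at every dot, so each dot inside the domain ends a match.
# strip() only removes the leading space left over between matches.
parts = [p.strip() for p in original.findall(text)]
print(parts)
```

This yields five items instead of three; the last three are the fragments of the final sentence (`'Yet for a test website like google.'`, `'com.'`, `"xyz it's broken up."`).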

How can I modify my regex so that websites do not get caught up in this as well?

You can try finding all matches of the following regex pattern:

(.*?\.)(?!\S)\s*

(.*?\.)   match AND capture all content up to and including a full stop
(?!\S)    which is followed by whitespace or end of the string
\s*       then consume any whitespace after the full stop but before the next sentence

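Sample script in Python:

import re

inp = "This is a paragraph of text. It is very interesting. Yet for a test website like google.com.xyz it's broken up."
# Capture each run of text ending in a full stop that is followed by whitespace or the end of the string
parts = re.findall(r'(.*?\.)(?!\S)\s*', inp)
print(parts)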

This prints:

['This is a paragraph of text.',
 'It is very interesting.',
 "Yet for a test website like google.com.xyz it's broken up."]
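
The pattern in the question also treated `!` and `?` as sentence terminators, whereas the pattern here only recognizes full stops. A minimal sketch of one possible extension, assuming the character class `[.!?]` is swapped in for `\.`; the function name `split_sentences` is hypothetical and not part of the original answer:

```python
import re

# Assumption: same structure as the answer's pattern, with [.!?] in place of \.
SENTENCE = re.compile(r'(.*?[.!?])(?!\S)\s*')

def split_sentences(text):
    """Return chunks ending in ., ! or ? that are followed by whitespace or end of string."""
    return SENTENCE.findall(text)

print(split_sentences("Is google.com.xyz down? No! It works."))
```

Like the full-stop version, this sketch still splits after an abbreviation such as "e.g." when a space follows it, and it drops trailing text that has no terminator. A sentence that spans a line break is not captured whole unless `re.DOTALL` is passed.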
