Regex: Splitting a paragraph into sentences with R


I am using the strsplit function to do this.

I found a number of regular expressions for this, including:

(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s
I want to get two sentences:

As of Feb. 9, the Ministry of Agriculture, Fisheries and Food
said that 9,998 cattle have been destroyed after being diagnosed
with BSE.
The government has paid $6.1 million in compensation, and is
budgeting $16 million for 1990.
But the regex above splits it into three sentences:

As of Feb.
9, the Ministry of Agriculture, Fisheries and Food said that 9,998 cattle have been destroyed after being diagnosed
with BSE.
The government has paid $6.1 million in compensation, and is
budgeting $16 million for 1990.
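
The three-way split can be reproduced with a minimal sketch like the one below. The variable names (txt, pattern_q, parts_q) are illustrative and not from the original post, and the paragraph is assumed to be stored as one space-separated string. The second negative lookbehind, (?<![A-Z][a-z]\.), only guards against two-letter abbreviations such as "Mr.", so the three-letter "Feb." slips through.

```r
# The paragraph from the question, joined into one string (assumed layout).
txt <- paste(
  "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food",
  "said that 9,998 cattle have been destroyed after being diagnosed",
  "with BSE.",
  "The government has paid $6.1 million in compensation, and is",
  "budgeting $16 million for 1990."
)

# The regex from the question; every backslash is doubled for the R string.
pattern_q <- "(?<!\\w\\.\\w.)(?<![A-Z][a-z]\\.)(?<=\\.|\\?)\\s"

# perl = TRUE is required because lookbehinds are PCRE syntax.
parts_q <- strsplit(txt, pattern_q, perl = TRUE)[[1]]

length(parts_q)  # 3: the whitespace after "Feb." is treated as a sentence break
parts_q[1]       # "As of Feb."
```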

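I don't understand what you are trying to do with the two negative lookbehinds ((?<!...)). All you really need is a positive lookbehind ((?<=...)) that checks for a period or question mark before the whitespace.

First, you need a double backslash "\\" for every escape character (one backslash for the R string, the other for the regex). Second, you can write the expression so that it finds a period followed by a whitespace character and then either an uppercase letter or a non-digit. Both variants work on your example.

That is correct for this case, but how should abbreviations be handled in general? Take the following example: "Because of the disease, the U.S. Department of Agricultures Animal and Plant Health Inspection Service banned the import of cattle, embryos and bull semen from Britain in July," said Margaret Webb, a USDA spokeswoman in Washington. This gives me two sentences: "Because of the disease, the U.S." and "Department of Agricultures Animal and Plant Health Inspection Service banned the import of cattle, embryos and bull semen from Britain in July," said Margaret Webb, a USDA spokeswoman in Washington.

Yes, the problem is that English abbreviations are not standardized ("U.S.", "etc.", "Dept."). A regex really cannot tell the difference between "I live in the U.S. It is a country" and "I work at the U.S. Department of Education." We can tell them apart only because we understand the content. The only way to handle this kind of thing is to move to NLP, or at least to use an NLP-based parser.

So this problem cannot be solved with a regex?

No. For NLP, there are the packages loaded with library(NLP); library(openNLP).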
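
The abbreviation problem raised in the comments can be checked with a short sketch. It applies the lookahead-based pattern from the answer to the two example sentences quoted above; the variable names are illustrative only. The pattern sees the same local context in both strings (a period, a space, an uppercase letter), so it splits both, although only the first one actually contains a sentence boundary.

```r
# Pattern from the answer: whitespace preceded by "." or "?" and
# followed by an uppercase letter.
pattern_a <- "(?<=\\.|\\?)\\s(?=[A-Z])"

two_sentences <- "I live in the U.S. It is a country."
one_sentence  <- "I work at the U.S. Department of Education."

# Correct: "U.S." really ends the first sentence here.
split_two <- strsplit(two_sentences, pattern_a, perl = TRUE)[[1]]

# Incorrect: "U.S." is mid-sentence, but the regex cannot know that.
split_one <- strsplit(one_sentence, pattern_a, perl = TRUE)[[1]]

length(split_two)  # 2
length(split_one)  # 2, although the input is a single sentence
```

Both inputs come back as two pieces, which is the behaviour the comment describes: the decision depends on meaning, not on the characters next to the period.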
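# txt1 is assumed to hold the paragraph from the question as a single string
txt1 <- paste("As of Feb. 9, the Ministry of Agriculture, Fisheries and Food said that 9,998 cattle have been destroyed after being diagnosed with BSE.",
              "The government has paid $6.1 million in compensation, and is budgeting $16 million for 1990.")
strsplit(txt1, "(?<=\\.|\\?)\\s(?=[A-Z])", perl = TRUE)
[[1]]
[1] "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food said that 9,998 cattle have been destroyed after being diagnosed with BSE."
[2] "The government has paid $6.1 million in compensation, and is budgeting $16 million for 1990."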
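
The answer mentions two variants, an uppercase letter or a non-digit after the whitespace, but only the uppercase version is shown. The sketch below compares the two. The non-digit pattern using \\D is an assumption about what the answer meant, not something taken from the original post, and the variable names are illustrative.

```r
# The paragraph from the question as one string (assumed layout).
txt <- paste(
  "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food",
  "said that 9,998 cattle have been destroyed after being diagnosed",
  "with BSE.",
  "The government has paid $6.1 million in compensation, and is",
  "budgeting $16 million for 1990."
)

# Variant shown in the answer: next character must be an uppercase letter.
pattern_upper <- "(?<=\\.|\\?)\\s(?=[A-Z])"

# Assumed second variant: next character must not be a digit.
pattern_nondigit <- "(?<=\\.|\\?)\\s(?=\\D)"

by_upper    <- strsplit(txt, pattern_upper, perl = TRUE)[[1]]
by_nondigit <- strsplit(txt, pattern_nondigit, perl = TRUE)[[1]]

length(by_upper)                  # 2
identical(by_upper, by_nondigit)  # TRUE for this paragraph
```

On this paragraph both variants skip "Feb. 9" because a digit follows the space. They would differ on other input: the non-digit version also splits before a lowercase word, so it is the looser of the two.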