Regex: Split a paragraph into sentences with R

I use the strsplit function to do this. For that, I found many regular expressions, including this one:

(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s

I want two sentences:

As of Feb. 9, the Ministry of Agriculture, Fisheries and Food
said that 9,998 cattle have been destroyed after being diagnosed
with BSE.
The government has paid $6.1 million in compensation, and is
budgeting $16 million for 1990.

Instead, I get:

As of Feb.
9, the Ministry of Agriculture, Fisheries and Food said that 9,998 cattle have been destroyed after being diagnosed
with BSE.
The government has paid $6.1 million in compensation, and is
budgeting $16 million for 1990.
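To see why the first sentence breaks after "Feb.", here is a minimal R sketch that reruns the pattern from the question on the paragraph written as a single line (an assumption: the line breaks in the original input are only wrapping). The two negative lookbehinds exclude only a period preceded by something shaped like "e.g." ((?<!\w\.\w.)) or by an uppercase-plus-lowercase pair such as "Mr." ((?<![A-Z][a-z]\.)). A three-letter abbreviation like "Feb." matches neither, so the space after it is treated as a sentence boundary. The variable names are illustrative.

```r
# Paragraph from the question, written as a single line (assumption: the
# line breaks in the original input were only wrapping)
para <- paste(
  "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food said that",
  "9,998 cattle have been destroyed after being diagnosed with BSE.",
  "The government has paid $6.1 million in compensation, and is budgeting",
  "$16 million for 1990."
)

# Pattern from the question, with every backslash doubled for the R string literal
question_pattern <- "(?<!\\w\\.\\w.)(?<![A-Z][a-z]\\.)(?<=\\.|\\?)\\s"

parts <- strsplit(para, question_pattern, perl = TRUE)[[1]]
print(parts)
# "Feb." is three letters, so neither negative lookbehind fires and the
# paragraph is split into three pieces, the first being "As of Feb."
```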

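…searching for a preceding period or question mark (?

First, you need a double backslash "\\" to write an escape character: one backslash is for the R string literal, the other is for the regex. Second, you can write your expression so that it looks for a period followed by a space and then a capital letter or a non-digit. Both approaches work for your example.

That works in this case, but how do you handle abbreviations in general? Take the following example:

"Because of the disease, the U.S. Department of Agriculture's Animal and Plant Health Inspection Service banned imports of cattle, embryos and bull semen from Britain in July," said Margaret Webb, a USDA spokeswoman in Washington.

This gives me two sentences:

"Because of the disease, the U.S.
Department of Agriculture's Animal and Plant Health Inspection Service banned imports of cattle, embryos and bull semen from Britain in July," said Margaret Webb, a USDA spokeswoman in Washington.

Yes, the problem is that English abbreviations really are not standardized ("U.S.", "etc.", "Dept."), and a regular expression cannot tell the difference between "I live in the U.S. It is a country" and "I work for the U.S. Department of Education". We can tell them apart because we understand the content. The only way to handle this kind of thing is to move into NLP, or at least to use an NLP-based parser.

So this cannot be solved with regular expressions at all??

No. For NLP, it is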
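The abbreviation problem from the comments can be reproduced with a short R sketch. It applies the capital-letter lookahead pattern from the answer to the Margaret Webb sentence, then tries one common workaround that does not come from the thread: a hand-written exception list of abbreviations, expressed as extra fixed-length negative lookbehinds. The helper name split_with_exceptions and the two-entry exception list are hypothetical. The exception list fixes the "U.S. Department" case, but it also suppresses the genuine boundary in "I live in the U.S. It is a country", which is exactly the ambiguity the answer says a regex cannot resolve.

```r
# Pattern from the answer: whitespace preceded by "." or "?" and followed by a capital
answer_pattern <- "(?<=\\.|\\?)\\s(?=[A-Z])"

webb <- paste(
  "\"Because of the disease, the U.S. Department of Agriculture's Animal and",
  "Plant Health Inspection Service banned imports of cattle, embryos and bull",
  "semen from Britain in July,\" said Margaret Webb, a USDA spokeswoman in Washington."
)

# Splits after "U.S." because "Department" starts with a capital letter
webb_parts <- strsplit(webb, answer_pattern, perl = TRUE)[[1]]
print(webb_parts)

# Hypothetical workaround: refuse to split right after a few listed abbreviations
# ("U.S." and "Dept."); each entry is a fixed-length negative lookbehind
exception_pattern <- paste0(
  "(?<!\\bU\\.S\\.)(?<!\\bDept\\.)",
  answer_pattern
)

# Hypothetical helper: split one string using the exception-aware pattern
split_with_exceptions <- function(x) {
  strsplit(x, exception_pattern, perl = TRUE)[[1]]
}

print(split_with_exceptions(webb))                                      # kept as one sentence
print(split_with_exceptions("I work for the U.S. Department of Education."))  # correct: one sentence
print(split_with_exceptions("I live in the U.S. It is a country."))           # wrong: should be two
```

A longer exception list only moves the problem around: whichever way an abbreviation is listed, the regex still sees the same characters in both "U.S." sentences.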
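# Split on whitespace preceded by "." or "?" and followed by a capital letter
txt1 <- "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food said that 9,998 cattle have been destroyed after being diagnosed with BSE. The government has paid $6.1 million in compensation, and is budgeting $16 million for 1990."
strsplit(txt1, "(?<=\\.|\\?)\\s(?=[A-Z])", perl = TRUE)[[1]]
#> [1] "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food said that 9,998 cattle have been destroyed after being diagnosed with BSE."
#> [2] "The government has paid $6.1 million in compensation, and is budgeting $16 million for 1990."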
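The answer says a non-digit lookahead works as well as the capital-letter one. Here is a minimal sketch of that variant, assuming the same single-line txt1 (redefined so the block runs on its own). The only change is (?=\\D) in place of (?=[A-Z]): a space after a period counts as a boundary unless the next character is a digit, which keeps "Feb. 9" together.

```r
txt1 <- paste(
  "As of Feb. 9, the Ministry of Agriculture, Fisheries and Food said that",
  "9,998 cattle have been destroyed after being diagnosed with BSE.",
  "The government has paid $6.1 million in compensation, and is budgeting",
  "$16 million for 1990."
)

# Each regex backslash is doubled: one level is consumed by the R string literal
capital_pattern  <- "(?<=\\.|\\?)\\s(?=[A-Z])"  # next character is a capital letter
nondigit_pattern <- "(?<=\\.|\\?)\\s(?=\\D)"    # next character is anything but a digit

by_capital  <- strsplit(txt1, capital_pattern,  perl = TRUE)[[1]]
by_nondigit <- strsplit(txt1, nondigit_pattern, perl = TRUE)[[1]]

# Both give the two sentences the question asks for
print(by_capital)
identical(by_capital, by_nondigit)
```

The non-digit version is looser: it would also split before a lowercase word or a quotation mark, so on other text the two patterns can disagree.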