Nlp - Annotating sentences that span multiple lines in GATE


I have a problem with the sentence splitter module in GATE. My text looks like this:

Social history. He drank a lot in his young age. He did
not attend a school. He was depressed of his condition.

Whereas the sentences should clearly be split like this:

Sentence 1: Social history.
Sentence 2: He drank a lot in his young age.
Sentence 3: He did not attend a school.
Sentence 4: He was depressed of his condition.

the ANNIE sentence splitter treats text on different lines as belonging to different sentences, so it produces the following result:

Sentence 1: Social history.
Sentence 2: He drank a lot in his young age.
Sentence 3: He did 
Sentence 4: not attend a school.
Sentence 5: He was depressed of his condition.
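
This happens because the third sentence is broken across two lines. Is there a way to tell the sentence splitter that a sentence may span more than one line? Or is there a better way to identify sentences in text like this?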


Thanks :)

Try using the RegEx Sentence Splitter instead of the ANNIE one.

With the ANNIE sentence splitter you can set the parameter TransducerURL, which by default points to:

/PATH-TO-GATE/plugins/ANNIE/resources/sentenceSplitter/grammar/main-single-nl.jape

The same folder also contains a file named:

/PATH-TO-GATE/plugins/ANNIE/resources/sentenceSplitter/grammar/main.jape


If you point TransducerURL at that file instead, it should work.
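To see why swapping the grammar changes the output, here is a toy plain-Java sketch of the two behaviours. It is not GATE's code and not a JAPE grammar. In one mode every line break ends a sentence, which is how the default main-single-nl.jape behaves according to the answer above. In the other mode, line breaks are treated as ordinary whitespace, the behaviour the answer attributes to main.jape. The class and method names are hypothetical, and the naive regex ignores abbreviations, which the real GATE grammars take care of.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; GATE's splitter is implemented with JAPE rules.
public class SentenceSplitDemo {

    // Boundary after '.', '!' or '?' followed by whitespace (naive: no abbreviation handling).
    private static final String TERMINATOR_BOUNDARY = "(?<=[.!?])\\s+";

    /**
     * newlineIsBoundary = true  : every line break also ends a sentence.
     * newlineIsBoundary = false : line breaks are treated as plain spaces.
     */
    public static List<String> split(String text, boolean newlineIsBoundary) {
        List<String> sentences = new ArrayList<>();
        String[] chunks = newlineIsBoundary
                ? text.split("\\R")                              // handle each line separately
                : new String[] { text.replaceAll("\\R", " ") };  // join all lines first
        for (String chunk : chunks) {
            for (String s : chunk.trim().split(TERMINATOR_BOUNDARY)) {
                if (!s.isBlank()) {
                    sentences.add(s.trim());
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Social history. He drank a lot in his young age. He did\n"
                + "not attend a school. He was depressed of his condition.";
        System.out.println(split(text, true));   // newline ends a sentence
        System.out.println(split(text, false));  // newline is just whitespace
    }
}
```

The first call yields the five fragments from the question (including "He did" and "not attend a school."), and the second yields the four expected sentences.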

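You may be passing one line at a time to the sentence splitter. You should read the complete file first and then pass the full text to the splitter.

Actually, I'm using GATE Developer, so I think I'm passing all the sentences at once.

@Ravi thank you, it is actually documented on the website, my bad for not checking. I tried the method you mentioned and it works! But another problem came up: some lines don't end with a period, and the splitter now merges them into the sentence on the next line. So I guess I have to decide which approach is better and which drawback I can live with.

If that's a problem for you, you could try editing the rule file. Maybe you'll come up with a way to catch those special cases :)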
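One way to catch the special case from the last comment is sketched here as a hypothetical preprocessing step in plain Java rather than an edit to the JAPE rules. It joins a line to the next one only when the line does not end in '.', '!' or '?' and the next line starts with a lowercase letter. Headings such as "Social history" with no period stay separate, while a sentence wrapped mid-clause is rejoined. This heuristic is an assumption, not GATE functionality. It will still mishandle a wrapped line that happens to start with a capitalised word, such as a proper noun.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic: decide per line break whether it is a wrap or a real boundary.
public class LineJoinHeuristic {

    /** Join a line to the previous one only if the previous line lacks a terminator and this line starts lowercase. */
    public static List<String> joinWrappedLines(String text) {
        List<String> logicalLines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                flush(current, logicalLines);  // a blank line always closes the current logical line
                continue;
            }
            boolean continues = current.length() > 0
                    && !endsWithTerminator(current)
                    && Character.isLowerCase(line.charAt(0));
            if (continues) {
                current.append(' ').append(line);  // treat the line break as a wrap
            } else {
                flush(current, logicalLines);      // treat the line break as a boundary
                current.append(line);
            }
        }
        flush(current, logicalLines);
        return logicalLines;
    }

    private static boolean endsWithTerminator(CharSequence s) {
        char last = s.charAt(s.length() - 1);
        return last == '.' || last == '!' || last == '?';
    }

    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }

    /** Split each logical line on terminators; a remaining line break always ends a sentence. */
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String line : joinWrappedLines(text)) {
            for (String s : line.split("(?<=[.!?])\\s+")) {
                if (!s.isBlank()) {
                    sentences.add(s.trim());
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Social history\n"
                + "He drank a lot in his young age. He did\n"
                + "not attend a school.";
        splitSentences(text).forEach(System.out::println);
    }
}
```

On the sample in main this prints three sentences: "Social history", "He drank a lot in his young age." and "He did not attend a school."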