Splitting quoted text into sentences with Java's BreakIterator


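I am trying to use Java's BreakIterator to split a paragraph that contains a quotation into sentences.

This is the paragraph, including the quotation, that I want to split:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.


Here is my code: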

import java.text.BreakIterator;
import java.util.Locale;

public class SplitParagraph {
    public static void main(String[] args) {
        String paragraph = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
        // Sentence iterator that applies English boundary rules
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        iterator.setText(paragraph);
        int start = iterator.first();
        int i = 1;
        // Walk consecutive boundaries; each [start, end) span is printed as one sentence
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            System.out.println("Sentence " + i + " : " + paragraph.substring(start, end));
            i++;
        }
    }
}

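Program output:

Sentence 1 : "People are now getting smarter and more critical.
Sentence 2 : They know which are eligible to choose, which one pan, where the gold," he said.
Sentence 3 : About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

This output is wrong: the paragraph contains only two sentences, not three, because the first period is inside the quotation.


The correct output should look like this:

Sentence 1 : "People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
Sentence 2 : About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.


Any ideas on how to solve this?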

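Just split your input on the regex below. It matches whitespace that follows a period, but only where the rest of the string contains balanced double quotes, that is, where the match is not inside a quotation: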

"(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)"
Output:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

To also avoid splitting after abbreviations such as Mr. and Mrs., add a negative lookbehind:

String s = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About Mr. Mrs. strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
// (?<!Mrs?\\.) rejects whitespace that directly follows "Mr." or "Mrs."
String[] parts = s.split("(?<!Mrs?\\.)(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)");
for (String part : parts) {
    System.out.println(part);
}

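The purpose of BreakIterator is to avoid many of the pitfalls of splitting, such as "Mr. and Mrs. Smith". A regex may not be the best solution here. By the way, I'm not the downvoter.

But the OP didn't mention that problem in his question.

@Pshemo: no problem. My paragraph is in Indonesian, which doesn't use Mr. and Mrs.

@AvinashRaj I have a second question. How do I split sentences on \n or \n\n? Example text: My name is Zulkifli. I'm from Indonesia. I like Java. Result after splitting: Sentence 1: My name is Zulkifli. Sentence 2: I'm from Indonesia. Sentence 3: I like Java.

Use the regex
[\\n\\r]+
to split the string on one or more line breaks.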
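The line-break suggestion above could be sketched as follows. This is a minimal illustration, not code from the thread: the class name `SplitByLineBreaks`, the helper `splitLines`, and the sample text with its line breaks are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is not from the original thread.
class SplitByLineBreaks {

    // Splits on runs of \n and/or \r, so "\n", "\r\n" and "\n\n" each act as a single separator.
    static List<String> splitLines(String text) {
        return Arrays.stream(text.split("[\\n\\r]+"))
                .map(String::trim)                  // drop stray spaces around each line
                .filter(line -> !line.isEmpty())    // skip an empty first entry if the text starts with a line break
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed input: one sentence per line, with a blank line before the last one.
        String text = "My name is Zulkifli.\nI'm from Indonesia.\n\nI like Java.";
        List<String> sentences = splitLines(text);
        for (int i = 0; i < sentences.size(); i++) {
            System.out.println("Sentence " + (i + 1) + " : " + sentences.get(i));
        }
    }
}
```

Because the character class is followed by `+`, consecutive line breaks collapse into a single split point, so blank lines do not produce empty sentences.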
Output of the Mr./Mrs. variant above:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
About Mr. Mrs. strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.
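For readers who would rather keep BreakIterator, as one of the comments suggests, one possible approach is to let it find the boundaries and then merge any pieces that end while a double quote is still open. The sketch below is not from the thread: `QuoteAwareSplitter` and `split` are hypothetical names, and it assumes quotations use the straight ASCII `"` character and are never nested.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; assumes straight ASCII double quotes that are not nested.
class QuoteAwareSplitter {

    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean insideQuote = false;

        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String chunk = text.substring(start, end);
            pending.append(chunk);
            // Each '"' toggles whether we are currently inside a quotation.
            for (int i = 0; i < chunk.length(); i++) {
                if (chunk.charAt(i) == '"') {
                    insideQuote = !insideQuote;
                }
            }
            // Emit a sentence only when the boundary falls outside a quotation.
            if (!insideQuote) {
                sentences.add(pending.toString().trim());
                pending.setLength(0);
            }
        }
        // An unbalanced quote leaves text pending; keep it rather than lose it.
        if (pending.length() > 0) {
            sentences.add(pending.toString().trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
        List<String> sentences = split(paragraph, Locale.ENGLISH);
        for (int i = 0; i < sentences.size(); i++) {
            System.out.println("Sentence " + (i + 1) + " : " + sentences.get(i));
        }
    }
}
```

Compared with the regex, this keeps BreakIterator's locale-aware boundary rules and only adds the quote check on top. Typographic quotes such as “ and ” would need separate open and close handling.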