Java: extract all words after a specified word, up to the end of the sentence

java regex stanford-nlp

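I need to extract all the words that follow the pattern below, up to the end of the sentence: /[Ee]ach+/[tag:NN]+|[tag:NNS]+//has+/|/have+//, but I get an error on line 13. Here is my code: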

 1  String file="Each campus has one club. Each programme has a unique code, title, level and   duration.";
 2  Properties props = new Properties();
 3  props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
 4  StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
 5  Annotation document = new Annotation(file);
 6  pipeline.annotate(document);
 7  List<CoreLabel> tokens = new ArrayList<CoreLabel>();

 8  List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
 9  for(CoreMap sentence: sentences) 
10  {
11      for (CoreLabel token: sentence.get(CoreAnnotations.TokensAnnotation.class)) 
12         tokens.add(token); 

13      TokenSequencePattern pattern = TokenSequencePattern.compile("(/[Ee]ach+/) ([tag:NN]+|[tag:NNS]+) (/has+/|/have+/) [A-Z]");
14      TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
15      while( matcher.find()){
16          JOptionPane.showMessageDialog(rootPane, matcher.group()); 
17          String matched = matcher.group();
18      }
19      tokens.removeAll(tokens);
20  }

I think you mean this regular expression:

(?i)each[^.]+[.]

The same regular expression as a Java string:

"(?i)each[^.]+[.]"

And the Java code that uses it:

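import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.JOptionPane;

String file = "Each campus has one club. Each programme has a unique code, title, level and   duration.";
// (?i) makes the match case-insensitive; [^.]+ consumes everything up to the next full stop
String pattern = "(?i)each[^.]+[.]";
Pattern compile = Pattern.compile(pattern);
Matcher matcher = compile.matcher(file);
while (matcher.find()) {
    // group(0) is the whole match: the text from "each" through the closing full stop
    JOptionPane.showMessageDialog(null, matcher.group(0));
}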
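The same idea can be wrapped in a small helper that returns the matches instead of showing dialogs. This is only a sketch and not part of the original answer: the names `EachClauseExtractor` and `extract` are made up for illustration, the `\b` word boundaries are an added assumption about the intent (so that "each" is not matched inside words such as "teach"), and it assumes every sentence ends with a full stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EachClauseExtractor {
    // (?i)  : case-insensitive, so "Each" and "each" both match
    // \b    : word boundaries (assumption) so "teach" or "beach" do not match
    // [^.]* : everything up to the next full stop, which is then consumed by \.
    private static final Pattern EACH_CLAUSE =
            Pattern.compile("(?i)\\beach\\b[^.]*\\.");

    // Hypothetical helper: returns every "each ... ." span found in the text.
    public static List<String> extract(String text) {
        List<String> clauses = new ArrayList<>();
        Matcher matcher = EACH_CLAUSE.matcher(text);
        while (matcher.find()) {
            clauses.add(matcher.group());
        }
        return clauses;
    }

    public static void main(String[] args) {
        String file = "Each campus has one club. "
                + "Each programme has a unique code, title, level and duration.";
        for (String clause : extract(file)) {
            System.out.println(clause);
        }
    }
}
```

Because the pattern starts at the word "each" rather than at the start of the sentence, any text before it in the same sentence is left out of the match.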

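In many languages, the slashes you see around a regular expression, for example

/someregex/

have nothing to do with the regular expression itself: they are delimiters that belong to the host language, and Java is not one of the languages that uses them. With java.util.regex the pattern is an ordinary string, so a slash in it is matched as a literal character. (Stanford's TokensRegex syntax, which TokenSequencePattern parses, is a different pattern language and does use /.../ around a per-token regex.)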
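A quick way to see how java.util.regex treats the slashes is to compile the same pattern with and without them. The class `SlashDemo` and its `found` method are hypothetical names used only for this sketch.

```java
import java.util.regex.Pattern;

public class SlashDemo {
    // Returns true if the regex matches anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Each campus has one club.";
        // Plain pattern, no delimiters: finds "Each".
        System.out.println(found("[Ee]ach", text));
        // With slashes: Java looks for a literal "/", which the text does not contain.
        System.out.println(found("/[Ee]ach/", text));
        // The slashed pattern only matches when the input really contains the slashes.
        System.out.println(found("/[Ee]ach/", "see /each/ here"));
    }
}
```

Run as is, this prints true, false, true.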

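Once you remove those slashes, fix the alternations, drop the character classes that do not do what was intended, and make a few other adjustments, this regex should work: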

([Ee]ach|tag:NNS?|ha(s|ve)) +\w+
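To check what this pattern captures on the sample text, it can be run through java.util.regex as below. This is a sketch with made-up names (`KeywordPairDemo`, `matches`). On plain text, `tag:NNS?` is matched as the literal characters "tag:NN" or "tag:NNS" rather than as a part-of-speech tag, and each match covers a keyword plus only the single word that follows it, so the pattern does not reach the end of the sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordPairDemo {
    // The pattern from the answer, written as a Java string (backslash doubled).
    private static final Pattern KEYWORD_PLUS_WORD =
            Pattern.compile("([Ee]ach|tag:NNS?|ha(s|ve)) +\\w+");

    // Hypothetical helper: collects every match of the pattern in the text.
    public static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = KEYWORD_PLUS_WORD.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String file = "Each campus has one club. "
                + "Each programme has a unique code, title, level and duration.";
        // Prints: [Each campus, has one, Each programme, has a]
        System.out.println(matches(file));
    }
}
```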

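What is the expected output?

The expected output is "Each campus has one club" for the first sentence and "Each programme has a unique code, title, level and duration" for the second sentence.

This may be a duplicate question.

Can you help?

Maybe show sample input and the expected matches; that would help.

If the input is "Create a database that contains the following, each employee has a name, address and phone number", the output should be "each employee has a name, address and phone number".

So you want all of the input from the first occurrence of the word each, has or have up to the next end of sentence?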