Java question about finding a word in a sentence


java string nlp stanford-nlp


I am using the Stanford NLP parser (http://nlp.stanford.edu/software/lex-parser.shtml) to split a piece of text into sentences and then find which sentences contain a given word.

Here is my code so far:

import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.process.*;

public class TokenizerDemo {

    public static void main(String[] args) throws IOException {
        DocumentPreprocessor dp = new DocumentPreprocessor(args[0]);
        for (List sentence : dp) {
            for (Object word : sentence) {
                System.out.println(word);
                System.out.println(word.getClass().getName());
                if (word.equals(args[1])) {
                    System.out.println("yes!\n");
                }
            }
        }
    }
}

I run the code from the command line with "java TokenizerDemo testfile.txt wall".

The contents of testfile.txt are:

Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall.

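So I would expect the program to detect "wall" in the first sentence ("wall" is passed as the second command-line argument). However, the program does not detect "wall": it never prints "yes!". The program's output is: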

Humpty
edu.stanford.nlp.ling.Word
Dumpty
edu.stanford.nlp.ling.Word
sat
edu.stanford.nlp.ling.Word
on
edu.stanford.nlp.ling.Word
a
edu.stanford.nlp.ling.Word
wall
edu.stanford.nlp.ling.Word
.
edu.stanford.nlp.ling.Word
Humpty
edu.stanford.nlp.ling.Word
Dumpty
edu.stanford.nlp.ling.Word
had
edu.stanford.nlp.ling.Word
a
edu.stanford.nlp.ling.Word
great
edu.stanford.nlp.ling.Word
fall
edu.stanford.nlp.ling.Word
.
edu.stanford.nlp.ling.Word

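The Stanford parser's DocumentPreprocessor correctly splits the text into two sentences. The problem seems to be the equals method: each word is of type edu.stanford.nlp.ling.Word. I have tried to get at the word's underlying string so that I can check whether it equals "wall", but I can't figure out how to access it.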
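Judging from the output above, comparing the token object directly with a String never succeeds. The sketch below reproduces that behavior using only the standard library. TokenStandIn is a hypothetical stand-in for edu.stanford.nlp.ling.Word, and its equals() (which only accepts other tokens) is an assumption about how such a class behaves, not the real CoreNLP implementation.

```java
import java.util.Objects;

public class EqualsDemo {

    // Hypothetical stand-in for edu.stanford.nlp.ling.Word (assumption, not the real class).
    static final class TokenStandIn {
        private final String text;

        TokenStandIn(String text) { this.text = text; }

        // Accessor for the underlying text, analogous to Word.word().
        String word() { return text; }

        @Override
        public String toString() { return text; }

        @Override
        public boolean equals(Object o) {
            // Assumption: only other tokens can be equal, so token.equals("wall") is false.
            return o instanceof TokenStandIn && ((TokenStandIn) o).text.equals(text);
        }

        @Override
        public int hashCode() { return Objects.hashCode(text); }
    }

    // What the question's loop does: compare the token object with a String.
    static boolean compareObject(TokenStandIn token, String target) {
        return token.equals(target);
    }

    // Compare the token's underlying text with the String instead.
    static boolean compareText(TokenStandIn token, String target) {
        return token.word().equals(target);
    }

    public static void main(String[] args) {
        TokenStandIn wall = new TokenStandIn("wall");
        System.out.println(compareObject(wall, "wall")); // false
        System.out.println(compareText(wall, "wall"));   // true
    }
}
```

The fix is therefore to compare Strings with Strings: extract the token's text first, then call equals on that.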

If I write the second for loop as "for (Word word : sentence) {", I get an incompatible types error at compile time.
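The compile error most likely comes from the raw type: because sentence is declared as a plain List, the compiler treats each element as Object, and an Object cannot be assigned to a Word loop variable. A minimal sketch, assuming each sentence is a list of an interface type that declares word() (the answer's later snippet relies on HasWord for this). HasWordLike and SimpleToken are hypothetical stand-ins for HasWord and Word so the example runs without the Stanford jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericLoopDemo {

    // Hypothetical stand-in for edu.stanford.nlp.ling.HasWord.
    interface HasWordLike {
        String word();
    }

    // Hypothetical token type implementing the interface.
    static final class SimpleToken implements HasWordLike {
        private final String text;
        SimpleToken(String text) { this.text = text; }
        @Override public String word() { return text; }
        @Override public String toString() { return text; }
    }

    // Parameterized list: the loop variable can be HasWordLike, so word() compiles.
    static boolean containsTyped(List<HasWordLike> sentence, String target) {
        for (HasWordLike token : sentence) {
            if (token.word().equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Raw list: elements are statically Object, so a check and cast are needed.
    @SuppressWarnings("rawtypes")
    static boolean containsRaw(List sentence, String target) {
        for (Object o : sentence) {
            if (o instanceof HasWordLike && ((HasWordLike) o).word().equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<HasWordLike> sentence = new ArrayList<>();
        for (String w : Arrays.asList("Humpty", "Dumpty", "sat", "on", "a", "wall", ".")) {
            sentence.add(new SimpleToken(w));
        }
        System.out.println(containsTyped(sentence, "wall")); // true
        System.out.println(containsRaw(sentence, "fall"));   // false
    }
}
```

The typed version is preferable: the compiler checks the element type instead of deferring it to a runtime instanceof test.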

The underlying String can be accessed by calling the word() method on edu.stanford.nlp.ling.Word; for example:

import edu.stanford.nlp.ling.Word;

List<Word> words = ...
for (Word word : words) {
  // word() returns the token's underlying String
  if (word.word().equals(args[1])) {
    System.err.println("Yes!");
  }
}

Since Word prints nicely (its toString() output is just the word itself), a simple word.toString().equals(args[1]) would also suffice.
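The trade-off between the two checks can be sketched with a hypothetical token class whose toString() prints more than the word, for example a word/tag pair. TaggedTokenStandIn is illustrative only and is not a CoreNLP class; the point is that a toString()-based match silently depends on the display format, while a word()-based match does not.

```java
public class ToStringVsWordDemo {

    // Hypothetical tagged token whose toString() also prints a tag.
    static final class TaggedTokenStandIn {
        private final String word;
        private final String tag;

        TaggedTokenStandIn(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        String word() { return word; }

        @Override
        public String toString() { return word + "/" + tag; }
    }

    // Matches on the display form; fails as soon as toString() includes extra data.
    static boolean matchesByToString(Object token, String target) {
        return token.toString().equals(target);
    }

    // Matches on the underlying text accessor.
    static boolean matchesByWord(TaggedTokenStandIn token, String target) {
        return token.word().equals(target);
    }

    public static void main(String[] args) {
        TaggedTokenStandIn t = new TaggedTokenStandIn("wall", "NN");
        System.out.println(matchesByToString(t, "wall")); // false, toString() is "wall/NN"
        System.out.println(matchesByWord(t, "wall"));     // true
    }
}
```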

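Hi, thanks for your answer. If I use the line "if (word.word().equals(args[1])) {", I get the error "cannot find symbol - method word()". I think the reason is that I have for (Object word : sentence) whereas you have for (Word word : words). But if I write for (Word word : sentence), I get the incompatible types error. I don't understand how to write List<Word> words = ... so that the elements of the list are of type Word. Could you expand on that part, i.e. write out what goes there?

Oh, I just saw the second edit. The first one worked, but let me look at your new edit.

Good point! I originally thought the output included the class name, but I see the OP prints it separately. Still, the pedant in me would rather call word(), in case the toString() implementation changes.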

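import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.DocumentPreprocessor;

DocumentPreprocessor dp = ...
for (List<HasWord> sentence : dp) {  // dp yields one List<HasWord> per sentence
  for (HasWord hw : sentence) {      // HasWord declares word()
    if (hw.word().equals(args[1])) {
      System.err.println("Yes!");
    }
  }
}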
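The question's original goal was to find which sentences contain a given word. A minimal sketch of that last step, using only the standard library: it assumes each sentence has already been reduced to a List<String> of token texts (for example by collecting hw.word() for every HasWord in one sentence from DocumentPreprocessor). SentenceFinder and sentencesContaining are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceFinder {

    // Returns the 0-based indices of sentences that contain target as a whole token.
    // Each inner list stands in for one sentence's token strings.
    static List<Integer> sentencesContaining(List<List<String>> sentences, String target) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            // contains() uses String.equals, so this is an exact, case-sensitive match.
            if (sentences.get(i).contains(target)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("Humpty", "Dumpty", "sat", "on", "a", "wall", "."),
                Arrays.asList("Humpty", "Dumpty", "had", "a", "great", "fall", "."));
        System.out.println(sentencesContaining(sentences, "wall"));   // [0]
        System.out.println(sentencesContaining(sentences, "Humpty")); // [0, 1]
    }
}
```

Matching whole tokens rather than substrings of the raw sentence avoids false hits such as "wall" inside "wallet"; a case-insensitive variant would compare with equalsIgnoreCase instead.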