Java: Finding space-separated names with Apache OpenNLP

Tags: java, opennlp, named-entity-recognition

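I am using the NER component of Apache OpenNLP and have successfully trained it on my custom data. When using the name finder, I split the given string on spaces and pass the resulting string array, as shown below.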

NameFinderME nameFinder = new NameFinderME(model);   
String []sentence = input.split(" "); //eg:- input = Give me list of test case in project X
Span nameSpans[] = nameFinder.find(sentence);

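Here, when I use split, "test" and "case" are passed as separate values and are never detected by the name finder. How can I overcome this? Is there a way to pass the complete string (without splitting it into an array) so that "test case" is treated as a single unit?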

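You can do this with a regular expression. Try replacing the second line with the following:

String[] sentence = input.split("\\s(?…

OK, but if I have many space-separated words (anywhere from 15 to 20), how should I use split() in that case? Would it still work, and would this approach be efficient?

@HariRam Please check my second edit. I added some code that does this.

Thanks! But the problem is that I may also have phrases with 3 or 4 spaces between their words (for example "defect detected in cycle"). What should the regex look like in that case? I don't mind writing a function that generates the regex; I just need the format of the regex string for the case mentioned above.

That case is a bit trickier. The simplest way is to replace multiple spaces with a single space and then run the generated regex. Is that acceptable in your scenario? If so, before running split() you should execute input = input.replaceAll("[\\s\\t]+", " ");

No. What I actually mean by multiple spaces is that "test case id" contains two spaces, while "test case" contains one. I won't have consecutive spaces (like "test  case") in my input.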
Give
me
list
of
test case
in
project
X
class NoSeparation {

    // Word pairs that must stay together as a single token.
    private static String[][] unseparated = {{"test", "case"}, {"in", "project"}};

    // Builds a regex that splits on whitespace, except for the whitespace
    // between the two words of a listed pair. The pair must be preceded
    // and followed by whitespace for the exception to apply.
    private static String getRegex() {
        String regex = "\\s(?<!";

        for (int i = 0; i < unseparated.length; i++)
            regex += "(\\s" + unseparated[i][0] + "\\s(?=" + unseparated[i][1] + "\\s))|";

        // Remove the last |
        regex = regex.substring(0, regex.length() - 1);

        return (regex + ")");
    }

    public static void main(String[] args) {
        String input = "Give me list of test case in project X";
        String[] sentence = input.split(getRegex());

        for (String i : sentence)
            System.out.println(i);
    }
}
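
The generated regex above only protects a pair that has whitespace on both sides, so a pair at the very start or end of the input is still split. One way around this is to match tokens instead of splitting on separators. The following is only a sketch of that idea, not part of the original answer: the class name `PhraseMatchTokenizer` and its methods are hypothetical, and it assumes the words inside each phrase are separated by exactly one space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of OpenNLP or of the original answer.
class PhraseMatchTokenizer {

    // Builds a pattern of the form (?:phrase1|phrase2|...)(?=\s|$)|\S+
    static Pattern buildPattern(List<String> phrases) {
        if (phrases.isEmpty()) {
            return Pattern.compile("\\S+");
        }
        String alternation = phrases.stream()
                // Longest phrase first, so "test case id" is tried before "test case".
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                // Quote each phrase so regex metacharacters are taken literally.
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // A phrase only counts if it ends at whitespace or at the end of the input;
        // otherwise the matcher falls back to a single whitespace-free token.
        return Pattern.compile("(?:" + alternation + ")(?=\\s|$)|\\S+");
    }

    static String[] tokenize(String input, List<String> phrases) {
        Matcher matcher = buildPattern(phrases).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("test case", "test case id", "in project");
        for (String token : tokenize("Give me list of test case", phrases)) {
            System.out.println(token);
        }
    }
}
```

With the sample phrases, the input "Give me list of test case" yields the tokens Give, me, list, of and "test case", even though the phrase sits at the end of the input.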
class NoSeparation {

    // Placeholder that temporarily replaces the spaces inside a protected phrase.
    // Assumes "%%" never occurs in the input.
    private static final String SEPARATOR = "%%";
    // Phrases (of any length) whose words must stay together.
    private static String[][] unseparated = {{"of", "test", "case"}, {"in", "project"}};

    private static String[] splitString(String in) {
        // Replace the spaces inside each phrase with the placeholder.
        for (String[] phrase : unseparated) {
            String toReplace = String.join(" ", phrase);
            String replaceWith = String.join(SEPARATOR, phrase);
            in = in.replaceAll(toReplace, replaceWith);
        }

        // Split on spaces, then restore the spaces inside each phrase.
        String[] splitted = in.split(" ");
        for (int i = 0; i < splitted.length; i++)
            splitted[i] = splitted[i].replaceAll(SEPARATOR, " ");

        return splitted;
    }

    public static void main(String[] args) {
        String input = "Give me list of test case in project X";
        // Uncomment this if there is a chance to have multiple spaces/tabs
        // input = input.replaceAll("[\\s\\t]+", " ");

        for (String str : splitString(input))
            System.out.println(str);
    }
}
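
The last comment in the thread raises overlapping phrases such as "test case" and "test case id". With the placeholder approach the order of the phrase list matters: if "test case" is replaced first, "test case id" no longer matches. A possible alternative, sketched below, is to split on whitespace first and then greedily merge the longest known phrase at each position. The class `PhraseMerger`, the method `mergePhrases` and the `maxWords` parameter are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP or of the original answer.
class PhraseMerger {

    // phrases: known multi-word names, words separated by a single space.
    // maxWords: the largest number of words in any phrase (an assumed parameter).
    static String[] mergePhrases(String input, Set<String> phrases, int maxWords) {
        // Splitting on \s+ also collapses runs of spaces or tabs.
        String[] words = input.trim().split("\\s+");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int consumed = 1;
            // Try the longest window first so "test case id" wins over "test case".
            for (int len = Math.min(maxWords, words.length - i); len > 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, i, i + len));
                if (phrases.contains(candidate)) {
                    consumed = len;
                    break;
                }
            }
            tokens.add(String.join(" ", Arrays.copyOfRange(words, i, i + consumed)));
            i += consumed;
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<String> phrases = new HashSet<>(Arrays.asList(
                "test case", "test case id", "defect detected in cycle"));
        for (String token : mergePhrases("Give me list of test case id in project X", phrases, 4)) {
            System.out.println(token);
        }
    }
}
```

The returned array has the same String[] shape that the question passes to nameFinder.find(...). Whether the name finder then recognizes a merged token depends on how the model was trained, which this sketch does not address.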