Do some HPC clusters cache only one result when running Stanford CoreNLP?

machine-learning nlp stanford-nlp hpc


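I am working on a Java project that uses the Stanford CoreNLP library. I created a class called StanfordNLP, instantiated two different objects, and passed a different string to each constructor. I use the POS tagger to extract adjective-noun sequences. However, the program's output only shows the results for the first object. Each StanfordNLP object is initialized with a different string, yet every object returns the same results as the first one. I am new to Java, so I can't tell whether the problem is in my code or in the HPC cluster it runs on.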

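Instead of returning the list of strings from the StanfordNLP class method, I tried using a getter. I also tried setting the first StanfordNLP object to null, so that it no longer references anything, before creating the other objects. Nothing works.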

/* in main */
List<String> pos_tokens0 = new ArrayList<String>();
List<String> pos_tokens1 = new ArrayList<String>();

String text0 = "Mary little lamb white fleece like snow";
StanfordNLP snlp0 = new StanfordNLP(text0);
pos_tokens0 = snlp0.process();

String text1 = "Everywhere little Mary went fluffy lamb ate green grass";
StanfordNLP snlp1 = new StanfordNLP(text1);
pos_tokens1 = snlp1.process();


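/* in StanfordNLP.java */
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class StanfordNLP {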

    private static List<String> pos_adjnouns = new ArrayList<String>();
    private String documentText = "";

    public StanfordNLP() {}
    public StanfordNLP(String text) { this.documentText = text; }

    public List<String> process() {     
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
        props.setProperty("coref.algorithm", "neural");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);    
        Annotation document = new Annotation(documentText);
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        List<String[]> corpus_temp = new ArrayList<String[]>();
        int count = 0;
    
        for(CoreMap sentence: sentences) {
            for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
                String[] data = new String[2];
                String word = token.get(TextAnnotation.class);
                String pos = token.get(PartOfSpeechAnnotation.class);
                count ++;

                data[0] = word;
                data[1] = pos;         
                corpus_temp.add(data);
            }           
        }
    
        String[][] corpus = corpus_temp.toArray(new String[count][2]);
    
        // corpus contains string arrays with a word and its part-of-speech.
        for (int i=0; i<(corpus.length-3); i++) { 
            String word = corpus[i][0];
            String pos = corpus[i][1];
            String word2 = corpus[i+1][0];
            String pos2 = corpus[i+1][1];

            // find adjectives and nouns (eg, "fast car")
            if (pos.equals("JJ")) {         
                if (pos2.equals("NN") || pos2.equals("NNP") || pos2.equals("NNPS")) {
                    word = word + " " + word2;
                    pos_adjnouns.add(word);
                }
            }
        }
        return pos_adjnouns;
    }
}

The problem looks simple: your pos_adjnouns variable is static, so it is shared by all instances of StanfordNLP. Try removing the static keyword and see whether it then works as you expect.
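To make the effect of the static keyword concrete, here is a minimal sketch that does not use CoreNLP at all. The class names StaticCollector and LocalCollector are hypothetical stand-ins, not part of the question's code. With a static list, every object appends to and returns the same list, so the first object's results always appear at the front of whatever a later object returns.

```java
import java.util.ArrayList;
import java.util.List;

class StaticFieldDemo {
    public static void main(String[] args) {
        List<String> a = new StaticCollector("first").process();
        List<String> b = new StaticCollector("second").process();
        System.out.println(a == b);  // true: both variables point at the one shared list
        System.out.println(b);       // [first, second]

        List<String> c = new LocalCollector("first").process();
        List<String> d = new LocalCollector("second").process();
        System.out.println(c);       // [first]
        System.out.println(d);       // [second]
    }
}

// Hypothetical class that mirrors the question's static field.
class StaticCollector {
    // One list for the whole class: every instance appends to it.
    private static final List<String> results = new ArrayList<>();
    private final String text;

    StaticCollector(String text) { this.text = text; }

    List<String> process() {
        results.add(text);
        return results;
    }
}

// Hypothetical class that builds its result list inside the method.
class LocalCollector {
    private final String text;

    LocalCollector(String text) { this.text = text; }

    List<String> process() {
        // A fresh list per call: nothing leaks between objects or calls.
        List<String> results = new ArrayList<>();
        results.add(text);
        return results;
    }
}
```

This behaviour comes from the Java language itself, so it would look the same on a laptop as on an HPC cluster.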
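But that still isn't right, because you would then have an instance variable, and if process() is called more than once, items will keep being added to the pos_adjnouns list. There are two other things you should do:

1. Make pos_adjnouns a local variable inside the process() method.
2. Conversely, initializing the StanfordCoreNLP pipeline is expensive, so you should move it out of the process() method and do it in the class constructor. It would probably be better for the constructor to initialize the pipeline and for process() to take the String to analyze.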
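The restructuring suggested above could be sketched roughly as follows. To keep the example self-contained, the expensive pipeline is replaced by a hypothetical PosTagger interface and a ToyTagger with a hard-coded adjective list; neither is part of CoreNLP, and the tags they produce are not what the real POS tagger would return. In the real class, the constructor would build `new StanfordCoreNLP(props)` once and process() would call `pipeline.annotate(...)` on the text it receives. AdjNounExtractor is likewise an assumed name. Note that the loop here runs up to the last adjacent pair, which differs from the `corpus.length-3` bound in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AdjNounDemo {
    public static void main(String[] args) {
        // Build the (stand-in) tagger once and reuse it for every text.
        AdjNounExtractor extractor = new AdjNounExtractor(new ToyTagger());
        System.out.println(extractor.process("Mary little lamb white fleece like snow"));
        System.out.println(extractor.process("Everywhere little Mary went fluffy lamb ate green grass"));
    }
}

// Hypothetical stand-in for the expensive StanfordCoreNLP pipeline.
interface PosTagger {
    // Returns one {word, tag} pair per token of the text.
    List<String[]> tag(String text);
}

// Toy implementation: a fixed adjective list, everything else tagged NN.
class ToyTagger implements PosTagger {
    private static final Set<String> ADJECTIVES =
            new HashSet<>(Arrays.asList("little", "white", "fluffy", "green"));

    @Override
    public List<String[]> tag(String text) {
        List<String[]> tagged = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            tagged.add(new String[] { word, ADJECTIVES.contains(word) ? "JJ" : "NN" });
        }
        return tagged;
    }
}

class AdjNounExtractor {
    // Created once in the constructor, reused by every process() call.
    private final PosTagger tagger;

    AdjNounExtractor(PosTagger tagger) { this.tagger = tagger; }

    List<String> process(String text) {
        // Local variable: each call starts with an empty result list.
        List<String> adjNouns = new ArrayList<>();
        List<String[]> tokens = tagger.tag(text);

        for (int i = 0; i < tokens.size() - 1; i++) {
            String pos = tokens.get(i)[1];
            String nextPos = tokens.get(i + 1)[1];
            boolean nextIsNoun = nextPos.equals("NN") || nextPos.equals("NNP") || nextPos.equals("NNPS");
            // Keep adjective + noun pairs (e.g., "fast car").
            if (pos.equals("JJ") && nextIsNoun) {
                adjNouns.add(tokens.get(i)[0] + " " + tokens.get(i + 1)[0]);
            }
        }
        return adjNouns;
    }
}
```

With this shape, one extractor object can analyze any number of strings, and each call returns only the pairs found in the text it was given.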