Java: training fails (java, classification, stanford-nlp)

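I am new to entity recognition and still trying to learn more. I have a project where I need to recognize three things: table names, persons, and departments. I tried Stanford NER with its 3-class model, and it does recognize person names. For department names, I tried to train NER to recognize departments as ORGANIZATION, because I could not find anything on how to create a new annotation for them. I followed the instructions on their website. First, I created a txt file with the following content: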

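Ahmed works in the Customer Service department. The department name is Customer Service. It started in 1997 and has been called Customer Service since then. The Customer Service department has one manager and many employees. The number of the Customer Service department is 1122D. Ahmed works in the Development department. The department name is Development. It started in 1997 and has been called Development since then. The Development department has one manager and many employees. The number of the Development department is 1122D. Ahmed works in the Finance department. The department name is Finance. It started in 1997 and has been called Finance since then. The Finance department has one manager and many employees. The number of the Finance department is 1122D. Ahmed works in the Human Resources department. The department name is Human Resources. It started in 1997 and has been called Human Resources since then. The Human Resources department has one manager and many employees. The number of the Human Resources department is 1122D. Ahmed works in the Marketing department. The department name is Marketing. It started in 1997 and has been called Marketing since then. The Marketing department has one manager and many employees. The number of the Marketing department is 1122D.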

Then I ran the following commands:

java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer corpus.txt > corpus.tok

perl -ne 'chomp; print "$_\tO\n"' corpus.tok > corpus.tsv 

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop corpus.prop

Then I got the following error:

CRFClassifier invoked on Mon Dec 01 09:38:10 AST 2014 with arguments:
   -prop corpus.prop
argsToProperties could not read properties file: null
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "corpus.prop" as either class path, filename or URL
    at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:879)
    at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:818)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2869)
Caused by: java.io.IOException: Unable to resolve "corpus.prop" as either class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:448)
    at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:866)
    ... 2 more

How can I train the classifier correctly?

Thanks a lot.

Update: here is my .prop file

#location of the training file
trainFile = /Users/ha/stanford-ner-2014-10-26/corpus.tsv
#location where you would like to save (serialize to) your
#classifier; adding .gz at the end automatically gzips the file,
#making it faster and smaller
serializeTo = dept-model.ser.gz

#structure of your training file; this tells the classifier
#that the word is in column 0 and the correct answer is in
#column 1
map = word=0,answer=1

#these are the features we'd like to train with
#some are discussed below, the rest can be
#understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
useNGrams=true
#no ngrams will be included that do not contain either the
#beginning or end of the word
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
#the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
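
A note on the training data: the perl command in the question writes O in column 1 for every token, so a model trained on that corpus.tsv has no ORGANIZATION examples to learn from. With map = word=0,answer=1, the department tokens need their label in column 1. The sketch below shows one way to produce such rows. It assumes the department names are known in advance and that the tokens match them exactly. TsvLabeler and label are hypothetical names, not part of Stanford NER.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TsvLabeler {
    // Hypothetical helper: returns "token<TAB>label" rows, tagging every
    // occurrence of the multi-token phrase and leaving all other tokens as O.
    static List<String> label(List<String> tokens, List<String> phrase, String tag) {
        String[] labels = new String[tokens.size()];
        Arrays.fill(labels, "O");
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                for (int j = 0; j < phrase.size(); j++) {
                    labels[i + j] = tag;
                }
            }
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            rows.add(tokens.get(i) + "\t" + labels[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        // One tokenized sentence from the corpus (illustrative tokenization).
        List<String> tokens = Arrays.asList(
                "Ahmed", "works", "in", "the", "Customer", "Service", "department", ".");
        List<String> department = Arrays.asList("Customer", "Service");
        for (String row : label(tokens, department, "ORGANIZATION")) {
            System.out.println(row);
        }
    }
}
```

Each printed row has the word in column 0 and the answer in column 1, which is the layout the map property describes. For a corpus this small, editing the labels by hand in corpus.tsv works just as well.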

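Did you create corpus.prop, and are you pointing to the correct path? It looks like CRFClassifier cannot find the file.

I have edited my post and added the prop file.

Try giving the absolute path to corpus.prop on the command line. It looks like the relative path lookup is failing.

It works now, thank you. If I am already using another classifier, how do I use this one in code? String serializedClassifier = "classifiers/english.conll.4class.distsim.crf.ser.gz";

In the corpus.prop file you gave a relative path for where the model should be saved (property name serializeTo). Provide the path to that file in your Java code.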
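
Because serializeTo is a relative path, the model file ends up relative to whatever directory the training command was run from. A minimal sketch of turning that into an absolute path for the Java code, using only the standard library: ModelPathResolver and resolveModelPath are hypothetical names, and the working directory is an assumption you would replace with the directory you actually trained in.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class ModelPathResolver {
    // Hypothetical helper: reads serializeTo from the properties file and
    // resolves it against the directory the training command was run from.
    static Path resolveModelPath(Path propFile, Path trainingWorkingDir) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(propFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        String serializeTo = props.getProperty("serializeTo");
        if (serializeTo == null) {
            throw new IllegalArgumentException("No serializeTo property in " + propFile);
        }
        // resolve() returns serializeTo unchanged if it is already absolute.
        return trainingWorkingDir.resolve(serializeTo.trim()).toAbsolutePath().normalize();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: training was run from this directory (first argument, else ".").
        Path workDir = Paths.get(args.length > 0 ? args[0] : ".");
        Path propFile = workDir.resolve("corpus.prop");
        if (!Files.isReadable(propFile)) {
            System.out.println("Cannot read " + propFile.toAbsolutePath());
            return;
        }
        Path model = resolveModelPath(propFile, workDir);
        System.out.println(model + (Files.exists(model) ? "" : " (not found)"));
        // The resulting string can replace the serializedClassifier value from the
        // question, e.g. as the argument to CRFClassifier.getClassifier(...), which
        // is how the NERDemo shipped with Stanford NER loads a model.
    }
}
```

Printing the resolved path and whether the file exists makes it easy to spot the same kind of relative-path problem that caused the original corpus.prop error.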