Java: Training a chunker model for a different language step by step with OpenNLP, and getting probability scores for the predicted sequence
I have read the available documentation, but things are still not clear. What should I do? I followed the correct input format for the training data, but I get the error below.
Command:
./opennlp ChunkerTrainerME -model hn-chunker.bin -lang hn -data sampletrain.txt -encoding UTF-8
Error:
Skipping corrupt line: इसके PRP NP
Skipping corrupt line: साथ NST NP
Skipping corrupt line: ही RP NP
Skipping corrupt line: पार्टी NN NP2
Skipping corrupt line: ने PSP NP2
Skipping corrupt line: सरकार NN NP3
Skipping corrupt line: से PSP NP3
Skipping corrupt line: इस DEM NP4
Skipping corrupt line: मसले NN NP4
Skipping corrupt line: पर PSP NP4
Skipping corrupt line: बयान NN NP5
Skipping corrupt line: देने VM VGNN
Skipping corrupt line: की PSP VGNN
Skipping corrupt line: मांग NN NP6
Skipping corrupt line: की VM VGF
Skipping corrupt line: है VAUX VGF
done. 0 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
at opennlp.tools.cmdline.CLI.main(CLI.java:222)
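I'm not sure whether you are still looking for an answer, but I found that the problem is in your training text file: it needs exactly one space between the word and each tag, and your file has multiple spaces between them.
For example, in the line reported as "Skipping corrupt line: पार्टी NN NP2" there are 3 spaces between पार्टी and NN, and 3 spaces between NN and NP2.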
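Because every line was skipped, the trainer ended up with 0 events, which is most likely why the NullPointerException follows. One way to fix the file is to collapse each run of whitespace into a single space before training. The following is only a minimal sketch: the class name ChunkerDataNormalizer and its helper methods are made up for illustration and are not part of OpenNLP, and it assumes the file is UTF-8 with one "word POS-tag chunk-tag" triple per line and empty lines between sentences.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: rewrites a chunker training
// file so that fields are separated by exactly one space.
public class ChunkerDataNormalizer {

    // Collapse any run of whitespace (spaces, tabs) into a single space
    // and drop leading/trailing whitespace.
    public static String normalizeLine(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    // Assumption: a usable line is either empty (sentence separator)
    // or has exactly three space-separated fields.
    public static boolean isWellFormed(String line) {
        return line.isEmpty() || line.split(" ").length == 3;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);   // e.g. sampletrain.txt
        Path out = Paths.get(args[1]);  // cleaned copy to train on
        List<String> cleaned = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            String normalized = normalizeLine(line);
            if (!isWellFormed(normalized)) {
                // Report lines that whitespace cleanup alone cannot fix.
                System.err.println("Still malformed: " + line);
            }
            cleaned.add(normalized);
        }
        Files.write(out, cleaned, StandardCharsets.UTF_8);
    }
}
```

Run it with the original file and an output path as the two arguments, then pass the cleaned file to the same ChunkerTrainerME command via -data.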