Stanford NLP: error when creating an NER model with Stanford NER

While creating an NER model, I get the following error message:

Exception in thread "main" java.lang.RuntimeException: Got NaN for prob in CRFLogConditionalObjectiveFunction.calculate() - this may well indicate numeric underflow due to overly long documents.
    at edu.stanford.nlp.ie.crf.CRFLogConditionalObjectiveFunction.calculate(CRFLogConditionalObjectiveFunction.java:427)
    at edu.stanford.nlp.optimization.AbstractCachingDiffFunction.ensure(AbstractCachingDiffFunction.java:140)
    at edu.stanford.nlp.optimization.AbstractCachingDiffFunction.valueAt(AbstractCachingDiffFunction.java:145)
    at edu.stanford.nlp.optimization.QNMinimizer.lineSearchMinPack(QNMinimizer.java:1460)
    at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:1008)
    at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:857)
    at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:851)
    at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:93)
    at edu.stanford.nlp.ie.crf.CRFClassifier.trainWeights(CRFClassifier.java:1919)
    at edu.stanford.nlp.ie.crf.CRFClassifier.train(CRFClassifier.java:1726)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:758)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:746)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3034)
To create the NER model, I only used the Java command from the Stanford NER website [here].

The command is:

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop 06012017_training.prop

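Also, the TSV file used to train the NER model is 35.369 MB. I am trying to create only a single tag, labeled "SYS".

How can I get past this error and create the NER model successfully?
Thanks in advance.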
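The exception text itself points at "overly long documents", so one thing that may be worth trying is breaking the training TSV into shorter documents. The sketch below is only an illustration under assumptions: it assumes the column reader used for training treats a blank line as a document boundary, and the class name `DocumentSplitter`, the method `splitLongDocuments`, and the `maxTokens` limit are hypothetical names, not part of Stanford NER.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is not part of Stanford NER.
final class DocumentSplitter {

    // Inserts an empty row whenever more than maxTokens non-blank rows
    // would otherwise appear in a row. Existing blank rows are kept and
    // reset the counter.
    static List<String> splitLongDocuments(List<String> rows, int maxTokens) {
        if (maxTokens < 1) {
            throw new IllegalArgumentException("maxTokens must be at least 1");
        }
        List<String> out = new ArrayList<>();
        int run = 0; // non-blank rows seen since the last boundary
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                run = 0;
                out.add(row);
                continue;
            }
            if (run == maxTokens) {
                out.add(""); // assumed to act as a document boundary
                run = 0;
            }
            out.add(row);
            run++;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: DocumentSplitter <input.tsv> <output.tsv> <maxTokens>");
            return;
        }
        List<String> rows = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> split = splitLongDocuments(rows, Integer.parseInt(args[2]));
        Files.write(Paths.get(args[1]), split, StandardCharsets.UTF_8);
    }
}
```

A fixed token count can cut through the middle of a sentence or an entity, so splitting at sentence ends would be the more careful variant if the data marks them.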

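@stanfordnlphelp Just answering my own question: once I separated out all the punctuation and then removed it, I no longer got any error. Also, for tokenization, this command is a good option:

java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer jane-austen-emma-ch1.txt > jane-austen-emma-ch1.tok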
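The punctuation clean-up described above could be done in many ways. Here is a minimal sketch of one of them, assuming the training TSV has the token in the first tab-separated column and one token per line. The class `PunctuationFilter` and its methods are hypothetical names made up for this example, not part of Stanford NER, and this is not necessarily how the original clean-up was done.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it is not part of Stanford NER.
final class PunctuationFilter {

    // Matches tokens made up entirely of ASCII punctuation, e.g. "," or "...".
    private static final Pattern PUNCT_ONLY = Pattern.compile("\\p{Punct}+");

    static boolean isPunctuationToken(String token) {
        return PUNCT_ONLY.matcher(token).matches();
    }

    // Drops rows whose first column is a punctuation-only token.
    // Blank rows are kept unchanged.
    static List<String> dropPunctuationRows(List<String> rows) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                kept.add(row);
                continue;
            }
            // Assumption: the token is the first tab-separated column.
            String token = row.split("\t", 2)[0];
            if (!isPunctuationToken(token)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PunctuationFilter <input.tsv> <output.tsv>");
            return;
        }
        List<String> rows = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), dropPunctuationRows(rows), StandardCharsets.UTF_8);
    }
}
```

Tokens that merely contain punctuation, such as "U.S.", are left alone; only rows whose token is nothing but punctuation are removed.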