CRF model recognizes the plural tokens it was trained on, but not their singular forms

nlp stanford-nlp

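I built a CRF model. My dataset has 24 classes, and because I am only just getting started, my training corpus contains only 1,200 tokens. I have already trained the model. In the training data I used plural tokens such as addresses, photos, states, countries, etc.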

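At test time, if I give the model a sentence containing the plural tokens, it works well. But if I give it a sentence with the singular forms, such as photo, state, etc., it does not assign any tag to them.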

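This behavior of the CRF seems strange. I looked into it and tried some lemma features, but that did not help either. Here is the austin.prop I used to build the model: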

# location of the training file
trainFile = training_data_for_ner.txt
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = ner-model.ser.gz

# structure of your training file; this tells the classifier that
# the word is in column 0, the correct answer in column 1,
# the POS tag in column 2, and the lemma in column 3
map = word=0,answer=1,pos=2,lemma=3

# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1

# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only 
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
# newly added features.
useLemmas=true
usePrevNextLemmas=true
useLemmaAsWord=true
useTags=true

The last four features were added after reading NERFeatureFactory. I would be very grateful if someone could help me solve this problem.

You should retrain it on stemmed tokens. See ( main method) for an example.
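As a rough illustration of the retraining idea above, the sketch below rewrites the word column of a tab-separated training file so that plural and singular forms collapse to the same token. `naive_stem` is a hypothetical toy normalizer written only for this example (it handles a few English plural endings and will mangle words such as "status"); a real stemmer would take its place. The file layout (tab-separated, word in column 0, blank lines between sentences) and the label names in the usage note are assumptions.

```python
def naive_stem(token: str) -> str:
    """Toy normalizer (hypothetical): strips a few common English plural endings.

    Case is preserved so that word-shape features still see capitalization.
    """
    lower = token.lower()
    if len(lower) > 4 and lower.endswith("ies"):
        return token[:-3] + "y"        # countries -> country
    if len(lower) > 4 and lower.endswith(("sses", "ches", "shes", "xes")):
        return token[:-2]              # addresses -> address
    if len(lower) > 3 and lower.endswith("s") and not lower.endswith("ss"):
        return token[:-1]              # photos -> photo, states -> state
    return token


def stem_training_lines(lines, word_col=0):
    """Replace the word column of each tab-separated row with its stem."""
    out = []
    for line in lines:
        row = line.rstrip("\n")
        if not row.strip():
            out.append("")             # blank lines separate sentences; keep them
            continue
        cols = row.split("\t")
        cols[word_col] = naive_stem(cols[word_col])
        out.append("\t".join(cols))
    return out
```

For instance, `stem_training_lines(["photos\tPHOTO\tNNS\tphoto"])` rewrites only the first column and leaves the label, POS tag, and lemma untouched. The same normalization would have to be applied to sentences at test time, otherwise the model again sees surface forms it was never trained on.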

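Can I keep both the lemma and the stemmed token? Is that OK for building the CRF model?

It depends on your dataset. I don't think you should do that. Note that you can always check the quality (F1 score or similar) with cross-validation and pick the option that works best.

How would I specify in the map that a particular column contains stemmed tokens? Suppose the fifth column of the training file contains the stemmed tokens; how should I write the map in austen.prop? map = ...... stem=4 or stemmer=4?

I think you should replace the words in training_data_for_ner.txt with their stemmed forms. Personally, I have only accessed StanfordNLP programmatically and don't have much experience with austin.prop, sorry about that.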
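The cross-validation check mentioned in the comments could be sketched roughly as follows. This is a minimal outline, not Stanford NER's own evaluation: `k_fold_splits` and `token_f1` are hypothetical helpers, the score is a token-level micro F1 (NER tools often report entity-level scores instead), and `"O"` is assumed to be the background label. Training and tagging each fold with the CRF is left out.

```python
def k_fold_splits(sentences, k=5):
    """Yield (train, test) pairs; fold i tests on every k-th sentence starting at i."""
    for i in range(k):
        test = [s for j, s in enumerate(sentences) if j % k == i]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, test


def token_f1(gold, predicted, outside="O"):
    """Micro-averaged token-level F1 over every label except the background label."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == p and g != outside)
    pred_pos = sum(1 for p in predicted if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    if true_pos == 0:
        return 0.0                     # also covers empty predictions or empty gold
    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)
```

Running this once per preprocessing variant (surface forms, lemmas, stems) and averaging `token_f1` over the folds gives a like-for-like comparison. With only 1,200 tokens the folds are small, so the differences between variants are likely to be noisy.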