Stanford POS tagger won't lemmatize pre-tokenized text

I want to lemmatize and POS-tag text according to context. This command, taken from the FAQ, works as expected:

$ java -cp "*:lib/*" edu.stanford.nlp.tagger.maxent.MaxentTagger \
    -model models/english-left3words-distsim.tagger \
    -textFile samsawme.txt -outputFormat inlineXML \
    -outputFormatOptions lemmatize -sentenceDelimiter newline

Output:

<?xml version="1.0" encoding="UTF-8"?> 
<pos> 
<sentence id="0"> 
<word wid="0" pos="NNP" lemma="Sam">Sam</word> 
<word wid="1" pos="VBD" lemma="see">saw</word> 
<word wid="2" pos="PRP" lemma="I">me</word> 
<word wid="3" pos="." lemma=".">.</word> 
</sentence> 
</pos>

Command for the pre-tokenized file (note `-tokenize false`):

$ java -cp "*:lib/*" edu.stanford.nlp.tagger.maxent.MaxentTagger \
    -model models/english-left3words-distsim.tagger \
    -textFile samsawme_tokenized.txt -outputFormat inlineXML \
    -outputFormatOptions lemmatize -sentenceDelimiter newline \
    -tokenize false # !!!

Output:

<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
  <word wid="0" pos="NNP">Sam</word>
  <word wid="1" pos="VBD">saw</word>
  <word wid="2" pos="PRP">me</word>
  <word wid="3" pos=".">.</word>
</sentence>
</pos>
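
The only difference between the two outputs above is the `lemma` attribute on each `<word>` element. A minimal sketch for checking this programmatically, assuming the tagger output is well-formed XML as shown; the function name `words_missing_lemma` and the inline sample strings are illustrative and not part of the Stanford tools:

```python
import xml.etree.ElementTree as ET

# Sample outputs copied from the two tagger runs above (bytes, because of
# the XML declaration with an explicit encoding).
WITH_LEMMAS = b"""<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
<word wid="0" pos="NNP" lemma="Sam">Sam</word>
<word wid="1" pos="VBD" lemma="see">saw</word>
<word wid="2" pos="PRP" lemma="I">me</word>
<word wid="3" pos="." lemma=".">.</word>
</sentence>
</pos>"""

WITHOUT_LEMMAS = b"""<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
  <word wid="0" pos="NNP">Sam</word>
  <word wid="1" pos="VBD">saw</word>
  <word wid="2" pos="PRP">me</word>
  <word wid="3" pos=".">.</word>
</sentence>
</pos>"""


def words_missing_lemma(xml_bytes):
    """Return (sentence id, word id, token) for each <word> lacking a lemma attribute."""
    root = ET.fromstring(xml_bytes)
    missing = []
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            if "lemma" not in word.attrib:
                missing.append((sentence.get("id"), word.get("wid"), word.text))
    return missing


if __name__ == "__main__":
    print("default run:", words_missing_lemma(WITH_LEMMAS))
    print("-tokenize false run:", words_missing_lemma(WITHOUT_LEMMAS))
```

For a real run, the bytes would come from the file the tagger output was redirected to, rather than from the inline samples.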


Sam
saw
me
.

Is there any workaround to include lemmas when tagging text that is pre-tokenized but not necessarily lemmatized?