Python: Lemmatizing POS-tagged words with NLTK?

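I have POS-tagged some words with nltk.pos_tag(), so they have been given Treebank tags. I would like to lemmatize these words using the known POS tags, but I am not sure how. I was looking at the WordNet lemmatizer, but I am not sure how to convert the Treebank POS tags into tags the lemmatizer accepts. How can I perform this conversion simply, or is there a lemmatizer that uses Treebank tags?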

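The WordNet lemmatizer only knows four parts of speech (ADJ, ADV, NOUN, and VERB), and only the NOUN and VERB rules do anything especially interesting. The noun tags in the Treebank tagset all start with NN, the verb tags all start with VB, the adjective tags start with JJ, and the adverb tags start with RB. So converting from one set of tags to the other is pretty easy, for example: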

from nltk.corpus import wordnet

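# penn_tag is a Treebank tag string such as 'NNS', as returned by nltk.pos_tag().
# Any tag whose first two letters are not NN, JJ, VB or RB raises a KeyError here.
morphy_tag = {'NN':wordnet.NOUN,'JJ':wordnet.ADJ,'VB':wordnet.VERB,'RB':wordnet.ADV}[penn_tag[:2]]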
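The one-liner above can be wrapped into a small helper that does not fail on tags outside the four prefixes. This is only a sketch: the function names `penn_to_wordnet` and `lemmatize_tagged` are made up for illustration, the POS constants are written as plain strings that match `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ` and `wordnet.ADV`, and falling back to NOUN is an assumption (it mirrors the lemmatizer's own default `pos`).

```python
# WordNet POS constants as plain strings; these match nltk.corpus.wordnet's
# NOUN, VERB, ADJ and ADV attributes ('n', 'v', 'a', 'r').
NOUN, VERB, ADJ, ADV = 'n', 'v', 'a', 'r'

_PREFIX_TO_WORDNET = {'NN': NOUN, 'VB': VERB, 'JJ': ADJ, 'RB': ADV}


def penn_to_wordnet(penn_tag, default=NOUN):
    """Map a Penn Treebank tag to a WordNet POS using its two-letter prefix.

    Tags with no match (e.g. 'DT') return `default` instead of raising.
    """
    return _PREFIX_TO_WORDNET.get(penn_tag[:2], default)


def lemmatize_tagged(tagged_words, lemmatize):
    """Lemmatize (word, penn_tag) pairs with a callable lemmatize(word, pos)."""
    return [lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged_words]


# Hypothetical usage with NLTK (requires the tagger and WordNet data):
# from nltk import pos_tag, word_tokenize
# from nltk.stem import WordNetLemmatizer
# tagged = pos_tag(word_tokenize("The cats were running"))
# lemmas = lemmatize_tagged(tagged, WordNetLemmatizer().lemmatize)
```

Passing the lemmatizer in as a callable keeps the mapping logic independent of NLTK, so it can be checked without downloading any corpora.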

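As @engineercoding pointed out in a comment on @rmalouf's answer, the Treebank has many more tags than WordNet.

The following mapping covers as many bases as possible. It also explicitly defines the POS tags that have no match in WordNet: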

# Create a map between Treebank and WordNet 
from nltk.corpus import wordnet as wn

# WordNet POS tags are: NOUN = 'n', ADJ = 'a', VERB = 'v', ADV = 'r', ADJ_SAT = 's'
# Descriptions (c) https://web.stanford.edu/~jurafsky/slp3/10.pdf
tag_map = {
        'CC':None, # coordin. conjunction (and, but, or)  
        'CD':wn.NOUN, # cardinal number (one, two)             
        'DT':None, # determiner (a, the)                    
        'EX':wn.ADV, # existential ‘there’ (there)           
        'FW':None, # foreign word (mea culpa)             
        'IN':wn.ADV, # preposition/sub-conj (of, in, by)   
        'JJ':[wn.ADJ, wn.ADJ_SAT], # adjective (yellow)                  
        'JJR':[wn.ADJ, wn.ADJ_SAT], # adj., comparative (bigger)          
        'JJS':[wn.ADJ, wn.ADJ_SAT], # adj., superlative (wildest)           
        'LS':None, # list item marker (1, 2, One)          
        'MD':None, # modal (can, should)                    
        'NN':wn.NOUN, # noun, sing. or mass (llama)          
        'NNS':wn.NOUN, # noun, plural (llamas)                  
        'NNP':wn.NOUN, # proper noun, sing. (IBM)              
        'NNPS':wn.NOUN, # proper noun, plural (Carolinas)
        'PDT':[wn.ADJ, wn.ADJ_SAT], # predeterminer (all, both)            
        'POS':None, # possessive ending (’s )               
        'PRP':None, # personal pronoun (I, you, he)     
        'PRP$':None, # possessive pronoun (your, one’s)    
        'RB':wn.ADV, # adverb (quickly, never)            
        'RBR':wn.ADV, # adverb, comparative (faster)        
        'RBS':wn.ADV, # adverb, superlative (fastest)     
        'RP':[wn.ADJ, wn.ADJ_SAT], # particle (up, off)
        'SYM':None, # symbol (+,%, &)
        'TO':None, # “to” (to)
        'UH':None, # interjection (ah, oops)
        'VB':wn.VERB, # verb base form (eat)
        'VBD':wn.VERB, # verb past tense (ate)
        'VBG':wn.VERB, # verb gerund (eating)
        'VBN':wn.VERB, # verb past participle (eaten)
        'VBP':wn.VERB, # verb non-3sg pres (eat)
        'VBZ':wn.VERB, # verb 3sg pres (eats)
        'WDT':None, # wh-determiner (which, that)
        'WP':None, # wh-pronoun (what, who)
        'WP$':None, # possessive (wh- whose)
        'WRB':None, # wh-adverb (how, where)
        '$':None, #  dollar sign ($)
        '#':None, # pound sign (#)
        '``':None, # left quote (‘ or “)
        "''":None, # right quote (’ or ”)
        '(':None, # left parenthesis ([, (, {, <)
        ')':None, # right parenthesis (], ), }, >)
        ',':None, # comma (,)
        '.':None, # sentence-final punc (. ! ?)
        ':':None # mid-sentence punc (: ; ... – -)
    }
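Because the values in this mapping are either None, a single WordNet POS, or a list of POS candidates, a lookup has to normalize all three shapes before calling the lemmatizer. The sketch below shows one way to do that. The helper names, the small excerpt dictionary, and the "first POS that changes the word wins" rule are all assumptions made for illustration, not part of the original answer.

```python
# WordNet POS constants as plain strings (same values as in nltk.corpus.wordnet).
NOUN, VERB, ADJ, ADJ_SAT, ADV = 'n', 'v', 'a', 's', 'r'

# A small excerpt with the same shape as the full mapping (illustrative only).
tag_map_excerpt = {
    'DT': None,             # no WordNet counterpart
    'JJ': [ADJ, ADJ_SAT],   # several candidates
    'NNS': NOUN,            # single candidate
    'VBD': VERB,
}


def candidate_pos(penn_tag, mapping):
    """Return the WordNet POS candidates for a Treebank tag as a list."""
    value = mapping.get(penn_tag)
    if value is None:
        return []           # unknown tag, or explicitly unmapped
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def lemmatize_with_map(word, penn_tag, mapping, lemmatize):
    """Try each candidate POS; return the first lemma that differs from word.

    Words whose tag has no WordNet counterpart are returned unchanged.
    """
    for pos in candidate_pos(penn_tag, mapping):
        lemma = lemmatize(word, pos)
        if lemma != word:
            return lemma
    return word


# Hypothetical usage with NLTK (requires the WordNet data):
# from nltk.stem import WordNetLemmatizer
# lemmatize_with_map('cats', 'NNS', tag_map_excerpt, WordNetLemmatizer().lemmatize)
```

Leaving unmapped words untouched, rather than defaulting to NOUN, keeps determiners, pronouns and punctuation exactly as they appeared in the input.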

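I read this as "steaming part of speech".
Some hints are also provided here.
What about satellite adjectives?
Satellite adjectives are handled the same way as regular adjectives.
Where does penn_tag come from? A list of Treebank tags?
This does not cover all the bases, because there are more POS tags.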