Python: lemmatizing "I'm" to "I" using nltk
I am using nltk's wordnet_lemmatizer. Ideally, the word "I'm" should be lemmatized to "I". I tried the following POS tags:
wordnet_lemmatizer.lemmatize("I'm", wordnet.ADV)
wordnet_lemmatizer.lemmatize("I'm", wordnet.ADJ)
wordnet_lemmatizer.lemmatize("I'm", wordnet.VERB)
wordnet_lemmatizer.lemmatize("I'm", wordnet.NOUN)
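They all return "I'm" instead of "I".
Any idea what I might be missing?

First tokenize and POS-tag the text, then pass each token's tag as the pos argument to WordNetLemmatizer.lemmatize().
Take a look at: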
>>> from nltk import pos_tag, word_tokenize
>>> from nltk.stem import WordNetLemmatizer
>>>
>>> wnl = WordNetLemmatizer()
>>>
>>> def penn2morphy(penntag):
...     """Convert a Penn Treebank tag to a WordNet POS tag."""
...     morphy_tag = {'NN': 'n', 'JJ': 'a',
...                   'VB': 'v', 'RB': 'r'}
...     try:
...         return morphy_tag[penntag[:2]]
...     except KeyError:
...         return 'n'  # Default to noun.
...
>>> def lemmatize_sent(tokenized_sent):
...     """Lowercase, POS-tag, and lemmatize a list of tokens."""
...     return [wnl.lemmatize(word.lower(), penn2morphy(tag))
...             for word, tag in pos_tag(tokenized_sent)]
...
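>>> lemmatize_sent("I'm")  # a raw str is not a token list: here it was tagged character by character
['i', "'", 'm']
>>> # Newer NLTK releases may reject a str outright, so tokenize first
>>> # and let "I" and "'m" be tagged as separate tokens:
>>> lemmas = lemmatize_sent(word_tokenize("I'm"))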
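
To see why tokenizing first matters, here is a minimal sketch in plain Python of the clitic splitting that a Treebank-style tokenizer performs. The helpers `split_clitics` and `tokenize` are hypothetical names made up for this illustration and are not part of nltk; in practice `word_tokenize` does this step for you. The list of clitics is also only an assumption covering the most common cases.

```python
import re

# Hypothetical helper pattern, not part of nltk: matches a common English
# clitic ('m, 're, 's, 've, 'll, 'd, n't) at the end of a word.
_CLITIC = re.compile(r"(n't|'(?:m|re|s|ve|ll|d))$", re.IGNORECASE)


def split_clitics(word):
    """Return [stem, clitic] if word ends in a known clitic, else [word]."""
    match = _CLITIC.search(word)
    if match and match.start() > 0:
        return [word[:match.start()], match.group(1)]
    return [word]


def tokenize(text):
    """Split text on whitespace, then split clitics off each piece."""
    tokens = []
    for piece in text.split():
        tokens.extend(split_clitics(piece))
    return tokens


print(tokenize("I'm"))         # ['I', "'m"]
print(tokenize("They don't"))  # ['They', 'do', "n't"]
```

Once "I'm" has been split this way, the pronoun "I" is a token of its own, so it can be tagged and lemmatized separately from the clitic.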