Python: lemmatizing a string according to POS (NLP)

I am trying to lemmatize a string according to each word's part of speech, but I get an error at the final stage. My code:

import nltk
from nltk.stem import *
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import wordnet
wordnet_lemmatizer = WordNetLemmatizer()
text = word_tokenize('People who help the blinging lights are the way of the future and are heading properly to their goals')
tagged = nltk.pos_tag(text)

def get_wordnet_pos(treebank_tag):

    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        return ''

for word in tagged: print(wordnet_lemmatizer.lemmatize(word,pos='v'), end=" ")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-afb22c78f770> in <module>()
----> 1 for word in tagged: print(wordnet_lemmatizer.lemmatize(word,pos='v'), end=" ")

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
     38 
     39     def lemmatize(self, word, pos=NOUN):
---> 40         lemmas = wordnet._morphy(word, pos)
     41         return min(lemmas, key=len) if lemmas else word
     42 

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in _morphy(self, form, pos)
   1710 
   1711         # 1. Apply rules once to the input to get y1, y2, y3, etc.
-> 1712         forms = apply_rules([form])
   1713 
   1714         # 2. Return all that are in the database (and check the original too)

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in apply_rules(forms)
   1690         def apply_rules(forms):
   1691             return [form[:-len(old)] + new
-> 1692                     for form in forms
   1693                     for old, new in substitutions
   1694                     if form.endswith(old)]

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in <listcomp>(.0)
   1692                     for form in forms
   1693                     for old, new in substitutions
-> 1694                     if form.endswith(old)]
   1695 
   1696         def filter_forms(forms):

I would like to lemmatize the whole string in one pass, using each word's part of speech. Please help.

First up, don't mix top-level, absolute and relative imports like this:

import nltk
from nltk.stem import *
from nltk import pos_tag, word_tokenize

This would be better:

from nltk import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet as wn

(See the linked reference.)

The error you're getting is most probably because you are feeding the output of pos_tag directly into WordNetLemmatizer.lemmatize(), i.e.:
>>> from nltk import pos_tag
>>> from nltk.stem import WordNetLemmatizer

>>> wnl = WordNetLemmatizer()
>>> sent = 'People who help the blinging lights are the way of the future and are heading properly to their goals'.split()

>>> pos_tag(sent)
[('People', 'NNS'), ('who', 'WP'), ('help', 'VBP'), ('the', 'DT'), ('blinging', 'NN'), ('lights', 'NNS'), ('are', 'VBP'), ('the', 'DT'), ('way', 'NN'), ('of', 'IN'), ('the', 'DT'), ('future', 'NN'), ('and', 'CC'), ('are', 'VBP'), ('heading', 'VBG'), ('properly', 'RB'), ('to', 'TO'), ('their', 'PRP$'), ('goals', 'NNS')]
>>> pos_tag(sent)[0]
('People', 'NNS')

>>> first_word = pos_tag(sent)[0]
>>> wnl.lemmatize(first_word)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/stem/wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1712, in _morphy
    forms = apply_rules([form])
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1694, in apply_rules
    if form.endswith(old)]
AttributeError: 'tuple' object has no attribute 'endswith'
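
The traceback shows lemmatize() being handed the whole ('People', 'NNS') tuple, so each pair needs to be unpacked and its Treebank tag converted to a WordNet POS before the call. Below is a minimal sketch of that, reusing get_wordnet_pos() from the question. The helper names lemmatize_tagged and lemmatize_text are illustrative, not NLTK APIs. The noun fallback for unmapped tags is an assumption: the question's version returns an empty string, which is not one of WordNet's POS values.

```python
# WordNet POS constants, written out as the single-letter strings that
# nltk.corpus.wordnet exposes as ADJ, VERB, NOUN and ADV.
ADJ, VERB, NOUN, ADV = 'a', 'v', 'n', 'r'


def get_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag to a WordNet POS, defaulting to NOUN."""
    if treebank_tag.startswith('J'):
        return ADJ
    if treebank_tag.startswith('V'):
        return VERB
    if treebank_tag.startswith('R'):
        return ADV
    # Assumption: fall back to noun (lemmatize()'s own default) for any
    # other tag instead of returning an empty string.
    return NOUN


def lemmatize_tagged(tagged, lemmatize):
    """Lemmatize (word, tag) pairs; `lemmatize` is any callable(word, pos)."""
    # Unpack each tuple so the lemmatizer receives a str, not a tuple.
    return [lemmatize(word, get_wordnet_pos(tag)) for word, tag in tagged]


def lemmatize_text(text):
    """Tokenize, tag and lemmatize `text` with NLTK (needs its data packages)."""
    # Imported here so the helpers above stay usable without NLTK installed.
    from nltk import pos_tag, word_tokenize
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    return lemmatize_tagged(pos_tag(word_tokenize(text)), wnl.lemmatize)
```

Calling lemmatize_text() on the sentence from the question should then lemmatize each token under its own part of speech instead of treating every word as a verb. The exact result depends on the tagger model that is installed.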


Alternatively, if you'd like an easy way out:

pip install pywsd

Then:

>>> from pywsd.utils import lemmatize, lemmatize_sentence
>>> sent = 'People who help the blinging lights are the way of the future and are heading properly to their goals'
>>> lemmatize_sentence(sent)
['people', 'who', 'help', 'the', u'bling', u'light', u'be', 'the', 'way', 'of', 'the', 'future', 'and', u'be', u'head', 'properly', 'to', 'their', u'goal']

Comment on the question: I don't quite understand your approach. You want to lemmatize each word after checking its POS, to make sure you get the right lemma, is that right? If so, could you give an expected input and output? Also, what is the point of get_wordnet_pos()? I don't see it being used anywhere.