Python: WordNetLemmatizer on dask.dataframe fails with "'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'"


python nlp dask


I am trying to lemmatize text in a Dask dataframe:

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
def lemmatizing(sentence):
    stemSentence = ""
    for word in sentence.split():
        stem = wnl.lemmatize(word)
        stemSentence += stem
        stemSentence += " "
    stemSentence = stemSentence.strip()
    return stemSentence
df['news_content'] = df['news_content'].apply(lemmatizing).compute()

But I get the following error:

AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

I have already tried the recommended approach, but without success.

Any help is appreciated.

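This happens because the wordnet module is "lazily read" and has not been evaluated yet.

One trick to make it work is to use WordNetLemmatizer() once before using it in the Dask dataframe: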

>>> from nltk.stem import WordNetLemmatizer
>>> import dask.dataframe as dd

>>> df = dd.read_csv('something.csv')
>>> df.head()
                      text  label
0       this is a sentence      1
1  that is a foo bar thing      0


>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('cats') # Use it once first, to "unlazify" wordnet.
'cat'

# Now you can use it with Dask dataframe's .apply() function.
>>> lemmatize_text = lambda sent: [wnl.lemmatize(word) for word in sent.split()]

>>> df['lemmas'] = df['text'].apply(lemmatize_text)
>>> df.head()
                      text  label                          lemmas
0       this is a sentence      1         [this, is, a, sentence]
1  that is a foo bar thing      0  [that, is, a, foo, bar, thing]
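The lambda above returns a list of tokens, whereas the function in the question returned a single string. Below is a minimal sketch of a string-returning variant. The names `lemmatize_text`, `toy_lemmatize` and `TOY_LEMMAS` are made up for illustration; the toy lemmatizer only stands in for `wnl.lemmatize` so that the snippet runs without the WordNet data.

```python
from functools import partial


def lemmatize_text(sentence, lemmatize):
    """Lemmatize each whitespace-separated token and rejoin with single spaces."""
    return " ".join(lemmatize(word) for word in sentence.split())


# Hypothetical stand-in for wnl.lemmatize (assumption: not part of NLTK).
TOY_LEMMAS = {"cats": "cat", "things": "thing"}


def toy_lemmatize(word):
    # Unknown words are returned unchanged.
    return TOY_LEMMAS.get(word, word)


# Bind the lemmatizer so the result is a one-argument function,
# which is the shape that .apply() expects.
lemmatize_with_toy = partial(lemmatize_text, lemmatize=toy_lemmatize)

result = lemmatize_with_toy("those  cats are foo bar things")
print(result)
```

With the real lemmatizer you would presumably pass `wnl.lemmatize` instead of `toy_lemmatize`, after the warm-up call shown above, and hand the bound function to `df['text'].apply(...)`.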

Alternatively, you can try pywsd:

pip install -U pywsd

Then in your code:

>>> from pywsd.utils import lemmatize_sentence
Warming up PyWSD (takes ~10 secs)... took 9.131901025772095 secs.

>>> import dask.dataframe as dd

>>> df = dd.read_csv('something.csv')
>>> df.head()
                      text  label
0       this is a sentence      1
1  that is a foo bar thing      0

>>> df['lemmas'] = df['text'].apply(lemmatize_sentence)
>>> df.head()
                      text  label                          lemmas
0       this is a sentence      1         [this, be, a, sentence]
1  that is a foo bar thing      0  [that, be, a, foo, bar, thing]

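Thanks, this helped a lot. The question is: why does the workflow work, while .compute() raises the error?

That is because wordnet has to be evaluated first. Unlike modern Python libraries, where most things are evaluated up front, in the old days when machine resources were limited many things worked like generators. So using WordNetLemmatizer() once triggers wordnet to be loaded. "Lazy loading" is a design pattern that is rarely mentioned today because machines are bigger, but it can be beneficial when machine resources are limited.

Note: if you are working in a Jupyter notebook, make sure some time passes between executing the cell with wnl.lemmatize('cats') and running the rest of the program; otherwise you will get the same error. If all the code is in one cell, add a sleep statement to wait about 5 seconds.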
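To make the "lazy loading" explanation concrete, here is a toy sketch of a lazy proxy. It is not NLTK's code; `ToyCorpus` and `LazyToyLoader` are hypothetical names. As far as I understand it, NLTK's LazyCorpusLoader works along similar lines: on first attribute access it builds the real reader and then replaces its own `__dict__` and `__class__` with the reader's. The last part of the sketch simulates a second caller that enters the loading step after the swap has already happened, which is presumably how the AttributeError in the question arises when several Dask workers trigger the load at the same time.

```python
class ToyCorpus:
    """Stands in for the real, fully loaded corpus reader (hypothetical)."""

    def __init__(self):
        self.lemmas = {"cats": "cat", "dogs": "dog"}

    def lemma(self, word):
        return self.lemmas.get(word, word)


class LazyToyLoader:
    """Proxy that defers building the corpus until an attribute is first used."""

    def __init__(self, factory):
        self._factory = factory

    def _load(self):
        corpus = self._factory()
        # Become the loaded object: take over its state and its class.
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. before loading.
        self._load()
        return getattr(self, name)


corpus = LazyToyLoader(ToyCorpus)
before = type(corpus).__name__      # still the proxy
first_lemma = corpus.lemma("cats")  # first use triggers the load
after = type(corpus).__name__       # now the real class
print(before, first_lemma, after)

# Simulate a late caller that is still inside the loading step after
# the swap: the proxy's private state is gone, so the lookup fails.
try:
    LazyToyLoader._load(corpus)
    late_error = None
except AttributeError as err:
    late_error = str(err)
print(late_error)
```

Calling the object once in the main process before handing it to parallel workers means every worker sees the already swapped, fully loaded object, so none of them goes through the loading step.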