Python: Saving and loading a scikit-learn machine learning model and functions

python serialization scikit-learn

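I trained a Naive Bayes model with scikit-learn to classify articles in my web application. To avoid retraining the model every time, I want to save it and deploy it to the application later. When I searched for this problem, many people recommended the pickle library.

I have this model: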

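import os
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# vect_tokenizer and lemmatizer are defined elsewhere (not shown).
def custom_tokenizer(doc):
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

tfidf = TfidfVectorizer(tokenizer=custom_tokenizer, stop_words="english")
clf = MultinomialNB()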

I have already run tfidf.fit_transform() and trained clf. In the end I had a model, and I saved the clf classifier with the following code:

dest = os.path.join('classifier', 'pkl_object')
with open(os.path.join(dest, 'classifier.pkl'), 'wb') as f:
    pickle.dump(best_classifier, f, protocol=4)

I also tried saving the vectorizer to a file in this way:

with open(os.path.join(dest, 'vect.pkl'), 'wb') as f:
    pickle.dump(custom_tokenizer, f, protocol=4)
    pickle.dump(best_vector, f, protocol=4)
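A side note on the snippet above: it writes two objects into one file, so reading everything back takes two pickle.load calls, and the first call returns the first object written (here, the tokenizer function rather than the vectorizer). The following is a minimal sketch of that behaviour using an in-memory buffer; first_object and second_object are illustrative stand-ins, not the real tokenizer and vectorizer.

```python
import io
import pickle

# Illustrative stand-ins for the two objects written to vect.pkl.
first_object = {"name": "tokenizer settings"}
second_object = [0.1, 0.2, 0.3]

buffer = io.BytesIO()  # behaves like a file opened in binary mode
pickle.dump(first_object, buffer, protocol=4)
pickle.dump(second_object, buffer, protocol=4)

buffer.seek(0)
loaded_first = pickle.load(buffer)   # first object written
loaded_second = pickle.load(buffer)  # second object written

# A further load finds no more data and raises EOFError.
try:
    pickle.load(buffer)
    reached_end = False
except EOFError:
    reached_end = True
```

If only the vectorizer is needed later, dumping just that one object keeps the loading code to a single pickle.load call.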

Saving raised no error. But when I tried to load the files with the code below, I got an error message:

import pickle
import os

with open(os.path.join('pkl_object','classifier.pkl'),'rb') as file :
    clf = pickle.load(file)

with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
    vect = pickle.load(file)

Error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-55-d4b562870a02> in <module>()
     11 
     12 with open(os.path.join('pkl_vect','vect.pkl'),'rb') as file:
---> 13     vect = pickle.load(file)
     14 
     15 '''

AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>

I think the pickle library is not able to store functions properly. How can I serialize my custom TfidfVectorizer to a file?

In the second program (the one that loads the pickle), also include:

def custom_tokenizer (doc) :
    tokens = vect_tokenizer(doc)
    return [lemmatizer.lemmatize(token) for token in tokens]

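This is needed because pickle does not store how a class or function is built; it stores only a reference to it by module and name. As this line in the error log says:

AttributeError: Can't get attribute 'custom_tokenizer' on <module '__main__'>

the loading program does not know what custom_tokenizer is. See the reference linked in the original answer for a better understanding.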
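To make the point about references concrete, here is a minimal sketch under stated assumptions. It writes a throwaway module (the name demo_tokenizers and the tokenizer body are hypothetical, chosen only for this illustration), pickles the function, and checks that the payload carries the module and function names but not the function body. It then removes the function to imitate a program that never defined it, and restores it to imitate the fix.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module name and tokenizer body, for illustration only.
MODULE_NAME = "demo_tokenizers"
MODULE_SOURCE = (
    "def custom_tokenizer(doc):\n"
    "    return doc.lower().split()\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
        tokenizer = module.custom_tokenizer

        payload = pickle.dumps(tokenizer, protocol=4)
        # The payload names the module and the function...
        names_in_payload = (
            MODULE_NAME.encode() in payload and b"custom_tokenizer" in payload
        )
        # ...but contains nothing from the function body.
        body_in_payload = b"lower" in payload

        # Imitate a program in which the function was never defined.
        del module.custom_tokenizer
        try:
            pickle.loads(payload)
            load_failed = False
        except AttributeError:
            load_failed = True

        # Define it again and the same payload loads.
        module.custom_tokenizer = tokenizer
        restored = pickle.loads(payload)
        tokens = restored("Save and LOAD")
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop(MODULE_NAME, None)
```

One design consequence: keeping the tokenizer in a module that both the training script and the web application import may be easier to maintain than copying the definition into each program, because the pickle then refers to the same module and name on both sides.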

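Is this on the same computer? If not, verify that the sklearn version is the same on both computers.

@pault Both are on the same computer.

Is the custom tokenizer defined in the file where you load the pickle? The function must be defined for the pickle to load correctly, and in your case it also needs to be at global scope.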
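The global-scope requirement in the last comment can be seen from the saving side as well: pickle looks a function up as a module-level attribute, so a function defined inside another function, or a lambda, cannot be pickled at all. A minimal sketch follows; make_tokenizer and local_tokenizer are hypothetical names used only here, and the exception type raised differs between Python versions, so both are caught.

```python
import pickle


def make_tokenizer():
    # Hypothetical factory: the inner function is not a module-level name.
    def local_tokenizer(doc):
        return doc.split()
    return local_tokenizer


candidates = {
    "nested function": make_tokenizer(),
    "lambda": lambda doc: doc.split(),
}

pickled_ok = {}
for label, func in candidates.items():
    try:
        pickle.dumps(func, protocol=4)
        pickled_ok[label] = True
    except (pickle.PicklingError, AttributeError):
        # Neither object can be found again by module and name.
        pickled_ok[label] = False
```

A tokenizer passed to TfidfVectorizer is therefore safest as a plain def at the top level of a module.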