Python: How to load files in a custom SageMaker endpoint deployment script
I am trying to deploy a sentiment-analysis model on SageMaker as an endpoint that predicts the sentiment of input text in real time. The model takes a single text string as input and returns its sentiment.

To train the XGBoost model, I followed this tutorial up to step 23, which uploads model.tar.gz to an S3 bucket. I also uploaded the vocabulary produced by sklearn's CountVectorizer (used to build the bag of words) to the S3 bucket.

To deploy this pre-trained model, I can use SKLearnModel and provide an entry-point Python file, predict.py:
sklearn_model = SKLearnModel(model_data="s3://bucket/model.tar.gz", role="SageMakerRole", entry_point="predict.py")
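For reference, files that model_fn should see at inference time are typically bundled into model.tar.gz itself, since SageMaker extracts that archive before calling model_fn. A minimal sketch of building such an archive with an extra vocabulary file (the file names word_dict.pkl and xgboost-model are my assumptions for illustration, not names prescribed by the tutorial):

```python
import pickle
import tarfile

# Hypothetical vocabulary produced by CountVectorizer during preprocessing;
# saved as a pickle so the inference script can reload it.
word_dict = {"good": 0, "movi": 1, "bad": 2}
with open("word_dict.pkl", "wb") as f:
    pickle.dump(word_dict, f)

# Bundle everything into model.tar.gz; SageMaker extracts this archive
# into model_dir before calling model_fn.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("word_dict.pkl")
    # tar.add("xgboost-model")  # the trained model file would go here too
```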
The documentation says I only have to provide model.tar.gz as an argument and it will be loaded in model_fn. But if I wrote the model myself, how do I load it? And if I put other files in the same S3 directory as model.tar.gz, can those be loaded too?

To classify, I have to vectorize the input text before calling model.predict(bow_vector) in predict_fn. For that I need word_dict, which I prepared while preprocessing the training data and wrote to S3.

My question is: how do I get word_dict inside model_fn? Can I load it from S3?
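For context, one common pattern is to read both artifacts from model_dir, since SageMaker extracts model.tar.gz there before calling model_fn. A sketch, assuming the model and vocabulary were both packed into the archive under the names xgboost-model and word_dict.pkl, and that both were saved with pickle (all of these are my assumptions for illustration):

```python
import os
import pickle

def model_fn(model_dir):
    """Load artifacts that were packed into model.tar.gz.

    SageMaker extracts the archive into model_dir before calling this
    function, so any file bundled alongside the model is readable here.
    File names and the pickle format are assumptions for illustration.
    """
    with open(os.path.join(model_dir, "word_dict.pkl"), "rb") as f:
        word_dict = pickle.load(f)
    with open(os.path.join(model_dir, "xgboost-model"), "rb") as f:
        model = pickle.load(f)
    return model, word_dict
```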
Below is the code for predict.py:
import os
import re
import pickle

import numpy as np
import pandas as pd
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from bs4 import BeautifulSoup
import sagemaker_containers
from sklearn.feature_extraction.text import CountVectorizer

def model_fn(model_dir):
    # TODO How to load the word_dict.
    # TODO How to load the model.
    return model, word_dict

def predict_fn(input_data, model):
    print('Inferring sentiment of input data.')
    trained_model, word_dict = model
    if word_dict is None:
        raise Exception('Model has not been loaded properly, no word_dict.')
    # Process input_data so that it is ready to be sent to our model.
    input_bow_csv = process_input_text(word_dict, input_data)
    prediction = trained_model.predict(input_bow_csv)
    return prediction

def process_input_text(word_dict, input_data):
    words = text_to_words(input_data)
    # Pass the saved vocabulary by keyword; a bare positional argument
    # after keyword arguments is a syntax error.
    vectorizer = CountVectorizer(preprocessor=lambda x: x,
                                 tokenizer=lambda x: x,
                                 vocabulary=word_dict)
    bow_array = vectorizer.transform([words]).toarray()[0]
    bow_csv = ",".join(str(bit) for bit in bow_array)
    return bow_csv

def text_to_words(text):
    """Normalize a review and stem its words with the Porter stemmer."""
    stemmer = PorterStemmer()
    text_nohtml = BeautifulSoup(text, "html.parser").get_text()  # remove HTML tags
    text_lower = re.sub(r"[^a-zA-Z0-9]", " ", text_nohtml.lower())  # keep alphanumerics, lower-case
    words = text_lower.split()  # split string into words
    words = [w for w in words if w not in stopwords.words("english")]  # remove stopwords
    words = [stemmer.stem(w) for w in words]  # stem, reusing the one stemmer instance
    return words

def input_fn(input_data, content_type):
    return input_data

def output_fn(prediction_output, accept):
    return prediction_output
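As a side note, with a fixed vocabulary the vectorization in process_input_text is equivalent to a plain bag-of-words count over that vocabulary. A dependency-free illustration with a made-up three-word vocabulary (names and values are hypothetical):

```python
# Made-up vocabulary mapping stemmed word -> column index.
word_dict = {"good": 0, "movi": 1, "bad": 2}

def bow_csv(words, vocab):
    """Count occurrences of each vocabulary word and join the counts as CSV."""
    counts = [0] * len(vocab)
    for w in words:
        if w in vocab:  # words outside the vocabulary are simply ignored
            counts[vocab[w]] += 1
    return ",".join(str(n) for n in counts)

print(bow_csv(["good", "movi", "good", "film"], word_dict))  # -> 2,1,0
```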