Python AWS SageMaker PyTorch: No module named 'sagemaker'

python amazon-web-services

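I have deployed a PyTorch model on AWS with SageMaker, and I am trying to send a request to test the service. However, I get a very vague error message saying "No module named 'sagemaker'". I searched online but could not find any similar posts.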

My client code:

import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor

ENDPOINT = '<endpoint name>'

predictor = PyTorchPredictor(ENDPOINT)
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())

Detailed error message:

Traceback (most recent call last):
  File "client.py", line 7, in <module>
    predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/sagemaker/predictor.py", line 110, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'sagemaker'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/<endpoint name> in account xxxxxxxxxxxxxx for more information.

This error occurs because I combined the serving script and the deployment script in a single file, shown below:

import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
from torch import cuda
from torchvision.models import resnet50


def model_fn(model_dir):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))


if __name__ == '__main__':
    pytorch_model = PyTorchModel(model_data='s3://<bucket name>/resnet50/model.tar.gz',
                                    entry_point='serve.py', role='jiashenC-sagemaker',
                                    py_version='py3', framework_version='1.3.1')
    predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
    print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

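The root cause is the fourth line of the code. It tries to import sagemaker, a library that is not available inside the serving container.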
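To see why the `if __name__ == '__main__':` guard does not help here, the following is a minimal sketch using only the standard library. It assumes the model server loads the entry point script by importing it, which is consistent with the error reported above. The module name `orchestration_sdk_absent_from_container` and the helpers `load_entry_point` and `try_load` are hypothetical stand-ins, not part of SageMaker.

```python
import importlib.util
import os
import tempfile
import textwrap

# Hypothetical stand-in for a package that is absent from the serving image.
MISSING = "orchestration_sdk_absent_from_container"

SCRIPT = textwrap.dedent(f"""
    import {MISSING}  # top-level import: runs whenever the file is loaded

    def model_fn(model_dir):
        return "model"

    if __name__ == '__main__':
        pass  # the guard skips this block on import, but not the import above
""")


def load_entry_point(path):
    """Import a script by path, roughly as a model server loads user code."""
    spec = importlib.util.spec_from_file_location("entry_point", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def try_load(source):
    """Write the source to a temporary serve.py and try to import it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "serve.py")
        with open(path, "w") as f:
            f.write(source)
        try:
            load_entry_point(path)
            return "loaded"
        except ModuleNotFoundError as exc:
            return str(exc)


error_message = try_load(SCRIPT)
print(error_message)

# Removing the unavailable top-level import lets the same script load.
fixed_result = try_load(SCRIPT.replace(f"import {MISSING}", "import os"))
print(fixed_result)
```

The first call fails at import time even though nothing under the guard runs, which is the same shape of failure as the 500 response above. Moving the import out of the serving script resolves it.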

(Edited on February 9, 2020 with extra code snippets)

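Your serving code tries to use the sagemaker module internally. The sagemaker module (the SageMaker Python SDK, one of SageMaker's several orchestration SDKs) is not designed to be used inside model containers. It is meant to be used outside of models, to orchestrate their activity (training, deployment, Bayesian tuning, etc.). In your specific example, you should not include the deployment and model invocation code in the server code, because those are actions carried out from outside the server to orchestrate its lifecycle and interact with it. For model deployment with the SageMaker PyTorch container, your entry point script only needs to contain the required model_fn function, for model deserialization, and optionally input_fn, predict_fn and output_fn, for pre-processing, inference and post-processing respectively. This logic is beautiful :) : you don't need anything else to deploy a production-ready deep learning server! (MMS in the case of PyTorch and MXNet, Flask+Gunicorn in the case of sklearn.)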
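As an illustration of how the four hooks fit together, here is a minimal sketch that runs without PyTorch or SageMaker. The hook names and argument order follow the entry point contract described above. Everything else is an assumption made for the example: the stand-in "model" is a plain Python function that doubles its inputs, only JSON is handled, and `handle_request` is a hypothetical simplification of what the container does for each request.

```python
import json


def model_fn(model_dir):
    """Required hook: load the model from model_dir.

    Stand-in for the example: ignores model_dir and returns a function
    that doubles each input value.
    """
    return lambda values: [2 * v for v in values]


def input_fn(request_body, request_content_type):
    """Optional hook: turn the raw request body into model input."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Optional hook: run inference on the deserialized input."""
    return model(input_data)


def output_fn(prediction, accept):
    """Optional hook: serialize the prediction for the response."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps(prediction)


def handle_request(model, body, content_type, accept):
    """Hypothetical stand-in for the container's per-request chain."""
    return output_fn(predict_fn(input_fn(body, content_type), model), accept)


model = model_fn("/opt/ml/model")  # loaded once when the server starts
response = handle_request(model, "[1, 2, 3]", "application/json", "application/json")
print(response)  # [2, 4, 6]
```

Note that nothing in this file imports sagemaker: the serving script only defines hooks, and the server calls them.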

In summary, here is how the code can be split:

An entry point script serve.py that contains the model serving code, as follows:

import os

import numpy as np
import torch
from torch import cuda
from torchvision.models import resnet50

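def model_fn(model_dir):
    # Rebuild the architecture, then load the weights stored in model_dir.
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    # Apply the model to input_data without tracking gradients.
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))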

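And some orchestration code that instantiates a SageMaker model object, deploys it to a server, and queries it. This runs from the orchestration runtime of your choice, such as a SageMaker notebook, your laptop, an AWS Lambda function, or an Apache Airflow operator, with the SDK of your choice; you do not need to use Python for this.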

import numpy as np
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data='s3://<bucket name>/resnet50/model.tar.gz',
    entry_point='serve.py',
    role='jiashenC-sagemaker',
    py_version='py3',
    framework_version='1.3.1')

predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)

print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))

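Your code seems to be trying to use the sagemaker module internally. Are you using the sagemaker library in your model code? It should not be used inside models, but outside of them, to orchestrate their activity (training, deployment, Bayesian tuning, etc.).

@Olivier_Cruchant Thanks, that is exactly the problem I was facing. Would you like to post this as an answer? I will upvote it and add more information.

Thanks a lot! Don't hesitate to ask more questions.

I have also edited my question to provide more information. My example shows the exact cause of the error.

Yes, basically you need to run the deployment and model invocation from somewhere else, such as a SageMaker notebook, your laptop, or anywhere connected to the cloud.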