Python GCP AI Platform: Error when creating a custom prediction model version (trained PyTorch model + torchvision.transforms)


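I am currently trying to deploy a custom model to AI Platform by following a guide. The model combines pre-trained models from 'PyTorch' and 'torchvision.transforms'. I keep getting the following error, which appears to be related to the 500 MB limit on custom prediction:

ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to experience errors, please contact support.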

Setup.py

from setuptools import setup
from pathlib import Path

base = Path(__file__).parent
# Read dependencies from requirements.txt, skipping blank lines and comments
with open(base / "requirements.txt") as f:
    REQUIRED_PACKAGES = [
        line.strip() for line in f
        if line.strip() and not line.strip().startswith("#")
    ]
print(f"\nPackages: {REQUIRED_PACKAGES}\n\n")

# [torch==1.3.0, torchvision==0.4.1, ImageHash==4.2.0,
#  Pillow==6.2.1, pyvis==0.1.8.2] installs 800 MB worth of files

setup(description="Extract features of an image",
      author='Amrit',
      name='test',
      version='0.1',
      install_requires=REQUIRED_PACKAGES,
      project_urls={
          'Documentation': 'https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines#tensorflow',
          'Deploy': 'https://cloud.google.com/ai-platform/prediction/docs/deploying-models#gcloud_1',
          'Ai_platform troubleshooting': 'https://cloud.google.com/ai-platform/training/docs/troubleshooting',
          'Say Thanks!': 'https://medium.com/searce/deploy-your-own-custom-model-on-gcps-ai-platform-7e42a5721b43',
          'google Torch wheels': 'http://storage.googleapis.com/cloud-ai-pytorch/readme.txt',
          'Torch & torchvision wheels': 'https://download.pytorch.org/whl/torch_stable.html',
      },
      python_requires='~=3.7',
      scripts=['predictor.py', 'preproc.py'])
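Steps taken: I tried adding 'torch' and 'torchvision' directly to the 'REQUIRED_PACKAGES' list in setup.py, so that PyTorch + torchvision would be installed as dependencies at deployment time. My guess is that AI Platform internally downloads the PyPI package for PyTorch, which is 500+ MB, and that this makes our model deployment fail. If I deploy the model with only 'torch', it seems to work (although, of course, it throws an error because the 'torchvision' library cannot be found).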
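Whether the PyPI torch package really expands to 500+ MB can be checked locally. The sketch below is a minimal example, assuming the packages from requirements.txt are installed in the current Python environment; `dir_size_mb` and `installed_size_mb` are hypothetical helpers, and the result only covers the package directory itself (not its .dist-info metadata or any shared libraries installed elsewhere).

```python
import importlib.util
import os


def dir_size_mb(path):
    """Total size in MB (1 MB = 1024 * 1024 bytes) of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so shared files are not double-counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / (1024 * 1024)


def installed_size_mb(package):
    """Approximate installed size of an importable package (directory packages only)."""
    # find_spec on a top-level name locates the package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ValueError(f"{package!r} is not an installed package directory")
    return sum(dir_size_mb(p) for p in spec.submodule_search_locations)


if __name__ == "__main__":
    # Hypothetical usage: run inside the environment where requirements.txt was installed.
    for pkg in ("torch", "torchvision"):
        try:
            print(f"{pkg}: {installed_size_mb(pkg):.1f} MB")
        except ValueError as exc:
            print(exc)
```

Comparing these numbers with the compressed wheel sizes listed below shows how much the packages grow once unpacked.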

File sizes:

  • PyTorch (torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl, about 111 MB)
  • torchvision (torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl, about 46 MB); both wheels are stored on GCS
  • The compressed predictor package (.tar.gz), which is the output of setup.py (5 KB)
  • The trained PyTorch model (44 MB)
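Overall, the model dependencies should total less than 250 MB, but I still get this error. We also tried the torch and torchvision packages provided by Google's mirror, but the same memory issue persists. AI Platform is completely new to us, so we would appreciate some input from experienced users.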
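The arithmetic behind the "under 250 MB" estimate, using only the sizes listed above. Note that the wheel sizes are compressed archives (the comment in setup.py mentions roughly 800 MB once installed) and that the error talks about memory rather than storage, so this total is a lower bound, not the deployed footprint. The 500 MB figure is the custom prediction routine limit quoted in the comments on this question.

```python
# Artifact sizes in MB as listed in the question (wheel sizes are compressed .whl files).
ARTIFACT_SIZES_MB = {
    "torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl": 111,
    "torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl": 46,
    "predictor package (.tar.gz from setup.py)": 0.005,  # 5 KB
    "trained PyTorch model": 44,
}

# Limit for custom prediction routines, as quoted in the comments on the question.
CUSTOM_PREDICTION_LIMIT_MB = 500

total_mb = sum(ARTIFACT_SIZES_MB.values())
print(f"Total listed artifacts: {total_mb:.3f} MB")  # -> Total listed artifacts: 201.005 MB
print(f"Headroom vs {CUSTOM_PREDICTION_LIMIT_MB} MB limit: "
      f"{CUSTOM_PREDICTION_LIMIT_MB - total_mb:.3f} MB")
```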

More information:

My environment variables:

BUCKET_NAME="something"
MODEL_DIR="gs://$BUCKET_NAME/"
VERSION_NAME='v6'
MODEL_NAME="something_model"
STAGING_BUCKET=$MODEL_DIR"staging_area/"
# TORCH_PACKAGE=$MODEL_DIR"package/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl"
# TORCHVISION_PACKAGE=$MODEL_DIR"package/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl"
TORCH_PACKAGE="gs://cloud-ai-pytorch/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl"
TORCHVISION_PACKAGE="gs://cloud-ai-pytorch/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl"
CUSTOM_CODE_PATH=$STAGING_BUCKET"imt_ai_predict-0.1.tar.gz"
PREDICTOR_CLASS="predictor.MyPredictor"
REGION='europe-west1'
MACHINE_TYPE='mls1-c4-m2'

GCP CLI input:

gcloud beta ai-platform versions create $VERSION_NAME \
  --model=$MODEL_NAME \
  --origin=$MODEL_DIR \
  --runtime-version=2.3 \
  --python-version=3.7 \
  --machine-type=$MACHINE_TYPE \
  --package-uris=$CUSTOM_CODE_PATH,$TORCH_PACKAGE,$TORCHVISION_PACKAGE \
  --prediction-class=$PREDICTOR_CLASS

GCP CLI output:

 **[1] global**
 [2] asia-east1
 [3] asia-northeast1
 [4] asia-southeast1
 [5] australia-southeast1
 [6] europe-west1
 [7] europe-west2
 [8] europe-west3
 [9] europe-west4
 [10] northamerica-northeast1
 [11] us-central1
 [12] us-east1
 [13] us-east4
 [14] us-west1
 [15] cancel
Please enter your numeric choice:  1

To make this the default region, run `gcloud config set ai_platform/region global`.

Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: **Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to experience errors, please contact support.**
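Building the `gcloud` call as an argument list in Python is one way to sidestep shell-quoting pitfalls such as curly quotes or a stray space after a line-continuation backslash. This is a sketch, not an official tool: `build_create_version_cmd` is a hypothetical helper, and it uses only the flags that appear in the command above. The `mls1-c4-m4` machine type in the usage block is the larger type suggested in the comments on the question.

```python
import shlex


def build_create_version_cmd(version, model, origin, package_uris, prediction_class,
                             machine_type="mls1-c4-m2", runtime_version="2.3",
                             python_version="3.7"):
    """Return the argv for the `gcloud beta ai-platform versions create` call in the question."""
    return [
        "gcloud", "beta", "ai-platform", "versions", "create", version,
        f"--model={model}",
        f"--origin={origin}",
        f"--runtime-version={runtime_version}",
        f"--python-version={python_version}",
        f"--machine-type={machine_type}",
        # --package-uris takes a comma-separated list of gs:// URIs.
        "--package-uris=" + ",".join(package_uris),
        f"--prediction-class={prediction_class}",
    ]


if __name__ == "__main__":
    model_dir = "gs://something/"  # placeholder bucket name from the question
    cmd = build_create_version_cmd(
        version="v6",
        model="something_model",
        origin=model_dir,
        package_uris=[
            f"{model_dir}staging_area/imt_ai_predict-0.1.tar.gz",
            "gs://cloud-ai-pytorch/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl",
            "gs://cloud-ai-pytorch/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl",
        ],
        prediction_class="predictor.MyPredictor",
        machine_type="mls1-c4-m4",  # larger machine type suggested in the comments
    )
    # Print a copy-pasteable command; to execute it instead, use
    # subprocess.run(cmd, check=True) with an authenticated gcloud CLI.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The interactive region prompt can be avoided by running the `gcloud config set ai_platform/region global` command that the CLI output itself suggests.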
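My findings: I found articles from people who struggled with the PyTorch package in the same way and got it working by installing the torch wheels on GCS (7e42a5721b43).
I tried the same approach for torch and torchvision, but with no luck so far; I am waiting for a response from cloudml-feedback@google.com. Any help with using a custom torch together with a torchvision-based custom predictor on AI Platform would be much appreciated.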
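Since the deployment relies on prebuilt wheels passed via `--package-uris` together with `--python-version=3.7`, it can be worth confirming that each wheel's tags match that runtime before redeploying. A minimal sketch, assuming wheel filenames follow the PEP 427 naming convention; `parse_wheel_filename` and `matches_runtime` are hypothetical helpers, and the compatibility check is deliberately simplified (unlike pip, it ignores ABI tags and manylinux variants).

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    # Accept full gs:// URIs or local paths; only the basename matters.
    base = filename.rsplit("/", 1)[-1]
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = base[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, version, build, python, abi, platform)


def matches_runtime(filename, python_version="3.7", platform="linux_x86_64"):
    """Simplified check: does the wheel's Python tag and platform fit the runtime?"""
    tags = parse_wheel_filename(filename)
    # Tags may be compressed sets such as "py2.py3", so split on dots.
    pythons = tags.python.split(".")
    platforms = tags.platform.split(".")
    digits = python_version.replace(".", "")
    major = python_version.split(".")[0]
    py_ok = any(t in pythons for t in ("cp" + digits, "py" + digits, "py" + major))
    plat_ok = platform in platforms or "any" in platforms
    return py_ok and plat_ok


if __name__ == "__main__":
    for uri in ("gs://cloud-ai-pytorch/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl",
                "gs://cloud-ai-pytorch/torchvision-0.4.1+cpu-cp37-cp37m-linux_x86_64.whl"):
        print(uri, matches_runtime(uri))
```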

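Solved this with a combination of things. I stuck with a 4 GB CPU mls1 machine and a custom prediction routine. Although that machine type does not appear in the official Google documentation, only the m1 and m4 types of mls1 support custom prediction.

Comments on the question:
  • "I recommend using a custom container instead of a custom prediction routine to deploy models on AI Platform Prediction. Please check this example."
  • "Thanks Raj, I will look into custom containers. Do you think there is no model memory limit?"
  • "Could you try redeploying with the bigger machine type mls1-c4-m4? The maximum memory limit is 2 GB for custom containers and 500 MB for custom prediction routines. Custom containers are deployed on n1-* machine types. Here are the supported machine types."
  • "Thanks Enrique, will test with 'mls1-c4-m4'."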
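Because the error reports memory rather than file size, a rough way to compare against the limits and machine types discussed above is to load the predictor locally and read the process's peak resident set size. This is a sketch assuming a Unix-like system (the `resource` module is unavailable on Windows); `measure_load` is a hypothetical helper, and the stand-in allocation should be replaced by something like `predictor.MyPredictor.from_path(model_dir)`, assuming `MyPredictor` implements the `from_path` classmethod that custom prediction routines require. Peak RSS covers the whole process (interpreter and imports included), so treat it as an upper estimate.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the current process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def measure_load(load_fn):
    """Call `load_fn` and return (loaded object, peak RSS before, peak RSS after) in MB."""
    before = peak_rss_mb()
    obj = load_fn()
    after = peak_rss_mb()
    return obj, before, after


if __name__ == "__main__":
    # Hypothetical stand-in for loading the model; swap in the real predictor load.
    obj, before, after = measure_load(lambda: b"x" * (100 * 1024 * 1024))
    print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
```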