TensorFlow Serving: "No versions of servable" message for a model trained with tf.contrib.learn Experiment

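I trained a model using the Google Cloud ML Engine getting-started tutorial as a reference. I can deploy and serve this model on Google Cloud ML without any problem.

Now I am trying to serve it with TensorFlow Serving, but I get the following error message: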

2017-03-17 19:20:17.064146: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:204] No versions of servable default found under base path /serving/tf_models/extrato/output/
The command line I use to start TensorFlow Serving is:

root@df98954689a1:/serving# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_base_path=/serving/tf_models/extrato/output/
The contents of the output folder are:

root@df98954689a1:/serving# ls -la tf_models/extrato/output
total 119740
drwxr-xr-x 4 root root     4096 Mar 17 17:02 .
drwxr-xr-x 3 root root     4096 Mar 17 17:02 ..
-rw-r--r-- 1 root root      184 Mar 17 17:02 checkpoint
drwxr-xr-x 2 root root     4096 Mar 17 17:02 eval
-rw-r--r-- 1 root root 96390060 Mar 17 17:02 events.out.tfevents.1489705843.elio-MS-7A66
drwxr-xr-x 3 root root     4096 Mar 17 17:02 export
-rw-r--r-- 1 root root  1362798 Mar 17 17:02 graph.pbtxt
-rw-r--r-- 1 root root  7633781 Mar 17 17:02 model.ckpt-1000001.data-00000-of-00001
-rw-r--r-- 1 root root     1975 Mar 17 17:02 model.ckpt-1000001.index
-rw-r--r-- 1 root root   637623 Mar 17 17:02 model.ckpt-1000001.meta
-rw-r--r-- 1 root root  7633781 Mar 17 17:02 model.ckpt-2.data-00000-of-00001
-rw-r--r-- 1 root root     1975 Mar 17 17:02 model.ckpt-2.index
-rw-r--r-- 1 root root   637623 Mar 17 17:02 model.ckpt-2.meta
-rw-r--r-- 1 root root  7633781 Mar 17 17:02 model.ckpt-566170.data-00000-of-00001
-rw-r--r-- 1 root root     1975 Mar 17 17:02 model.ckpt-566170.index
-rw-r--r-- 1 root root   637623 Mar 17 17:02 model.ckpt-566170.meta              
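Update: I also tried using the frozen saved_model.pb file and the variables folder, which are exactly what I used to deploy the model on Google Cloud ML Engine, but I received the same error message.

These files are located in the folder below: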

root@d4f1c917b59d:/serving# ls -la tf_models/extrato/output/export/Servo/1489706933289/
total 356
drwxr-xr-x 3 root root   4096 Mar 17 17:02 .
drwxr-xr-x 3 root root   4096 Mar 17 17:02 ..
-rw-r--r-- 1 root root 348848 Mar 17 17:02 saved_model.pb
drwxr-xr-x 2 root root   4096 Mar 17 17:02 variables
The code I used to train and export the model is:

import argparse

import model

import tensorflow as tf
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib.learn.python.learn.utils import (
    saved_model_export_utils)


def generate_experiment_fn(train_files,
                           eval_files,
                           num_epochs=None,
                           train_batch_size=40,
                           eval_batch_size=40,
                           embedding_size=8,
                           first_layer_size=100,
                           num_layers=4,
                           scale_factor=0.7,
                           **experiment_args):
  """Create an experiment function given hyperparameters.

  See command line help text for description of args.
  Returns:
    A function (output_dir) -> Experiment where output_dir is a string
    representing the location of summaries, checkpoints, and exports.
    this function is used by learn_runner to create an Experiment which
    executes model code provided in the form of an Estimator and
    input functions.

    All listed arguments in the outer function are used to create an
    Estimator, and input functions (training, evaluation, serving).
    Unlisted args are passed through to Experiment.
  """
  # Check verbose logging flag
  verbose_logging = experiment_args.pop('verbose_logging')
  model.set_verbose_logging(verbose_logging)

  def _experiment_fn(output_dir):
    # num_epochs can control duration if train_steps isn't
    # passed to Experiment
    train_input = model.generate_input_fn(
        train_files,
        num_epochs=num_epochs,
        batch_size=train_batch_size,
    )
    # Don't shuffle evaluation data
    eval_input = model.generate_input_fn(
        eval_files,
        batch_size=eval_batch_size,
        shuffle=False
    )
    return tf.contrib.learn.Experiment(
        model.build_estimator(
            output_dir,
            embedding_size=embedding_size,
            # Construct layer sizes with exponential decay
            hidden_units=[
                max(2, int(first_layer_size * scale_factor**i))
                for i in range(num_layers)
            ]
        ),
        train_input_fn=train_input,
        eval_input_fn=eval_input,
        # export strategies control the prediction graph structure
        # of exported binaries.
        export_strategies=[saved_model_export_utils.make_export_strategy(
            model.serving_input_fn,
            default_output_alternative_key=None,
            exports_to_keep=1
        )],
        **experiment_args
    )
  return _experiment_fn


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  # Input Arguments
  parser.add_argument(
      '--train-files',
      help='GCS or local paths to training data',
      nargs='+',
      required=True
  )
  parser.add_argument(
      '--num-epochs',
      help="""\
      Maximum number of training data epochs on which to train.
      If both --max-steps and --num-epochs are specified,
      the training job will run for --max-steps or --num-epochs,
      whichever occurs first. If unspecified will run for --max-steps.\
      """,
      type=int,
  )
  parser.add_argument(
      '--train-batch-size',
      help='Batch size for training steps',
      type=int,
      default=40
  )
  parser.add_argument(
      '--eval-batch-size',
      help='Batch size for evaluation steps',
      type=int,
      default=40
  )
  parser.add_argument(
      '--train-steps',
      help="""\
      Steps to run the training job for. If --num-epochs is not specified,
      this must be. Otherwise the training job will run indefinitely.\
      """,
      type=int
  )
  parser.add_argument(
      '--eval-steps',
      help='Number of steps to run evaluation for at each checkpoint',
      default=100,
      type=int
  )
  parser.add_argument(
      '--eval-files',
      help='GCS or local paths to evaluation data',
      nargs='+',
      required=True
  )
  # Training arguments
  parser.add_argument(
      '--embedding-size',
      help='Number of embedding dimensions for categorical columns',
      default=8,
      type=int
  )
  parser.add_argument(
      '--first-layer-size',
      help='Number of nodes in the first layer of the DNN',
      default=100,
      type=int
  )
  parser.add_argument(
      '--num-layers',
      help='Number of layers in the DNN',
      default=4,
      type=int
  )
  parser.add_argument(
      '--scale-factor',
      help='How quickly should the size of the layers in the DNN decay',
      default=0.7,
      type=float
  )
  parser.add_argument(
      '--job-dir',
      help='GCS location to write checkpoints and export models',
      required=True
  )

  # Argument to turn on all logging
  parser.add_argument(
      '--verbose-logging',
      default=False,
      type=bool,
      help='Switch to turn on or off verbose logging and warnings'
  )

  # Experiment arguments
  parser.add_argument(
      '--eval-delay-secs',
      help='How long to wait before running first evaluation',
      default=10,
      type=int
  )
  parser.add_argument(
      '--min-eval-frequency',
      help='Minimum number of training steps between evaluations',
      default=1,
      type=int
  )

  args = parser.parse_args()
  arguments = args.__dict__
  job_dir = arguments.pop('job_dir')

  print('Starting Census: Please launch tensorboard to see results: tensorboard --logdir=$MODEL_DIR')

  # Run the training job
  # learn_runner pulls configuration information from environment
  # variables using tf.contrib.learn.RunConfig and uses this configuration
  # to conditionally execute Experiment, or param server code
  learn_runner.run(generate_experiment_fn(**arguments), job_dir)
Does anyone know what I am doing wrong?

Best regards

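TensorFlow Serving expects you to point it at a base directory that contains version subdirectories. In your case, Servo is the directory you want to point to, and 1489706933289 is the version directory.

The following should work: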

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --port=9000 \
  --model_base_path=/serving/tf_models/extrato/output/export/Servo
Note that Servo is appended to the base path and 1489706933289 is not.
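As a rough local sanity check of the layout described above, the following Python sketch lists the version directories found under a candidate base path. The helper name find_servable_versions is hypothetical, and requiring a saved_model.pb file inside each numeric directory is an assumption made for this check; it is not a description of what the model server does internally. The example rebuilds the directory layout from the question's listing (output/export/Servo/1489706933289) in a temporary folder.

```python
import os
import tempfile


def find_servable_versions(base_path):
    """Return numeric subdirectories of base_path that contain saved_model.pb.

    Hypothetical helper: a directory counts as a version here only if its
    name is a decimal number and it holds a saved_model.pb file.
    """
    versions = []
    for name in os.listdir(base_path):
        version_dir = os.path.join(base_path, name)
        has_model = os.path.isfile(os.path.join(version_dir, "saved_model.pb"))
        if name.isdecimal() and os.path.isdir(version_dir) and has_model:
            versions.append(int(name))
    return sorted(versions)


# Recreate the layout from the question in a temporary directory.
root = tempfile.mkdtemp()
version_dir = os.path.join(root, "output", "export", "Servo", "1489706933289")
os.makedirs(os.path.join(version_dir, "variables"))
open(os.path.join(version_dir, "saved_model.pb"), "wb").close()

# The training output folder has no numeric subdirectories.
print(find_servable_versions(os.path.join(root, "output")))  # []

# The Servo folder directly contains the version directory.
print(find_servable_versions(os.path.join(root, "output", "export", "Servo")))  # [1489706933289]
```

An empty result for a path corresponds to the situation in the question's warning: there is nothing directly under that path that looks like a version directory.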


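Note that on Cloud ML you deploy a version directly, so there you need to point to the version subdirectory on GCS, something like gs://my_bucket/../tf_models/extrato/output/Servo/1489706933289.

I also tried serving the frozen .pb file and the variables folder located in the export folder, but still without success; the same error message is shown. I have updated the question with this information.

Can you clarify whether you tried --model_base_path=/serving/tf_models/extrato/output/Servo? [Note the last subdirectory.]

Yes, I tried the full path to the .pb file and the variables directory. In my case it was --model_base_path=/serving/tf_models/extrato/output/export/Servo/1489706933289/

I wasn't quite clear, but I think you want --model_base_path=/serving/tf_models/extrato/output/Servo

OK, I'm going to try.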
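The exchange above turns on the difference between a version directory (what Cloud ML is pointed at) and its parent (what TensorFlow Serving takes as its base path). A minimal sketch of that split, assuming the version directory is simply the last path component and is named with a decimal number; split_version_dir is a hypothetical helper, not part of any TensorFlow API:

```python
import posixpath


def split_version_dir(version_dir):
    """Split a version directory path into (base_path, version).

    Hypothetical helper: assumes the last path component is the numeric
    version directory and everything before it is the base path.
    """
    base_path, version = posixpath.split(version_dir.rstrip("/"))
    if not version.isdecimal():
        raise ValueError("last path component is not a numeric version: %r" % version)
    return base_path, int(version)


# Path taken from the comment above (the value that was passed by mistake).
base_path, version = split_version_dir(
    "/serving/tf_models/extrato/output/export/Servo/1489706933289/")
print(base_path)  # /serving/tf_models/extrato/output/export/Servo
print(version)    # 1489706933289
```

Under these assumptions, the first value is what would go in --model_base_path, while the full version directory is what a direct version deployment refers to.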