How do I use TensorBoard with AWS SageMaker TensorFlow?

Tags: amazon-web-services, tensorflow2.0, tensorboard, amazon-sagemaker

I started a SageMaker training job:

from sagemaker.tensorflow import TensorFlow
mytraining= TensorFlow(entry_point='model.py',
                        role=role,
                        train_instance_count=1,
                        train_instance_type='ml.p2.xlarge',
                        framework_version='2.0.0',
                        py_version='py3',
                        distributions={'parameter_server': {'enabled':False}})

training_data_uri ='s3://path/to/my/data'
mytraining.fit(training_data_uri,run_tensorboard_locally=True)
Using run_tensorboard_locally=True gives me:

Tensorboard is not supported with script mode. You can run the following command: tensorboard --logdir None --host localhost --port 6006 This can be run from anywhere with access to the S3 URI used as the logdir.
It seems I cannot use it in script mode, but I can access the TensorBoard logs in S3? If so, where in S3 are those logs?

import argparse
import json
import os

def _parse_args():
    parser = argparse.ArgumentParser()

    # Data, model, and output directories
    # model_dir is always passed in from SageMaker. By default this is a S3 path under the default bucket.
    parser.add_argument('--model_dir', type=str)
    parser.add_argument('--sm-model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAINING'))
    parser.add_argument('--hosts', type=list, default=json.loads(os.environ.get('SM_HOSTS')))
    parser.add_argument('--current-host', type=str, default=os.environ.get('SM_CURRENT_HOST'))

    return parser.parse_known_args()

if __name__ == "__main__":
    args, unknown = _parse_args()

    train_data, train_labels = load_training_data(args.train)
    eval_data, eval_labels = load_testing_data(args.train)

    mymodel= model(train_data, train_labels, eval_data, eval_labels)

    if args.current_host == args.hosts[0]:
        mymodel.save(os.path.join(args.sm_model_dir, '000000002/model.h5'))
A similar question is the following:

Edit: I tried this new configuration, but it does not work.

from sagemaker.debugger import TensorBoardOutputConfig

tensorboard_output_config = TensorBoardOutputConfig(s3_output_path='s3://PATH/to/my/bucket')

mytraining= TensorFlow(entry_point='model.py',
                        role=role,
                        train_instance_count=1,
                        train_instance_type='ml.p2.xlarge',
                        framework_version='2.0.0',
                        py_version='py3',
                        distributions={'parameter_server': {'enabled':False}},
                        tensorboard_output_config=tensorboard_output_config)
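I added the callback below to my model.py script; it is the same one I use without SageMaker. As log_dir I set the default directory that TensorBoardOutputConfig writes its data to, but it does not work. I also tried it without the callback.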

tensorboardCallback = tf.keras.callbacks.TensorBoard(
    log_dir='/opt/ml/output/tensorboard',
    histogram_freq=0,
    # batch_size=32,  # ignored in TF 2.0
    write_graph=True,
    write_grads=False,
    write_images=False,
    embeddings_freq=0,
    embeddings_layer_names=None,
    embeddings_metadata=None,
    embeddings_data=None,
    update_freq='batch')
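
Because the same model.py is meant to run both with and without SageMaker, the log directory could be taken from a command-line argument instead of being hard-coded. The following is only a minimal sketch under assumptions: the `--tensorboard-dir` flag and the `parse_log_dir` helper are hypothetical names, and the default simply repeats the path used in the callback above.

```python
import argparse

# Path used in the callback above; assumed here to be the directory inside the
# training container from which SageMaker collects TensorBoard output.
DEFAULT_TENSORBOARD_DIR = "/opt/ml/output/tensorboard"


def parse_log_dir(argv=None):
    """Return the TensorBoard log directory.

    --tensorboard-dir is a hypothetical flag introduced for this sketch;
    unknown arguments are ignored so the script's other flags still work.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--tensorboard-dir", type=str,
                        default=DEFAULT_TENSORBOARD_DIR)
    args, _unknown = parser.parse_known_args(argv)
    return args.tensorboard_dir


if __name__ == "__main__":
    # Inside the container: fall back to the default path.
    print(parse_log_dir([]))
    # On a local machine: point the logs somewhere writable.
    print(parse_log_dir(["--tensorboard-dir", "./logs"]))
```

The returned value would then be passed as `log_dir` to `tf.keras.callbacks.TensorBoard`, so a local run can write to `./logs` while the SageMaker run keeps the container path.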

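It is hard to pin down the exact root cause in your case, but the following steps worked for me. I started TensorBoard manually inside the notebook instance.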

  • Follow the guide to configure the S3 output path for the TensorBoard logs

    from sagemaker.debugger import TensorBoardOutputConfig
    
    tensorboard_output_config = TensorBoardOutputConfig(
           s3_output_path = 's3://bucket-name/tensorboard_log_folder/'
    )
    
    estimator = TensorFlow(entry_point='train.py',
                   source_dir='./',
                   model_dir=model_dir,
                   output_path= output_dir,
                   train_instance_type=train_instance_type,
                   train_instance_count=1,
                   hyperparameters=hyperparameters,
                   role=sagemaker.get_execution_role(),
                   base_job_name='Testing-TrainingJob',
                   framework_version='2.2',
                   py_version='py37',
                   script_mode=True,
                   tensorboard_output_config=tensorboard_output_config)
    
    estimator.fit(inputs)
    
  • From a terminal on the notebook instance, start TensorBoard with the S3 location given above

    $ tensorboard --logdir 's3://bucket-name/tensorboard_log_folder/'
    
  • Access the board through the URL ending in /proxy/6006/. You need to replace the notebook instance details in the following URL

    https://myinstance.notebook.us-east-1.sagemaker.aws/proxy/6006/
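
The last two steps can be sketched as a pair of small helpers that build the TensorBoard command and the proxy URL from their parts. This is an illustration only: the function names are hypothetical, the URL pattern is copied from the example above rather than from any official specification, and the default port 6006 is the one shown in the URL.

```python
import shlex


def tensorboard_command(s3_log_uri, port=6006):
    """Build the shell command that points TensorBoard at the S3 log folder."""
    return "tensorboard --logdir {} --port {}".format(
        shlex.quote(s3_log_uri), port)


def tensorboard_proxy_url(instance_name, region, port=6006):
    """Build the notebook proxy URL, following the pattern of the example URL."""
    return "https://{}.notebook.{}.sagemaker.aws/proxy/{}/".format(
        instance_name, region, port)


if __name__ == "__main__":
    # Values taken from the steps above; replace them with your own.
    print(tensorboard_command("s3://bucket-name/tensorboard_log_folder/"))
    print(tensorboard_proxy_url("myinstance", "us-east-1"))
```

Run the printed command in the notebook instance terminal, then open the printed URL in the browser session that is already logged in to the notebook instance.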