How to create a Dataproc cluster with a time to live using the Python SDK

python protocol-buffers google-cloud-dataproc

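I am trying to create a Dataproc cluster with a time to live of one day using the Python SDK. For this purpose, v1beta2 of the Dataproc API introduces the LifecycleConfig object, which is a child of the ClusterConfig object. I use this object in the JSON file that I pass to the create_cluster method. To set a specific TTL, I use the field auto_delete_ttl, which should have the value 86400 seconds (one day).

The documentation on how to represent a duration in a JSON file is specific: the duration should be a string with the suffix s for seconds, with 0, 3, 6 or 9 fractional digits.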
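The string format described above can be sketched in a few lines of plain Python. This is only an illustration of the format as the question describes it, not code taken from any library; the helper name format_json_duration is hypothetical, and it assumes a non-negative duration given as whole seconds plus nanoseconds.

```python
def format_json_duration(seconds: int, nanos: int = 0) -> str:
    """Render a duration as a JSON string such as "86400s".

    Hypothetical helper: emits 0, 3, 6 or 9 fractional digits,
    choosing the shortest of those that keeps full precision.
    """
    if seconds < 0 or not 0 <= nanos < 1_000_000_000:
        raise ValueError("expected seconds >= 0 and 0 <= nanos < 1e9")
    if nanos == 0:
        return f"{seconds}s"                            # 0 fractional digits
    if nanos % 1_000_000 == 0:
        return f"{seconds}.{nanos // 1_000_000:03d}s"   # milliseconds
    if nanos % 1_000 == 0:
        return f"{seconds}.{nanos // 1_000:06d}s"       # microseconds
    return f"{seconds}.{nanos:09d}s"                    # nanoseconds


print(format_json_duration(86400))           # 86400s
print(format_json_duration(1, 500_000_000))  # 1.500s
```

A one-day TTL therefore comes out as "86400s", the same value used for auto_delete_ttl in the JSON file.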

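However, if I pass the duration in this format, I get the following error:

Parameter to MergeFrom() must be instance of same class: expected google.protobuf.Duration

Here is how I create the cluster: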

from google.cloud import dataproc_v1beta2
project = "your_project_id"
region = "europe-west4"
cluster = ""  # see below for the cluster JSON file
client = dataproc_v1beta2.ClusterControllerClient(client_options={
    'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
})

# Create the cluster
operation = client.create_cluster(project, region, cluster)

The variable cluster holds the JSON object describing the desired cluster:

{
  "cluster_name":"my_cluster",
  "config":{
     "config_bucket":"my_conf_bucket",
     "gce_cluster_config":{
        "zone_uri":"europe-west4-a",
        "metadata":{
           "PIP_PACKAGES":"google-cloud-storage google-cloud-bigquery"
        },
        "subnetwork_uri":"my subnet",
        "service_account_scopes":[
           "https://www.googleapis.com/auth/cloud-platform"
        ],
        "tags":[
           "some tags"
        ]
     },
     "master_config":{
        "num_instances":1,
        "machine_type_uri":"n1-highmem-4",
        "disk_config":{
           "boot_disk_type":"pd-standard",
           "boot_disk_size_gb":200,
           "num_local_ssds":0
        },
        "accelerators":[

        ]
     },
     "software_config":{
        "image_version":"1.4-debian9",
        "properties":{
           "dataproc:dataproc.allow.zero.workers":"true",
           "yarn:yarn.log-aggregation-enable":"true",
           "dataproc:dataproc.logging.stackdriver.job.driver.enable":"true",
           "dataproc:dataproc.logging.stackdriver.enable":"true",
           "dataproc:jobs.file-backed-output.enable":"true"
        },
        "optional_components":[

        ]
     },
     "lifecycle_config":{
        "auto_delete_ttl":"86400s"
     },
     "initialization_actions":[
        {
           "executable_file":"gs://some-init-script"
        }
     ]
  },
  "project_id":"project_id"
}

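Package versions I am using:

google-cloud-dataproc: 0.6.1
protobuf: 3.11.3
googleapis-common-protos: 1.6.0

Am I doing something wrong here, is this caused by wrong package versions, or is it a bug?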

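The 100s format is the right way to write a Duration when you build the protobuf from a text format (JSON and so on). Here, however, you are building the API request body from Python objects, so you need to create a Duration object (from google.protobuf.duration_pb2) instead of a string:

duration_message = duration_pb2.Duration()
duration_message.FromSeconds(86400)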

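Thank you very much for pointing this out. I had not realized that at all. By the way, I generally prefer to keep configuration separate from source code. So, despite the sample code above, I actually read a JSON file containing the desired cluster configuration from a bucket. I was able to parse the JSON into a Python object with the following lines:

import google.protobuf.json_format as json_format
import google.cloud.dataproc_v1beta2.proto.clusters_pb2 as clusters
cluster_message = json_format.Parse(cluster_json, clusters.Cluster())

I then pass the cluster_message object to the create_cluster method provided by ClusterControllerClient.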
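The reason the JSON route accepts the "86400s" string is that json_format applies the JSON mapping while it parses. The sketch below shows this on a bare Duration message, used here as a stand-in so the example runs with only the protobuf package installed; parsing a full cluster description would use the Cluster class from clusters_pb2 as in the comment above.

```python
from google.protobuf import duration_pb2, json_format

# A JSON document consisting of a single duration string (illustrative input).
ttl_json = '"86400s"'

# Parse fills in the message passed as the second argument and returns it.
ttl = json_format.Parse(ttl_json, duration_pb2.Duration())

print(ttl.seconds)  # 86400
```

The result is a real Duration message rather than a str, which is what the client expects for auto_delete_ttl.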