
GCP Dataproc parallel step execution


I am creating a Dataproc cluster on GCP using a workflow template from a YAML file. Once the cluster is created, all of the steps start executing in parallel, but I want some steps to execute only after all the other steps have finished. Is there a way to do this?

Sample YAML used to create the cluster:

jobs:
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /ui.sh
  stepId: run-pig-ui
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /hotel.sh
  stepId: run-pig-hotel

placement:
  managedCluster:
    clusterName: cluster-abc
    labels:
      data: cluster
    config:
      configBucket: bucket-1
      initializationActions:
        - executableFile: gs://bucket-1/install_git.sh
          executionTimeout: 600s
      gceClusterConfig:
        zoneUri: asia-south1-a
        tags:
          - test
      masterConfig:
        machineTypeUri: n1-standard-8
        diskConfig:
          bootDiskSizeGb: 50
      workerConfig:
        machineTypeUri: n1-highcpu-32
        numInstances: 2
        diskConfig:
          bootDiskSizeGb: 100
      softwareConfig:
        imageVersion: 1.4-ubuntu18
        properties:
          core:io.compression.codec.lzo.class: com.hadoop.compression.lzo.LzoCodec
          core:io.compression.codecs: org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
      secondaryWorkerConfig:
        numInstances: 2
        isPreemptible: true

Command used to create the cluster:

gcloud dataproc workflow-templates instantiate-from-file --file file_name.yaml

gcloud version: 261.0.0

You can use the prerequisiteStepIds list on the last workflow step to ensure it runs only after all of its prerequisite steps have completed. The expected structure looks like this:

jobs:
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /ui.sh
  stepId: run-pig-ui
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /hotel.sh
  stepId: run-pig-hotel
- pigJob:
    continueOnFailure: true
    queryList:
      queries:
      - sh /final.sh
  stepId: run-final-step
  prerequisiteStepIds:
    - run-pig-ui
    - run-pig-hotel
...
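Dataproc rejects a template whose prerequisiteStepIds refer to step IDs that don't exist, so it can be worth checking the references before instantiating. A minimal sketch of such a check, with the job list mirroring the template above (validate_prerequisites is a hypothetical helper, not part of the gcloud SDK):

```python
# Sanity-check that every entry in prerequisiteStepIds refers to a real
# stepId in the same template. The jobs list mirrors the YAML above.

jobs = [
    {"stepId": "run-pig-ui"},
    {"stepId": "run-pig-hotel"},
    {"stepId": "run-final-step",
     "prerequisiteStepIds": ["run-pig-ui", "run-pig-hotel"]},
]

def validate_prerequisites(jobs):
    """Return a list of error messages; an empty list means all references resolve."""
    step_ids = {job["stepId"] for job in jobs}
    errors = []
    for job in jobs:
        for prereq in job.get("prerequisiteStepIds", []):
            if prereq not in step_ids:
                errors.append(f"{job['stepId']}: unknown prerequisite {prereq}")
            elif prereq == job["stepId"]:
                errors.append(f"{job['stepId']}: step cannot depend on itself")
    return errors

print(validate_prerequisites(jobs))  # → []
```

With prerequisiteStepIds in place, Dataproc builds a dependency graph from the steps: run-pig-ui and run-pig-hotel still start in parallel after cluster creation, and run-final-step is only scheduled once both have finished.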