Google Cloud Dataflow: getting DataflowRunner to work with --experiments=upload_graph


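I have a pipeline that produces a Dataflow graph (serialized JSON representation) exceeding the allowable limit for the API, so it cannot be launched through the Apache Beam Dataflow runner as one normally would. Running the Dataflow runner with the indicated parameter --experiments=upload_graph does not work either: the job fails because no steps are specified.

When the runner reports this size problem through an error, it provides the following information: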

the size of the serialized JSON representation of the pipeline exceeds the allowable limit for the API. 

Use experiment 'upload_graph' (--experiments=upload_graph)
to direct the runner to upload the JSON to your 
GCS staging bucket instead of embedding in the API request.

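Using this parameter does cause the Dataflow runner to upload an additional dataflow_graph.pb file to the staging location, next to the usual pipeline.pb file. I verified that it actually exists in GCP storage.

However, the job in GCP Dataflow then fails immediately after start with the following error:

Runnable workflow has no steps specified.

I have tried this flag with various pipelines, even the Apache Beam example pipelines, and I see the same behaviour.

This can be reproduced with the word count example: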

mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.11.0 \
      -DgroupId=org.example \
      -DartifactId=word-count-beam \
      -Dversion="0.1" \
      -Dpackage=org.apache.beam.examples \
      -DinteractiveMode=false

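The two mvn commands listed at the end of this page run the example without and with the --experiments=upload_graph parameter. (Make sure to specify your project and buckets if you want to run them.)

Now, I would expect the Dataflow runner to direct GCP Dataflow to read the steps from the specified bucket, as seen in the source code.

However, this does not seem to be the case. Has anyone gotten this to work, or found some documentation on this feature that could point me in the right direction?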

The experiment has been reverted, and the message will be corrected in Beam 2.13.0.

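I ran into this issue recently, and the solution was quite silly. I had developed a fairly complex Dataflow job that worked fine; the next day it stopped working with the error "Runnable workflow has no steps specified". In my case, someone had specified pipeline().run().waitUntilFinish() twice after creating the options, and that caused the error. Removing the duplicate pipeline run resolved the issue. I still think Beam/DataflowRunner should provide a more useful error trace in this case.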
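A duplicated run call like the one described in this answer can be easy to miss in a large job. As a rough sketch only, the following shell function counts the lines in a source tree that contain a literal .run() call. The function name count_run_calls and the directory layout are assumptions for illustration, not part of Beam or Dataflow, and the pattern also matches unrelated .run() calls, so treat the count as a hint rather than a diagnosis.

```shell
#!/bin/sh
# Hypothetical helper (not part of Beam): count the lines in *.java files
# under a directory that contain a literal ".run()" call.
# $1 = source directory to scan, for example src/main/java
count_run_calls() {
  # find hands every Java file to grep; wc -l counts the matching lines.
  # tr strips the padding that some wc implementations add.
  find "$1" -name '*.java' -exec grep -E '\.run\(\)' {} + | wc -l | tr -d ' '
}

# Usage (commented out so that sourcing this file has no side effects):
# count_run_calls src/main/java
```

A result greater than 1 for a job with a single pipeline is worth a closer look, since the answer above reports that a second pipeline run produced the "no steps specified" error.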

Unfortunately, the experiment flag --experiments=upload_graph is not yet supported on Dataflow.

The commands used in the question follow. First change into the generated project:

cd word-count-beam/

Run without the flag:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
                  --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
                  --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
     -Pdataflow-runner

Run with the flag:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
                  --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
                  --experiments=upload_graph \
                  --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
     -Pdataflow-runner
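
The two commands above differ only in the --experiments option. When reproducing the problem, it may help to build the argument string in one place so that both runs are otherwise identical. The sketch below does that; build_exec_args is a hypothetical helper name, and my-project and my-bucket in the usage comment are placeholder values, not real resources.

```shell
#!/bin/sh
# Hypothetical helper (not part of Beam): assemble the value for -Dexec.args.
# $1 = GCP project, $2 = GCS bucket name, $3 = optional experiment name
build_exec_args() {
  bea_args="--runner=DataflowRunner --project=$1"
  bea_args="$bea_args --gcpTempLocation=gs://$2/tmp"
  # Add the experiment only when a third argument was given.
  if [ -n "${3:-}" ]; then
    bea_args="$bea_args --experiments=$3"
  fi
  bea_args="$bea_args --inputFile=gs://apache-beam-samples/shakespeare/*"
  bea_args="$bea_args --output=gs://$2/counts"
  printf '%s\n' "$bea_args"
}

# Usage (commented out so that sourcing this file does not start a job):
# mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
#   -Dexec.args="$(build_exec_args my-project my-bucket upload_graph)" \
#   -Pdataflow-runner
```

Omitting the third argument produces the arguments for the run without the flag.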