
Google Cloud Dataflow custom template creation problem


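I am trying to create a template for a Cloud Dataflow job that reads a JSON file from Cloud Storage and writes it to BigQuery. I am passing two runtime parameters: (1) the GCS location of the input file and (2) the BigQuery dataset and table ID.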
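The question does not show the JsonToBQTemplateOptions interface. The error it quotes (RuntimeValueProvider{propertyName=inputFile, ...}) indicates that inputFile is declared as a ValueProvider, which is how template parameters are deferred until launch time. Below is a minimal sketch of what the interface might look like. It assumes that all three parameters passed on the command line (inputFile, tableSpec, errorOutput) are ValueProvider<String> and that the interface extends plain PipelineOptions. The descriptions are illustrative.

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;

/**
 * Hypothetical reconstruction of the options interface used by JsonTextToBqTemplate.
 * ValueProvider-typed properties can be left unset when the template is built and
 * supplied later, when the template is launched.
 */
public interface JsonToBQTemplateOptions extends PipelineOptions {

    @Description("GCS path of the input file, one JSON object per line")
    ValueProvider<String> getInputFile();
    void setInputFile(ValueProvider<String> value);

    @Description("BigQuery output table, e.g. dataset.table or project:dataset.table")
    ValueProvider<String> getTableSpec();
    void setTableSpec(ValueProvider<String> value);

    @Description("GCS location for error output (assumed from the --errorOutput flag)")
    ValueProvider<String> getErrorOutput();
    void setErrorOutput(ValueProvider<String> value);
}
```

When a value is given on the command line, Beam wraps it in a static provider that is readable at construction time. When it is omitted, the provider stays unresolved until the template runs. That unresolved state is why the DirectRunner cannot estimate the input size.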

JsonTextToBqTemplate code:

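 package com.ihm.adp.pipeline.template;

 import com.google.api.services.bigquery.model.TableRow;
 import com.google.gson.Gson;
 import com.google.gson.GsonBuilder;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.PipelineResult.State;
 import org.apache.beam.sdk.io.TextIO;
 import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
 import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
 import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.ParDo;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 public class JsonTextToBqTemplate {

    // Logger is bound to this class (it previously referenced TextToBQTemplate).
    private static final Logger logger =
        LoggerFactory.getLogger(JsonTextToBqTemplate.class);

    // Static, so the anonymous DoFn below does not need to serialize it.
    private static final Gson gson = new GsonBuilder().create();

    public static void main(String[] args) throws Exception {

        // JsonToBQTemplateOptions declares inputFile and tableSpec (not shown here).
        JsonToBQTemplateOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation()
                .as(JsonToBQTemplateOptions.class);

        String jobName = options.getJobName();
        logger.info("PIPELINE-INFO: jobName={} message={} ",
            jobName, "starting pipeline creation");

        Pipeline pipeline = Pipeline.create(options);
        pipeline
            // Read the input file line by line; each line is one JSON object.
            .apply("ReadLines", TextIO.read().from(options.getInputFile()))
            // Parse each JSON line into a BigQuery TableRow.
            .apply("Converting to TableRows", ParDo.of(new DoFn<String, TableRow>() {
                private static final long serialVersionUID = 0;

                @ProcessElement
                public void processElement(ProcessContext c) {
                    TableRow tableRow = gson.fromJson(c.element(), TableRow.class);
                    c.output(tableRow);
                }
            }))
            // Append to an existing table; the job never creates the table.
            .apply(BigQueryIO.writeTableRows().to(options.getTableSpec())
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

        logger.info("PIPELINE-INFO: jobName={} message={} ", jobName, "pipeline started");
        // Blocks until the launched job reaches a terminal state.
        State state = pipeline.run().waitUntilFinish();
        logger.info("PIPELINE-INFO: jobName={} message={} ", jobName, "pipeline status " + state);
    }
 }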
Error:

Caused by: java.lang.IllegalStateException: Cannot estimate size of a FileBasedSource with inaccessible file pattern: {}. [RuntimeValueProvider{propertyName=inputFile, default=null, value=null}]
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:518)
at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:199)
at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:207)
at org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:87)
at org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:62)
When I pass values for inputFile and tableSpec, the Maven build succeeds, as shown below:

 mvn -X compile exec:java \
-Dexec.mainClass=com.ihm.adp.pipeline.template.JsonTextToBqTemplate \
-Dexec.args="--project=xxxxxx-123456 \
--stagingLocation=gs://xxx-test/template/staging/jsontobq/ \
--tempLocation=gs://xxx-test/temp/ \
--templateLocation=gs://xxx-test/template/templates/jsontobq \
--inputFile=gs://xxx-test/input/bqtest.json \
--tableSpec=xxx_test.jsontobq_test \
--errorOutput=gs://xxx-test/template/output"
However, it does not create any template in Cloud Dataflow.


Is there a way to create the template during Maven execution without validating these runtime parameters?

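I think the problem here is that you have not specified a runner, so it defaults to the DirectRunner. Try passing --runner=TemplatingDataflowPipelineRunner as part of -Dexec.args. Once you do, you no longer need to specify ValueProvider template parameters such as inputFile.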

More information here:


If you are using Dataflow SDK version 1.x, you need to specify the following arguments:

--runner=TemplatingDataflowPipelineRunner
--dataflowJobFile=gs://xxx-test/template/templates/jsontobq/

If you are using Dataflow SDK version 2.x (Apache Beam), you need to specify the following arguments:

--runner=DataflowRunner
--templateLocation=gs://xxx-test/template/templates/jsontobq/
It looks like you are using Dataflow SDK version 2.x without setting the runner parameter to DataflowRunner.
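For SDK 2.x, you can also apply the same two settings in code instead of passing them in -Dexec.args. The following is a minimal sketch. It assumes the beam-runners-google-cloud-dataflow-java artifact is on the classpath. The class and method names (TemplateOptionsSketch, templateOptions) are hypothetical.

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public final class TemplateOptionsSketch {
    private TemplateOptionsSketch() {}

    /**
     * Builds options equivalent to passing
     * --runner=DataflowRunner --project=... --templateLocation=...
     * With templateLocation set, running the pipeline stages a template
     * at that GCS path instead of executing the job.
     */
    public static DataflowPipelineOptions templateOptions(String project, String templateLocation) {
        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);        // not the default DirectRunner
        options.setProject(project);
        options.setTemplateLocation(templateLocation);  // where the template file is written
        return options;
    }
}
```

Passing the flags on the command line, as the answer suggests, keeps the main class unchanged. The programmatic form is only useful if the class is always meant to stage a template.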


References:

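Thanks, Andrew! The template was created successfully and works fine. My Maven build still fails with the following error:

Caused by: java.lang.NullPointerException at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:489)

It seems to be trying to run the pipeline job and failing on a null reference.

Hi @prasad, how did you run the create command? I get a NoClassDefFoundError when I run the mvn compile command from the project's home folder.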
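One possible reading of the NullPointerException in DataflowPipelineJob.getJobWithRetries is that the code still calls waitUntilFinish() after run(). When --templateLocation is set, run() only stages a template, so no running job exists to poll. The thread does not confirm this; it is an interpretation of the stack trace. The following is a minimal sketch that skips the wait when a template location is configured. The names TemplateAwareRun, isTemplateCreation and runOrStage are hypothetical.

```java
import java.util.Optional;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptions;

public final class TemplateAwareRun {
    private TemplateAwareRun() {}

    /** True when --templateLocation is set, i.e. run() stages a template instead of launching a job. */
    public static boolean isTemplateCreation(PipelineOptions options) {
        String location = options.as(DataflowPipelineOptions.class).getTemplateLocation();
        return location != null && !location.isEmpty();
    }

    /**
     * Runs the pipeline and waits for it only when a real job was launched.
     * Returns Optional.empty() when only a template was staged.
     */
    public static Optional<PipelineResult.State> runOrStage(Pipeline pipeline, PipelineOptions options) {
        PipelineResult result = pipeline.run();
        if (isTemplateCreation(options)) {
            return Optional.empty();  // nothing to wait for: the template file has been written
        }
        return Optional.of(result.waitUntilFinish());
    }
}
```

In JsonTextToBqTemplate.main, the line State state = pipeline.run().waitUntilFinish(); would become a call to runOrStage(pipeline, options), logging the state only if one is present.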