
How to flatten a windowed PCollection in Apache Beam? [Cloud Dataflow]


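I am trying to use Dataflow to stream data from Pub/Sub to Datastore. I looked through the templates that Google provides.

I noticed that the Datastore template does not work, so I tried to debug it.

This is what I did:

  • Added an errorTag
  • Added windowing (Pub/Sub produces unbounded data, and Datastore cannot accept unbounded data)
  • Added Flatten (I found no way to write windowed data to Datastore, so I assumed the data had to be un-windowed first)

Here is my code: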

    package com.google.cloud.teleport.templates;

    import com.google.cloud.teleport.templates.common.DatastoreConverters.DatastoreWriteOptions;
    import com.google.cloud.teleport.templates.common.DatastoreConverters.WriteJsonEntities;
    import com.google.cloud.teleport.templates.common.JavascriptTextTransformer.JavascriptTextTransformerOptions;
    import com.google.cloud.teleport.templates.common.JavascriptTextTransformer.TransformTextViaJavascript;
    import com.google.cloud.teleport.templates.common.PubsubConverters.PubsubReadOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // added for errorTag
    import com.google.cloud.teleport.templates.common.ErrorConverters.ErrorWriteOptions;
    import com.google.cloud.teleport.templates.common.ErrorConverters.LogErrors;
    import org.apache.beam.sdk.values.TupleTag;

    // added for window
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.transforms.Flatten;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionList;
    import org.apache.beam.sdk.values.PCollectionTuple;

    import org.joda.time.Duration;

    public class PubsubToDatastore {
      interface PubsubToDatastoreOptions extends
          PipelineOptions,
          PubsubReadOptions,
          JavascriptTextTransformerOptions,
          DatastoreWriteOptions,
          ErrorWriteOptions {} // added

      public static void main(String[] args) {
        PubsubToDatastoreOptions options = PipelineOptionsFactory
            .fromArgs(args)
            .withValidation()
            .as(PubsubToDatastoreOptions.class);

        TupleTag<String> errorTag = new TupleTag<String>("errors"){};

        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply("Read Pubsub Events", PubsubIO.readStrings().fromTopic(options.getPubsubReadTopic()))
            .apply("Windowing", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
            .apply("Flatten", Flatten.pCollections())
            .apply("Transform text to json", TransformTextViaJavascript.newBuilder()
                .setFileSystemPath(options.getJavascriptTextTransformGcsPath())
                .setFunctionName(options.getJavascriptTextTransformFunctionName())
                .build())
            .apply(WriteJsonEntities.newBuilder()
                .setProjectId(options.getDatastoreWriteProjectId())
                .setErrorTag(errorTag)
                .build())
            .apply(LogErrors.newBuilder()
                .setErrorWritePath(options.getErrorWritePath())
                .setErrorTag(errorTag)
                .build());

        pipeline.run();
      }
    } 

When I run this code, I get the following error:

    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 11.054 s
    [INFO] Finished at: 2018-08-20T17:55:49+09:00
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile (default-compile) on project google-cloud-teleport-java: Compilation failure
    [ERROR] /Users/shinya.yaginuma/work/DataflowTemplates/src/main/java/com/google/cloud/teleport/templates/PubsubToDatastore.java:[80,9] can not find an appropriate method for apply(java.lang.String,org.apache.beam.sdk.transforms.Flatten.PCollections<java.lang.Object>)
    [ERROR]     method org.apache.beam.sdk.values.PCollection.<OutputT>apply(org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<java.lang.String>,OutputT>) can't use
    [ERROR]       (Unable to infer the type variable OutputT
    [ERROR]         (The actual argument list and dummy argument list have different lengths))
    [ERROR]     method org.apache.beam.sdk.values.PCollection.<OutputT>apply(java.lang.String,org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<java.lang.String>,OutputT>) can't use
    [ERROR]       (Since there is no instance of type variable T, org.apache.beam.sdk.transforms.Flatten.PCollections is not fit for  org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<java.lang.String>,OutputT>)

I'm not sure why you want to flatten the collection after windowing. My guess is that the Flatten operation does not actually do what you think it does.

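Here is what its documentation says:

Returns a {@link PTransform} that flattens a {@link PCollectionList} into a {@link PCollection} containing all the elements of all the {@link PCollection}s in its input.

Flatten takes multiple PCollections bundled into a PCollectionList and returns a single PCollection containing all the elements of all the input PCollections. The name "flatten" refers to flattening a list of lists into one list.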
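
The "list of lists" idea can be sketched in plain Java. This is only an analogy and does not use the Beam API; the class name FlattenAnalogy and the method flatten are made up for illustration. The inner lists stand in for the PCollections and the outer list stands in for the PCollectionList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy only: this does not use the Beam API.
// FlattenAnalogy and flatten are hypothetical names for illustration.
class FlattenAnalogy {

  // Merges several lists into one, the way Flatten merges the
  // PCollections of a PCollectionList into a single PCollection.
  static <T> List<T> flatten(List<List<T>> lists) {
    List<T> merged = new ArrayList<>();
    for (List<T> list : lists) {
      merged.addAll(list);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> fromSourceA = Arrays.asList("a1", "a2");
    List<String> fromSourceB = Arrays.asList("b1");

    // A "list of lists" is the shape of input that Flatten expects.
    List<List<String>> inputs = Arrays.asList(fromSourceA, fromSourceB);

    System.out.println(flatten(inputs)); // prints [a1, a2, b1]
  }
}
```

If the outer list holds only one inner list, the result is that same list of elements, so there is nothing to gain from flattening.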

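For example, if you have multiple PCollections from different sources and want to "flatten" them into the same PCollection, Flatten is your tool. In your case you have only one PCollection (not a PCollectionList, i.e. a list of PCollections), so the Flatten operation does nothing for you. That is also why the compiler rejects the call: Flatten.pCollections() expects a PCollectionList as input, not a PCollection. In the first step you get a PCollection from PubsubIO.readStrings(), and then windowing with Window.into(...) divides that unbounded PCollection into finite windows.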

I suggest you simply remove the .apply("Flatten", Flatten.pCollections()) line and run the pipeline again. Otherwise it looks fine.
