Google cloud dataflow Using Cloud Dataflow to write from PubSub to Google Cloud Storage via windowing


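I am receiving messages in Dataflow via Pub/Sub in streaming mode (which is what I need). Each message should be stored in its own file in GCS. Since TextIO.Write does not support unbounded collections, I tried to divide the PCollection into windows containing one element each, and then write each window to Google Cloud Storage.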

Here is my code:

public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.create()
            .as(DataflowPipelineOptions.class);
    options.setRunner(BlockingDataflowPipelineRunner.class);
    options.setProject(PROJECT_ID);
    options.setStagingLocation(STAGING_LOCATION);
    options.setStreaming(true);
    Pipeline pipeline = Pipeline.create(options);

    // Read the unbounded stream of messages from the Pub/Sub subscription.
    PubsubIO.Read.Bound<String> readFromPubsub = PubsubIO.Read.named("ReadFromPubsub")
            .subscription(SUBSCRIPTION);

    PCollection<String> streamData = pipeline.apply(readFromPubsub);

    // Attempt: fire a pane for every single element, hoping to get one element per window.
    PCollection<String> windowedMessage = streamData.apply(
            Window.<String>triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
                    .discardingFiredPanes());

    // Fails: TextIO.Write does not accept an unbounded PCollection.
    windowedMessage.apply(TextIO.Write.to("gs://pubsub-outputs/1"));

    pipeline.run();
}

What code would do the above?

TextIO only works with bounded PCollections. In a streaming pipeline you can instead use the Cloud Storage API to write to GCS yourself.

You can do:

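    PipeOptions options = data.getPipeline().getOptions().as(PipeOptions.class);
    data
        // Give every element the same key so GroupByKey collects them together.
        .apply(WithKeys.of(new SerializableFunction<String, String>() {
            public String apply(String s) { return "mykey"; }
        }))
        // Cut the unbounded stream into fixed windows of getTimeWrite() minutes.
        .apply(Window.<KV<String, String>>into(
            FixedWindows.of(Duration.standardMinutes(options.getTimeWrite()))))
        // One KV<"mykey", Iterable<String>> per window.
        .apply(GroupByKey.<String, String>create())
        // Drop the key and keep only the Iterable of messages.
        .apply(Values.<Iterable<String>>create())
        // Write each window's messages to GCS.
        .apply(ParDo.of(new StorageWrite(options)));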
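
To make the effect of the key/window/group steps concrete, here is a stdlib-only Java simulation; it does not use the Dataflow SDK, and the class and method names are hypothetical. It assumes epoch-aligned fixed windows (what FixedWindows.of gives when no offset is set) and the default trigger, under which GroupByKey emits one Iterable per key and window, so with a single constant key each window becomes one list, i.e. one output file.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, stdlib-only simulation of WithKeys("mykey") + FixedWindows + GroupByKey:
 * each message goes to the fixed window containing its timestamp, and all messages
 * of one window end up in one list (one output file).
 */
public class FixedWindowSimulation {

    /** Stand-in for a Pub/Sub element: an event timestamp plus its payload. */
    static final class Message {
        final Instant timestamp;
        final String payload;

        Message(Instant timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    /** Start of the epoch-aligned window of length {@code size} that contains {@code ts}. */
    static Instant windowStart(Instant ts, Duration size) {
        long sizeMillis = size.toMillis();
        long t = ts.toEpochMilli();
        return Instant.ofEpochMilli(t - Math.floorMod(t, sizeMillis));
    }

    /** Groups payloads by window start, in window order; each entry corresponds to one file. */
    static Map<Instant, List<String>> groupIntoWindows(List<Message> messages, Duration size) {
        Map<Instant, List<String>> byWindow = new TreeMap<>();
        for (Message m : messages) {
            byWindow.computeIfAbsent(windowStart(m.timestamp, size), k -> new ArrayList<>())
                    .add(m.payload);
        }
        return byWindow;
    }

    public static void main(String[] args) {
        Instant base = Instant.parse("2016-01-01T00:00:00Z");
        List<Message> stream = List.of(
                new Message(base.plusSeconds(10), "a"),
                new Message(base.plusSeconds(50), "b"),
                new Message(base.plusSeconds(70), "c"));
        // With 1-minute windows, "a" and "b" share the first window and "c" lands in the next one.
        groupIntoWindows(stream, Duration.ofMinutes(1))
                .forEach((start, payloads) -> System.out.println(start + " -> " + payloads));
    }
}
```

Because every element carries the same key, the grouping is effectively per window only. A likely consequence is that each window's messages are written by a single worker, which is fine for modest volumes but limits parallelism for very busy topics.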

A GroupByKey inside each window collects that window's messages into an Iterable, which you can then write to storage. StorageWrite's processElement:

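        // In Dataflow SDK 1.x, c.window() requires StorageWrite to implement DoFn.RequiresWindowAccess.
        // "storage" is a com.google.cloud Storage client held by the DoFn.
        PipeOptions options = c.getPipelineOptions().as(PipeOptions.class);
        // Name the object after the end of the window: <repository>/<date>/<prefix><ISO timestamp>.
        String date = ISODateTimeFormat.date().print(c.window().maxTimestamp());
        String isoDate = ISODateTimeFormat.dateTime().print(c.window().maxTimestamp());
        String blobName = String.format("%s/%s/%s", options.getBucketRepository(), date, options.getFileOutName() + isoDate);

        BlobId blobId = BlobId.of(options.getGCSBucket(), blobName);

        // Stream every message of this window into one GCS object; try-with-resources closes the channel.
        try (WriteChannel writer = storage.writer(BlobInfo.builder(blobId).contentType("text/plain").build())) {
            for (String message : c.element()) {
                writer.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            }
        }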
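
The object name above depends only on the window's max timestamp, so each window maps to a distinct object. The sketch below reproduces that naming scheme with java.time instead of Joda-Time so it can run without the SDK; `repository` and `fileOutName` stand in for `options.getBucketRepository()` and `options.getFileOutName()`, and the class name and the "messages-" prefix are hypothetical. The patterns are intended to mirror Joda's `ISODateTimeFormat.date()` and `ISODateTimeFormat.dateTime()` in UTC, which is an assumption worth checking against real output.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical, stdlib-only version of the blob naming used in StorageWrite. */
public class BlobNaming {

    // yyyy-MM-dd, like ISODateTimeFormat.date().
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
    // yyyy-MM-ddTHH:mm:ss.SSS plus offset ("Z" for UTC), like ISODateTimeFormat.dateTime().
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").withZone(ZoneOffset.UTC);

    /** Builds "<repository>/<date>/<fileOutName><isoDateTime>" from a window's max timestamp. */
    static String blobName(String repository, String fileOutName, Instant windowMaxTimestamp) {
        String date = DATE.format(windowMaxTimestamp);
        String isoDate = DATE_TIME.format(windowMaxTimestamp);
        return String.format("%s/%s/%s", repository, date, fileOutName + isoDate);
    }

    public static void main(String[] args) {
        // A 1-minute window [00:00, 00:01) has max timestamp 00:00:59.999.
        Instant maxTs = Instant.parse("2016-01-01T00:01:00Z").minusMillis(1);
        System.out.println(blobName("pubsub-outputs", "messages-", maxTs));
    }
}
```

Note that this yields one file per window rather than one file per message as the question asks; a per-message name would need something unique to each element (for example its timestamp plus an ID) instead of the window bound.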

If the answer works, could you accept it?