Java: writing Pub/Sub logs to a bucket with a custom directory name and a file name based on the log time


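I am parsing logs from Pub/Sub. The goal is to put them into hourly files in a custom location, where both the location and the hour are derived from the log timestamp (a field in the Pub/Sub log message).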

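Each file should receive all the data for a particular hour, and it should be appended to over the course of that hour. For example: gs://bucket/applog/2017-09-27/application1/app-2017-09-27-11H.log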
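A minimal sketch of how such a path could be derived from a log timestamp, assuming the log time has already been parsed into a java.time.Instant and that the hour is taken in UTC. HourlyLogPath and forLogTime are hypothetical names, not part of Beam; the base "gs://bucket/applog" and "application1" come from the example path above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class HourlyLogPath {
    // Date directory and date part of the file name, e.g. "2017-09-27" (UTC).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
    // Hour suffix with a literal 'H', e.g. "11H" (UTC, 24-hour clock).
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("HH'H'").withZone(ZoneOffset.UTC);

    /** Builds base/day/app/app-day-HHH.log for the hour that contains logTime. */
    static String forLogTime(String base, String app, Instant logTime) {
        String day = DAY.format(logTime);
        return base + "/" + day + "/" + app + "/app-" + day + "-" + HOUR.format(logTime) + ".log";
    }

    public static void main(String[] args) {
        Instant logTime = Instant.parse("2017-09-27T11:42:07.123Z");
        // prints gs://bucket/applog/2017-09-27/application1/app-2017-09-27-11H.log
        System.out.println(forLogTime("gs://bucket/applog", "application1", logTime));
    }
}
```

Every log line from 11:00:00.000 to 11:59:59.999 UTC maps to the same path, which is the grouping the question asks for.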

pushFilePColl
    .apply(Window.into(new FileTextIOWindowFn()))
    .apply("FileTO to LOG TextIO", ParDo.of(new TextIOWriteDoFn()))
    .apply(TextIO.write()
        .to(pipelineOptions.getFileStorageBucket())
        .withWindowedWrites()
        .withFilenamePolicy(new FileStorageFileNamePolicy(logTypeEnum))
        .withNumShards(10));
Custom window:

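import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Calendar;
import java.util.Collection;

import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn;
import org.apache.beam.sdk.transforms.windowing.WindowFn;
import org.apache.beam.sdk.transforms.windowing.WindowMappingFn;
import org.joda.time.Instant;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

// FilePushTO, CommonConstants and DateUtil are the asker's own classes (not shown).
public class FileTextIOWindowFn extends NonMergingWindowFn<Object, IntervalWindow> {

    private static final long serialVersionUID = 1L;

    // Assigns each element to a window that starts at its log time.
    private IntervalWindow assignWindow(AssignContext context) {
        FilePushTO filePushTO = (FilePushTO) context.element();
        String timestamp = filePushTO.getLogTime();
        DateTimeFormatter formatter = DateTimeFormat.forPattern(CommonConstants.DATE_FORMAT_YYYYMMDD_HHMMSS_SSS)
                .withZoneUTC();
        // Window start: the log timestamp carried in the Pub/Sub message.
        Instant start_point = Instant.parse(timestamp, formatter);
        // Window end: presumably the current time in UTC at the moment the element is assigned.
        Calendar cal = DateUtil.getCurrentDateInUTC();
        SimpleDateFormat DATE_FORMATER_PARTITION_NAME = DateUtil.getDateFormater();
        Instant end_point = Instant.parse(DATE_FORMATER_PARTITION_NAME.format(cal.getTime()), formatter);
        return new IntervalWindow(start_point, end_point);
    }

    @Override
    public Coder<IntervalWindow> windowCoder() {
        return IntervalWindow.getCoder();
    }

    @Override
    public Collection<IntervalWindow> assignWindows(AssignContext c) throws Exception {
        return Arrays.asList(assignWindow(c));
    }

    @Override
    public boolean isCompatible(WindowFn<?, ?> other) {
        return false;
    }

    // Side inputs are not supported with this WindowFn.
    @Override
    public WindowMappingFn<IntervalWindow> getDefaultWindowMappingFn() {
        throw new IllegalArgumentException(
                "Attempted to get side input window for GlobalWindow from non-global WindowFn");
    }
}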

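I created a custom window based on IntervalWindow so that I can get the correct timestamp in the FilenamePolicy. I can't use FixedWindows, because it always gives me the current timestamp.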
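For comparison, a sketch of hour-aligned window bounds, assuming the goal is one window per UTC hour of log time. In the window function above, end_point is the wall-clock time at assignment, so two elements almost never get the same window, which matches the one-file-per-string symptom described in the comments. Truncating the start to the hour and putting the end one hour later gives every element from the same hour identical bounds. The sketch uses java.time only; in the pipeline the two values would have to be converted to Joda Instants and passed to new IntervalWindow(start, end). HourWindowBounds and containing are hypothetical names. Note also that FixedWindows assigns windows from each element's timestamp, not from the current time; with PubsubIO that timestamp is the publish time unless it is overridden (for example through a timestamp attribute on the Pub/Sub read, or with WithTimestamps), which may be why FixedWindows appeared to use the current time.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class HourWindowBounds {
    final Instant start; // inclusive, truncated to the hour (UTC)
    final Instant end;   // exclusive, start + 1 hour

    private HourWindowBounds(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    /** Returns the [start, end) hour that contains the given log time. */
    static HourWindowBounds containing(Instant logTime) {
        Instant start = logTime.truncatedTo(ChronoUnit.HOURS);
        return new HourWindowBounds(start, start.plus(Duration.ofHours(1)));
    }

    public static void main(String[] args) {
        String[] logTimes = {
            "2017-09-21T09:00:08.306Z", "2017-09-21T09:00:08.311Z", "2017-09-21T09:00:08.312Z"
        };
        for (String t : logTimes) {
            HourWindowBounds w = containing(Instant.parse(t));
            // all three print [2017-09-21T09:00:00Z, 2017-09-21T10:00:00Z)
            System.out.println(t + " -> [" + w.start + ", " + w.end + ")");
        }
    }
}
```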


Everything here works, except that the files are not appended to: they are overwritten.
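Cloud Storage objects cannot be appended to in place; writing an object under an existing name replaces it. So whenever two writes (for example two panes of the same window, or two windows) resolve to the same file name, the later one overwrites the earlier one. One common workaround, sketched here under the assumption that each firing for an hour may produce its own file, is to keep the hour in the prefix and make the rest of the name unique per pane and shard. PaneAwareFileName and its parameters are hypothetical; inside a FilenamePolicy the pane index and shard number would have to come from the context passed to windowedFilename.

```java
final class PaneAwareFileName {
    /**
     * Builds e.g. "app-2017-09-27-11H-pane-0000-shard-00002-of-00010.log".
     * hourPrefix identifies the hour; paneIndex and shard keep the names distinct,
     * so a later firing for the same hour does not replace an earlier file.
     */
    static String of(String hourPrefix, long paneIndex, int shard, int numShards, String extension) {
        return String.format("%s-pane-%04d-shard-%05d-of-%05d%s",
                hourPrefix, paneIndex, shard, numShards, extension);
    }

    public static void main(String[] args) {
        System.out.println(of("app-2017-09-27-11H", 0, 2, 10, ".log"));
        System.out.println(of("app-2017-09-27-11H", 1, 2, 10, ".log"));
    }
}
```

All files for one hour still share the prefix app-2017-09-27-11H, so they can be listed or concatenated later by prefix.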

You can do this with
TextIO.write().to(…).withWindowedWrites()
which is available in Beam 2.1. See.

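What is the question you are asking? Are you asking how to subscribe to data from Pub/Sub? How to write data to Google Cloud Storage? How to format the data?

My question is how to write data to a custom location in a Google Cloud Storage bucket, where the location is decided at runtime from the date and other structure. I am using the same approach:

filepcoll
    .apply(Window.into(new FileTextIOWindowFn()))
    .apply("FileTO to LOG TextIO", ParDo.of(new TextIOWriteDoFn()))
    .apply(TextIO.write()
        .to(pipelineOptions.getFileStorageBucket())
        .withWindowedWrites()
        .withFilenamePolicy(new FileStorageFileNamePolicy(logTypeEnum))
        .withNumShards(10));

but this writes one file per Pub/Sub string. How can I append the logs to a log file? The files are generated as:

2017-09-21T09:00:08.306Z:09_2.txt
2017-09-21T09:00:08.311Z:09_3.txt
2017-09-21T09:00:08.312Z:09_5.txt

The strings are not appended... that is my problem.

Could you edit your question to include the code for your custom window and filename policy functions?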
import java.text.ParseException;

import org.apache.beam.sdk.io.FileBasedSink;
import org.apache.beam.sdk.io.fs.ResolveOptions;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// LogTypeEnum, CommonConstants and DateUtil are the asker's own classes (not shown).
public class FileStorageFileNamePolicy extends FileBasedSink.FilenamePolicy {

    private static final long serialVersionUID = 1L;

    private static final Logger LOGGER = LoggerFactory.getLogger(FileStorageFileNamePolicy.class);

    private final LogTypeEnum logTypeEnum;

    public FileStorageFileNamePolicy(LogTypeEnum logTypeEnum) {
        this.logTypeEnum = logTypeEnum;
    }

    @Override
    public ResourceId windowedFilename(ResourceId outputDirectory, WindowedContext context, String extension) {
        IntervalWindow window = (IntervalWindow) context.getWindow();
        // Turn the ISO window start (e.g. "2017-09-21T09:00:08.306Z") into a plain
        // date-time string by replacing the 'T' and stripping the trailing 'Z'.
        String startDate = window.start().toString();
        String dateString = startDate.replace("T", CommonConstants.SPACE)
                .replaceAll(startDate.substring(startDate.indexOf("Z")), CommonConstants.EMPTY_STRING);
        String startDateHour = startDate;
        try {
            startDate = DateUtil.getDateForFileStore(dateString, null);   // date used as the directory
            startDateHour = DateUtil.getDTLocalTZHour(dateString, null);  // hour used in the file name
        } catch (ParseException e) {
            LOGGER.error("Error converting date: {}", dateString, e);
        }
        // File name: full window start + ":" + hour + "_" + shard + ".txt",
        // which yields names such as "2017-09-21T09:00:08.306Z:09_2.txt".
        String filename = new StringBuilder(window.start().toString()).append(CommonConstants.COLON)
                .append(startDateHour).append(CommonConstants.UNDER_SCORE).append(context.getShardNumber())
                .append(".txt").toString();
        // Directory: "<date>/<log type>/"
        String dirName = new StringBuilder(startDate).append(CommonConstants.FORWARD_SLASH)
                .append(logTypeEnum.getValue().toLowerCase()).append(CommonConstants.FORWARD_SLASH).toString();
        LOGGER.info("Directory : {} and File Name : {}", dirName, filename);
        return outputDirectory.resolve(dirName, ResolveOptions.StandardResolveOptions.RESOLVE_DIRECTORY)
                .resolve(filename, ResolveOptions.StandardResolveOptions.RESOLVE_FILE);
    }

    @Override
    public ResourceId unwindowedFilename(ResourceId outputDirectory, Context context, String extension) {
        throw new UnsupportedOperationException("Unsupported.");
    }
}