Java: Why does my processing-time window fire but my event-time window doesn't?
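Tags: java, google-cloud-dataflow, apache-beam

I am struggling to get event-time-based triggers to fire in my Apache Beam pipeline, although I do seem to be able to trigger window firing with processing time.

My pipeline is fairly basic:

1. I receive batches of data points from Pub/Sub. Each data point carries a millisecond-level timestamp, and the message is read in with a timestamp slightly earlier than the earliest data point in the batch. The data is batched to reduce client-side workload and Pub/Sub costs.
2. I extract second-level timestamps and apply timestamps to the individual data points.
3. I window the data for processing and avoid using the global window.
4. I group the data by second so that the streaming data can later be categorized by second.
5. I eventually use sliding windows on the categorized seconds to conditionally emit one of two messages to Pub/Sub once per second.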
// Unbatching of Pubsub Messages
static public class UnpackDataPoints extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String c, OutputReceiver<String> out) {
        JsonArray packedData = new JsonParser().parse(c).getAsJsonArray();
        DateTimeFormatter dtf = DateTimeFormat.forPattern("EEE dd MMM YYYY HH:mm:ss:SSS zzz");
        for (JsonElement acDataPoint : packedData) {
            String hereData = acDataPoint.toString();
            DateTime date = dtf.parseDateTime(acDataPoint.getAsJsonObject().get("Timestamp").getAsString());
            Instant eventTimeStamp = date.toInstant();
            // Re-stamp each unpacked data point with its own event time
            out.outputWithTimestamp(hereData, eventTimeStamp);
        }
    }
}
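When I use the first, commented-out windowing strategy, my pipeline runs indefinitely and keeps receiving data, but the LogKVIterable ParDo never returns anything. When I use the processing-time strategy, LogKVIterable does run and logs to the console.

This does look like the timestamps you are adding to your data may be wrong or corrupted. I would encourage you to verify the following: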
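One check that fits here, offered as an assumption rather than as the answerer's own list, is to confirm that each parsed event time sits at or just after the Pub/Sub message timestamp. Beam's DoFn rejects an output timestamp that is earlier than the input element's timestamp unless the allowed skew is widened, and an event time far ahead of the message timestamp would point to a parsing problem. The sketch below uses java.time instead of Joda-Time so that it runs without extra dependencies. The class name, the sample timestamp strings, and the 10-second bound are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class TimestampSanityCheck {
    // java.time version of the Joda pattern in the question.
    // "yyyy" is used because "YYYY" means week-based year in java.time.
    static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("EEE dd MMM yyyy HH:mm:ss:SSS zzz", Locale.ENGLISH);

    /** Parses one data point's "Timestamp" string into an event-time instant. */
    static Instant parseEventTime(String raw) {
        return ZonedDateTime.parse(raw, FORMAT).toInstant();
    }

    /** Signed offset of the event time from the message timestamp (positive means later). */
    static Duration offsetFromMessage(Instant messageTimestamp, Instant eventTime) {
        return Duration.between(messageTimestamp, eventTime);
    }

    /** True when the event time is not before the message timestamp and at most maxAhead after it. */
    static boolean looksPlausible(Instant messageTimestamp, Instant eventTime, Duration maxAhead) {
        Duration offset = offsetFromMessage(messageTimestamp, eventTime);
        return !offset.isNegative() && offset.compareTo(maxAhead) <= 0;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the messageTimestamp attribute and one data point.
        Instant messageTs = Instant.parse("2019-01-01T00:00:05Z");
        Instant eventTs = parseEventTime("Tue 01 Jan 2019 00:00:05:250 UTC");

        System.out.println("offset in ms: " + offsetFromMessage(messageTs, eventTs).toMillis());
        System.out.println("plausible: " + looksPlausible(messageTs, eventTs, Duration.ofSeconds(10)));
    }
}
```

Logging the offset for a handful of real messages is usually enough to see whether the event times track the message timestamps or drift away from them.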
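Triggering on processing time is not the same as triggering on event time. In processing time there is no such thing as late data. In event time, handling late data is the real challenge.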
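The difference can be sketched with a deliberately simplified model. This is plain Java, not Beam's API, and the class and method names are hypothetical. It assumes that a watermark trigger fires once the watermark reaches the end of the window, that a processing-time trigger fires a fixed delay after the first element of the pane arrives, and that an element is dropped as late once the watermark has passed the window end plus the allowed lateness.

```java
class TriggerModel {
    /** Event time: the on-time pane fires only once the watermark reaches the window's end. */
    static boolean watermarkTriggerFires(long windowEndMillis, long watermarkMillis) {
        return watermarkMillis >= windowEndMillis;
    }

    /** Processing time: fires once the wall clock is delayMillis past the first element's arrival. */
    static boolean processingTimeTriggerFires(long firstArrivalMillis, long delayMillis, long nowMillis) {
        return nowMillis >= firstArrivalMillis + delayMillis;
    }

    /** Event time only: data for a window is dropped once the watermark passes end + allowed lateness. */
    static boolean droppedAsLate(long windowEndMillis, long allowedLatenessMillis, long watermarkMillis) {
        return watermarkMillis >= windowEndMillis + allowedLatenessMillis;
    }

    public static void main(String[] args) {
        long windowEnd = 5_000L;        // window [0 s, 5 s)
        long allowedLateness = 1_000L;  // 1 s, as in the question

        // Watermark held back at 3 s while the wall clock keeps advancing:
        // the event-time pane stays closed, the processing-time pane fires anyway.
        System.out.println("event-time fires: " + watermarkTriggerFires(windowEnd, 3_000L));
        System.out.println("processing-time fires: " + processingTimeTriggerFires(1_000L, 5_000L, 6_000L));

        // Watermark already at 7 s when an element for this window shows up:
        // it is past end + lateness, so the element is discarded.
        System.out.println("dropped as late: " + droppedAsLate(windowEnd, allowedLateness, 7_000L));
    }
}
```

In this model the processing-time trigger never depends on the watermark, which is consistent with the workaround firing while the event-time window stays silent whenever the watermark and the element timestamps disagree.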
// Extracting The Second
static public class ExtractTimeStamp extends DoFn<String, KV<String, String>> {
    @ProcessElement
    public void processElement(ProcessContext ctx, @Element String c, OutputReceiver<KV<String, String>> out) {
        JsonObject accDataObject = new JsonParser().parse(c).getAsJsonObject();
        String milliString = accDataObject.get("Timestamp").getAsString();
        String secondString = StringUtils.left(milliString, 24);
        accDataObject.addProperty("noMiliTimeStamp", secondString);
        String updatedAccData = accDataObject.toString();
        KV<String, String> outputKV = KV.of(secondString, updatedAccData);
        out.output(outputKV);
    }
}
// The Pipeline & Windowing
Pipeline pipeline = Pipeline.create(options);
PCollection<String> dataPoints = pipeline
    .apply("Read from Pubsub", PubsubIO.readStrings()
        .fromTopic("projects/????/topics/???")
        .withTimestampAttribute("messageTimestamp"))
    .apply("Extract Individual Data Points", ParDo.of(new UnpackDataPoints()));
/// This is the event time window that doesn't fire for some reason
/*
PCollection<String> windowedDataPoints = dataPoints.apply(
    Window.<String>into(SlidingWindows.of(Duration.standardSeconds(5)).every(Duration.standardSeconds(1)))
        // .triggering(AfterWatermark.pastEndOfWindow())
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
            .plusDelayOf(TWO_MINUTES))
        //.triggering(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(2)))
        .discardingFiredPanes()
        .withTimestampCombiner(TimestampCombiner.EARLIEST)
        .withAllowedLateness(Duration.standardSeconds(1)));
*/
///// Temporary Work Around, this does fire but data is out of order
PCollection<String> windowedDataPoints = dataPoints.apply(
    Window.<String>into(FixedWindows.of(Duration.standardMinutes(120)))
        .triggering(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(5)))
        .discardingFiredPanes()
        .withTimestampCombiner(TimestampCombiner.EARLIEST)
        .withAllowedLateness(Duration.standardSeconds(1)));
PCollection<KV<String, String>> TimeStamped = windowedDataPoints
    .apply("Pulling Out The Second For Aggregates", ParDo.of(new ExtractTimeStamp()));
PCollection<KV<String, Iterable<String>>> TimeStampedGrouped = TimeStamped.apply("Group By Key", GroupByKey.create());
PCollection<KV<String, Iterable<String>>> testing = TimeStampedGrouped.apply("testingIsh", ParDo.of(new LogKVIterable()));
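For the commented-out strategy, it may help to see which windows a single data point lands in. With SlidingWindows.of(5 s).every(1 s), each element is assigned to five overlapping windows, and each of them waits for the watermark to pass its own end before its on-time pane can fire. The sketch below approximates that assignment in plain Java. It is not Beam's implementation, it assumes a zero window offset, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SlidingWindowAssignment {
    /** Returns the [start, end) bounds, in millis, of every sliding window that contains the timestamp. */
    static List<long[]> windowsFor(long timestampMillis, long sizeMillis, long periodMillis) {
        List<long[]> windows = new ArrayList<>();
        // Start of the most recent window that begins at or before the timestamp.
        long lastStart = timestampMillis - Math.floorMod(timestampMillis, periodMillis);
        // Walk backwards one period at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= periodMillis) {
            windows.add(new long[] {start, start + sizeMillis});
        }
        return windows;
    }

    public static void main(String[] args) {
        // An element stamped 7.25 s after the epoch, 5 s windows sliding every 1 s.
        for (long[] w : windowsFor(7_250L, 5_000L, 1_000L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

Because the GroupByKey runs per window, a single data point can show up in several grouped outputs, and none of those outputs appear on time until the watermark has moved past the corresponding window end.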