How to understand event-time over-window results in Apache Flink

I am using Flink 1.12 and exploring over windows based on event time. The data has three columns: id, trade time, and price. For simplicity, all rows share the same id (id1), and my code allows 4 seconds of lateness:
Stock("id1", "2020-09-16 20:50:15".ts, 1),
Stock("id1", "2020-09-16 20:50:12".ts, 2),
Stock("id1", "2020-09-16 20:50:11".ts, 3),
Stock("id1", "2020-09-16 20:50:18".ts, 4),
Stock("id1", "2020-09-16 20:50:13".ts, 5),
Stock("id1", "2020-09-16 20:50:20".ts, 6),
Stock("id1", "2020-09-16 20:50:14".ts, 7),
Stock("id1", "2020-09-16 20:50:22".ts, 8),
Stock("id1", "2020-09-16 20:50:40".ts, 9)
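Since I allow 4 seconds of lateness, the watermark emitted after each event should be the running maximum timestamp minus 4 seconds. The following is my own tabulation in plain Scala (no Flink involved; timestamps abbreviated to the seconds within 20:50):

```scala
object WatermarkTrace {
  // (event time as seconds within 20:50, price), in arrival order, copied from the data above
  val events: List[(Long, Int)] =
    List((15L, 1), (12L, 2), (11L, 3), (18L, 4), (13L, 5), (20L, 6), (14L, 7), (22L, 8), (40L, 9))

  // punctuated watermark after each event: running max timestamp minus the 4s allowance
  def watermarks(evts: List[(Long, Int)]): List[Long] =
    evts.scanLeft(Long.MinValue) { case (mx, (ts, _)) => math.max(mx, ts) }
      .tail
      .map(_ - 4)

  def main(args: Array[String]): Unit =
    events.zip(watermarks(events)).foreach { case ((ts, p), wm) =>
      println(s"event ts=$ts price=$p -> watermark becomes $wm")
    }
}
```

So the watermark sequence after each of the nine events should be 11, 11, 11, 14, 14, 16, 16, 18, 36.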
When I run the following query:
select
id,
price,
sum(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as sum_price,
max(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as max_price
from sourceTable
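My understanding of `rows between 2 preceding and current row` is that each output row aggregates the current row plus at most the two rows before it in `rt` order. A plain-Scala sketch of that window arithmetic, using a hypothetical price list:

```scala
object RowsWindowDemo {
  // a hypothetical price list, already in rowtime order
  val prices = List(1, 2, 3, 4, 5)

  // for each row: at most 2 preceding rows plus the current row
  val windows = prices.indices.map(i => prices.slice(math.max(0, i - 2), i + 1))
  val sums = windows.map(_.sum) // Vector(1, 3, 6, 9, 12)
  val maxs = windows.map(_.max) // Vector(1, 2, 3, 4, 5)

  def main(args: Array[String]): Unit =
    println(sums.zip(maxs))
}
```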
The output is:
id1,3,3,3
id1,2,5,3
id1,5,10,5
id1,1,8,5
id1,4,10,5
id1,6,11,6
id1,8,18,8
My questions:

1. Stock("id1", "2020-09-16 20:50:15".ts, 1) is the first event, so I expected the first output row to be id1,1,1,1 rather than id1,3,3,3.
2. Since 2020-09-16 20:50:15 is the first event, the watermark after it arrives should be 2020-09-16 20:50:11. That means the next event Stock("id1", "2020-09-16 20:50:11".ts, 3) should be late and dropped, and Stock("id1", "2020-09-16 20:50:13".ts, 5) should be late as well. Why are these two events not treated as late, and why are they included in the output?
3. Is Stock("id1", "2020-09-16 20:50:14".ts, 7) late, or is something else going on? If it is late, why does the output still include the events from question 2 above?

My complete code:
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.table.api.bridge.scala._
import org.apache.flink.table.api.{AnyWithOperations, FieldExpression}
import org.apache.flink.types.Row
import org.example.model.Stock
import org.example.sources.StockSource
import org.example.utils.Implicits._

object Sql016_EventTimeOverWindowSqlTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val ds: DataStream[Stock] = env.addSource(new StockSource(emitInterval = 0, print = false))
    val ds2 = ds.assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks[Stock] {
      var max = Long.MinValue

      override def checkAndGetNextWatermark(t: Stock, l: Long): Watermark = {
        if (t.trade_date.getTime > max) {
          max = t.trade_date.getTime
        }
        new Watermark(max - 4000) // allow 4 seconds of lateness
      }

      override def extractTimestamp(t: Stock, l: Long): Long = t.trade_date.getTime
    })

    val tenv = StreamTableEnvironment.create(env)
    val table = tenv.fromDataStream(ds2, $"id", $"price", $"rt".rowtime())
    tenv.createTemporaryView("sourceTable", table)

    val sql =
      """
      select
        id,
        price,
        sum(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as sum_price,
        max(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as max_price
      from sourceTable
      """.stripMargin(' ')
    val table2 = tenv.sqlQuery(sql)
    table2.toAppendStream[Row].print()
    env.execute()
  }
}
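While staring at this, I tried to reproduce the output with a plain-Scala replay of what I assume the rowtime over window does (no Flink involved). The rules below are my assumption, not Flink's actual operator code: the watermark after each event is the running max timestamp minus 4s; a row is dropped as late when its timestamp is not larger than the last fired timestamp; and buffered rows fire in timestamp order once the watermark reaches them:

```scala
object OverWindowReplay {
  // (event time as seconds within 20:50, price), in arrival order
  val events: List[(Long, Int)] =
    List((15L, 1), (12L, 2), (11L, 3), (18L, 4), (13L, 5), (20L, 6), (14L, 7), (22L, 8), (40L, 9))

  // returns (price, sum_price, max_price) per fired row, mimicking the SQL above
  def replay(evts: List[(Long, Int)]): List[(Int, Int, Int)] = {
    var buffer = List.empty[(Long, Int)]      // rows waiting for the watermark
    var emitted = List.empty[(Long, Int)]     // rows already fired, in timestamp order
    var results = List.empty[(Int, Int, Int)]
    var maxTs = Long.MinValue
    var lastFired = Long.MinValue

    for ((ts, price) <- evts) {
      if (ts > lastFired) buffer :+= ((ts, price)) // otherwise: late, dropped
      maxTs = math.max(maxTs, ts)
      val wm = maxTs - 4 // 4s lateness allowance
      val (fire, keep) = buffer.partition(_._1 <= wm)
      buffer = keep
      for ((fts, fprice) <- fire.sortBy(_._1)) {
        emitted :+= ((fts, fprice))
        val window = emitted.takeRight(3).map(_._2) // 2 preceding rows + current row
        results :+= ((fprice, window.sum, window.max))
        lastFired = fts
      }
    }
    results
  }

  def main(args: Array[String]): Unit =
    replay(events).foreach { case (p, s, m) => println(s"id1,$p,$s,$m") }
}
```

Under these assumed rules the replay produces exactly the seven rows printed above, with price 7 dropped as late and price 9 never fired (no watermark ≥ 20:50:40 ever arrives). I would still like confirmation that this is what Flink actually does.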
Can someone explain these results? I have been stuck on this for days!