How does Apache Flink evaluate over-window results based on event time?


I am using Flink 1.12 and exploring event-time-based over windows.

The data has three columns: id, trade time, and price. For convenience, all rows share the same id (id1), and I allow 4 seconds of lateness in the code:

    Stock("id1", "2020-09-16 20:50:15".ts, 1),
    Stock("id1", "2020-09-16 20:50:12".ts, 2),
    Stock("id1", "2020-09-16 20:50:11".ts, 3),
    Stock("id1", "2020-09-16 20:50:18".ts, 4),
    Stock("id1", "2020-09-16 20:50:13".ts, 5),
    Stock("id1", "2020-09-16 20:50:20".ts, 6),
    Stock("id1", "2020-09-16 20:50:14".ts, 7),
    Stock("id1", "2020-09-16 20:50:22".ts, 8),
    Stock("id1", "2020-09-16 20:50:40".ts, 9)
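My mental model of the watermark is a sketch under these assumptions (my own reading, not confirmed): the watermark is the maximum event time seen so far minus 4000 ms, it is emitted after every element, and an event is "late" when its timestamp is at or below the watermark that was current when it arrived:

```scala
// Sketch of my assumed watermark progression for the 9 events above.
// Assumption: watermark = (max event time seen so far) - 4s, emitted after
// every element; an event counts as "late" if its timestamp is <= the
// watermark current on arrival.
object WatermarkSketch extends App {
  // second-of-minute of each event, in arrival order
  val seconds = Seq(15L, 12L, 11L, 18L, 13L, 20L, 14L, 22L, 40L)
  var watermark = Long.MinValue
  for (ts <- seconds) {
    val late = watermark != Long.MinValue && ts <= watermark
    println(f"event 20:50:$ts%02d  late=$late")
    watermark = math.max(watermark, ts - 4) // advance: max event time minus 4s
  }
}
```

By this reasoning, the events at 20:50:11, 20:50:13, and 20:50:14 would all be late, yet only the 20:50:14 row is missing from the output below, which is exactly what puzzles me.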
When I run the following query:

      select
        id,
        price,
        sum(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as sum_price,
        max(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as max_price
      from sourceTable
the output is:

    id1,3,3,3
    id1,2,5,3
    id1,5,10,5
    id1,1,8,5
    id1,4,10,5
    id1,6,11,6
    id1,8,18,8
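One pattern I could verify on my own: if I take the rows that do appear in the output, sort them by event time, and apply a plain 3-row rolling sum/max (2 preceding plus the current row), I reproduce the output exactly. A sketch (tuples are second-of-minute and price; the price-7 row is absent):

```scala
// Sketch: the observed output rows, sorted by event time, match a 3-row
// rolling sum/max over the surviving events.
object RollingSketch extends App {
  val rows = Seq((11, 3), (12, 2), (13, 5), (15, 1), (18, 4), (20, 6), (22, 8)) // (second, price)
  for (i <- rows.indices) {
    val window = rows.slice(math.max(0, i - 2), i + 1).map(_._2) // 2 preceding + current row
    println(s"id1,${rows(i)._2},${window.sum},${window.max}")
  }
}
```

This prints the same 7 lines as the output above, so the aggregation itself looks event-time ordered; what I cannot explain is which rows survive and when they are emitted.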
  • Since the first event is Stock("id1", "2020-09-16 20:50:15".ts, 1), I think the first output row should be id1,1,1,1 rather than id1,3,3,3.

  • 2020-09-16 20:50:15 is the first event, so when it arrives the watermark becomes 2020-09-16 20:50:11. That means the next event, Stock("id1", "2020-09-16 20:50:11".ts, 3), should be late and dropped, and Stock("id1", "2020-09-16 20:50:13".ts, 5) should also be late. Why are these two events not treated as late, and why are they included in the output?

  • The output does not include Stock("id1", "2020-09-16 20:50:14".ts, 7). Is it late, or is something else going on? And why are the events from question 2 above still included in the output?

  • The underlying question is: when is an over window triggered to compute?

  • I have been puzzling over these questions for hours and cannot explain them. Could you please take a look?

    Here is the application code for your reference:

    import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.watermark.Watermark
    import org.apache.flink.table.api.bridge.scala._
    import org.apache.flink.table.api.{AnyWithOperations, FieldExpression}
    import org.apache.flink.types.Row
    import org.example.model.Stock
    import org.example.sources.StockSource
    import org.example.utils.Implicits._
    
    object Sql016_EventTimeOverWindowSqlTest {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(1)
        val ds: DataStream[Stock] = env.addSource(new StockSource(emitInterval = 0, print = false))
    val ds2 = ds.assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks[Stock] {
      var max = Long.MinValue

      // emit a watermark after every element: max event time seen so far, minus 4 seconds
      override def checkAndGetNextWatermark(t: Stock, l: Long): Watermark = {
        if (t.trade_date.getTime > max) {
          max = t.trade_date.getTime
        }
        new Watermark(max - 4000) // allow 4 seconds of lateness
      }

      override def extractTimestamp(t: Stock, l: Long): Long = t.trade_date.getTime
    })
        val tenv = StreamTableEnvironment.create(env)
        val table = tenv.fromDataStream(ds2, $"id", $"price", $"rt".rowtime())
        tenv.createTemporaryView("sourceTable", table)
    
        val sql =
          """
          select
            id,
            price,
            sum(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as sum_price,
            max(price) OVER (PARTITION BY id ORDER BY rt rows between 2 preceding and current row) as max_price
          from sourceTable
    
    
          """.stripMargin(' ')
    
        val table2 = tenv.sqlQuery(sql)
    
        table2.toAppendStream[Row].print()
    
        env.execute()
    
      }
    }
    
    

    Can someone explain these results? I have been stuck on this for days!