Apache Flink: understanding event-time based interval join results


I am using Flink 1.12 and exploring interval joins of two streams based on event time.

In my application there are two sources, and both streams allow 4 seconds of out-of-order lateness (see the watermark sketch after the source descriptions below).

The first one (id, trade_date, price):

The second one (id, name, trade_date):
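
The source and watermark generator classes live in org.example.sources and aren't pasted in full; roughly, each watermark generator is a plain bounded-out-of-orderness timestamp extractor with a 4-second bound. A simplified sketch (the field types and the base class here are my assumptions, not the exact classes):

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.windowing.time.Time

    // Simplified event types; trade_date is an epoch-millisecond timestamp.
    case class Stock(id: String, trade_date: Long, price: Double)
    case class StockNameChanging(id: String, name: String, trade_date: Long)

    // Watermarks trail the largest seen timestamp by `laziness` milliseconds (4000 ms here),
    // so records up to 4 seconds out of order still count as on time.
    class StockWatermarkGenerator(laziness: Long)
        extends BoundedOutOfOrdernessTimestampExtractor[Stock](Time.milliseconds(laziness)) {
      override def extractTimestamp(e: Stock): Long = e.trade_date
    }

    class StockNameChangingWatermarkGenerator(laziness: Long)
        extends BoundedOutOfOrdernessTimestampExtractor[StockNameChanging](Time.milliseconds(laziness)) {
      override def extractTimestamp(e: StockNameChanging): Long = e.trade_date
    }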

When I execute the following query:

      select s1.id, s2.name, s1.price, cast (s1.rt1 as timestamp) as rt1, s2.rt2
      from s1 join s2
      on s1.id = s2.id
      where s1.rt1 between s2.rt2 - interval '2' second and s2.rt2 + interval '2' second
the result is as follows:

    id1,Stock1,1.0,2020-09-16T12:50:15,2020-09-16T12:50:16
    id4,Stock4,4.0,2020-09-16T12:50:18,2020-09-16T12:50:17
    id6,Stock7,8.0,2020-09-16T12:50:22,2020-09-16T12:50:21
    id6,Stock6,8.0,2020-09-16T12:50:22,2020-09-16T12:50:23
    id1,Stock1,100.0,2020-09-16T12:50:15,2020-09-16T12:50:16
The last row of the result is joined from the last row of the first stream and the first row of the second stream:

Stock("id1", "2020-09-16 20:50:15".ts, 100)
StockNameChanging("id1", "Stock1", "2020-09-16 20:50:16".ts)
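
Judging by the rowtime values printed in that last result row, the pair itself clearly satisfies the BETWEEN predicate; a quick stand-alone sanity check with plain java.time:

    import java.time.LocalDateTime

    object PredicateCheck extends App {
      // rt1 and rt2 as printed in the last result row
      val rt1 = LocalDateTime.parse("2020-09-16T12:50:15")
      val rt2 = LocalDateTime.parse("2020-09-16T12:50:16")
      val lower = rt2.minusSeconds(2) // rt2 - interval '2' second
      val upper = rt2.plusSeconds(2)  // rt2 + interval '2' second
      // prints true: rt1 lies inside [rt2 - 2s, rt2 + 2s], so the join condition holds
      println(!rt1.isBefore(lower) && !rt1.isAfter(upper))
    }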
Both of these events arrive late in their respective streams. Why can they still be joined together? What is the strategy for handling late events, and how long does Flink keep the data in state?

I have been stuck on this question. Could anyone help me figure it out? Thanks.

The full application code is attached below for your reference.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.bridge.scala._
import org.apache.flink.table.api.{AnyWithOperations, FieldExpression}
import org.apache.flink.types.Row
import org.example.sources.{IntervalJoinStockNameChangingSource, IntervalJoinStockSource, StockNameChangingWatermarkGenerator, StockWatermarkGenerator}
import org.scalatest.funsuite.AnyFunSuite


class T015_IntervalJoinEventTime extends AnyFunSuite {




  test("test interval join inner 2 works") {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val ds1 = env.addSource(new IntervalJoinStockSource(emitInterval = 0)).assignTimestampsAndWatermarks(new StockWatermarkGenerator(laziness = 4000))
    val ds2 = env.addSource(new IntervalJoinStockNameChangingSource(emitInterval = 0)).assignTimestampsAndWatermarks(new StockNameChangingWatermarkGenerator(laziness = 4000))
    val tenv = StreamTableEnvironment.create(env)
    tenv.createTemporaryView("s1", ds1, $"id", $"price", $"trade_date".rowtime() as "rt1")
    tenv.createTemporaryView("s2", ds2, $"id", $"name", $"trade_date".rowtime() as "rt2")
    tenv.from("s1").printSchema()
    tenv.from("s2").printSchema()
    val sql =
      """
      select s1.id, s2.name, s1.price, cast (s1.rt1 as timestamp) as rt1, s2.rt2
      from s1 join s2
      on s1.id = s2.id
      where s1.rt1 between s2.rt2 - interval '2' second and s2.rt2 + interval '2' second

      """.stripMargin(' ')

    tenv.sqlQuery(sql).toAppendStream[Row].print()
    env.execute()


  }

}
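
For comparison, the same ±2 second event-time bound can also be written with the DataStream API's intervalJoin. This is only a rough sketch (it assumes the simplified Stock/StockNameChanging case classes from the earlier sketch and the ds1/ds2 streams of the test above), not code I have verified:

    import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.util.Collector

    // For each element of ds1, match ds2 elements with the same id whose timestamps fall
    // within [rt1 - 2s, rt1 + 2s] -- the same condition as the BETWEEN predicate above.
    val joined = ds1
      .keyBy(_.id)
      .intervalJoin(ds2.keyBy(_.id))
      .between(Time.seconds(-2), Time.seconds(2))
      .process(new ProcessJoinFunction[Stock, StockNameChanging, String] {
        override def processElement(
            left: Stock,
            right: StockNameChanging,
            ctx: ProcessJoinFunction[Stock, StockNameChanging, String]#Context,
            out: Collector[String]): Unit = {
          out.collect(s"${left.id},${right.name},${left.price}")
        }
      })
    joined.print()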