Apache Flink: incorrect results in full outer join
I have two data streams that are created from the following two tables:
Table orderRes1 = ste.sqlQuery(
"SELECT orderId, userId, SUM(bidPrice) as q FROM " + tble +
" Group by orderId, userId");
Table orderRes2 = ste.sqlQuery(
"SELECT orderId, userId, SUM(askPrice) as q FROM " + tble +
" Group by orderId, userId");
DataStream<Tuple2<Boolean, Row>> ds1 = ste.toRetractStream(orderRes1 , Row.class).
filter(order-> order.f0);
DataStream<Tuple2<Boolean, Row>> ds2 = ste.toRetractStream(orderRes2 , Row.class).
filter(order-> order.f0);
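However, in some cases the result is incorrect. Suppose user A sells to user B three times at a given price, and afterwards user B sells to user A twice; after the second of these sales the result is: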
7> (true,123,a,300.0,0.0)
7> (true,123,a,300.0,200.0)
10> (true,123,b,0.0,300.0)
10> (true,123,b,200.0,300.0)
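The second and fourth rows are the expected output of the stream, but it also produces the first and third rows.
It is worth mentioning that coGroup is another solution, but I don't want to use windows in this case, and a non-windowed solution is only available for bounded streams (DataSet).
Hint: orderId and userId are repeated in both streams, and for each operation I want to produce 2 rows containing:
orderId, userId1, bidTotalPrice, askTotalPrice and
orderId, userId2, bidTotalPrice, askTotalPrice

Something like this is to be expected with streaming queries (that is, when executing queries on dynamic tables). Unlike a traditional database, where the input relations of a query stay static while the query executes, the inputs to a streaming query are continuously updated, so the result must be continuously updated as well.
If I understand the setup here, the "incorrect" results on rows 1 and 3 are correct until the relevant rows from orderRes2 have been processed. If those rows never arrived, rows 1 and 3 would remain correct.
What you should expect is an eventually correct result, including retractions where necessary. Enabling the appropriate optimization can reduce the number of intermediate results.
This provides more insight. If I have misunderstood your situation, please provide a reproducible example that illustrates the problem.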
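The intermediate results described above can be illustrated with a small simulation. This is a minimal sketch in plain Java, not Flink code: the class name RetractJoinSketch and its methods are hypothetical, and it only approximates how a full outer join over two running sums could emit add (true) and retract (false) messages as each side is updated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a retract stream; not part of the Flink API.
public class RetractJoinSketch {
    // Running SUM(price) per "orderId,userId" key for each side of the join.
    private final Map<String, Double> bidTotals = new HashMap<>();
    private final Map<String, Double> askTotals = new HashMap<>();
    // Last joined row emitted per key, so it can be retracted on the next update.
    private final Map<String, String> lastRow = new HashMap<>();
    // Emitted messages: (true,row) adds a row, (false,row) retracts it.
    public final List<String> changelog = new ArrayList<>();

    public void bid(String orderId, String userId, double price) {
        update(bidTotals, orderId, userId, price);
    }

    public void ask(String orderId, String userId, double price) {
        update(askTotals, orderId, userId, price);
    }

    private void update(Map<String, Double> side, String orderId, String userId, double price) {
        String key = orderId + "," + userId;
        side.merge(key, price, Double::sum);
        // Mirrors COALESCE(..., 0): a side that has not arrived yet contributes 0.0.
        String row = key + "," + bidTotals.getOrDefault(key, 0.0)
                + "," + askTotals.getOrDefault(key, 0.0);
        String previous = lastRow.put(key, row);
        if (previous != null) {
            changelog.add("(false," + previous + ")"); // retract the outdated row
        }
        changelog.add("(true," + row + ")");           // emit the updated row
    }

    public static void main(String[] args) {
        RetractJoinSketch join = new RetractJoinSketch();
        join.bid("123", "a", 300.0); // the bid total for user a arrives first
        join.ask("123", "a", 200.0); // the ask total for the same key arrives later
        join.changelog.forEach(System.out::println);
        // (true,123,a,300.0,0.0)
        // (false,123,a,300.0,0.0)
        // (true,123,a,300.0,200.0)
    }
}
```

In this sketch the row (true,123,a,300.0,0.0) is correct at the moment it is emitted and is retracted later. Filtering on f0 discards that retraction, which is why the outdated row still appears in the printed output.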
Table bidOrdr = ste.fromDataStream(bidTuple, $("orderId"),
$("userId"), $("price"));
Table askOrdr = ste.fromDataStream(askTuple, $("orderId"),
$("userId"), $("price"));
Table result = ste.sqlQuery(
// Take each key column from whichever side of the full outer join is non-NULL.
"SELECT COALESCE(bidTbl.orderId, askTbl.orderId) AS orderId, " +
" COALESCE(bidTbl.userId, askTbl.userId) AS userId, " +
" COALESCE(bidTbl.bidTotalPrice, 0) AS bidTotalPrice, " +
" COALESCE(askTbl.askTotalPrice, 0) AS askTotalPrice " +
" FROM " +
" (SELECT orderId, userId," +
" SUM(price) AS bidTotalPrice " +
" FROM " + bidOrdr +
" Group by orderId, userId) bidTbl full outer JOIN " +
" (SELECT orderId, userId," +
" SUM(price) AS askTotalPrice" +
" FROM " + askOrdr +
" Group by orderId, userId) askTbl " +
" ON (bidTbl.orderId = askTbl.orderId" +
" AND bidTbl.userId= askTbl.userId) ") ;
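// Keeps only the add messages (f0 == true); resultStream is a placeholder variable name.
DataStream<Tuple2<Boolean, Row>> resultStream = ste.toRetractStream(result, Row.class).filter(order -> order.f0);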
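The filter above keeps only the add messages and drops every retraction. As a rough alternative, a consumer could apply both kinds of message to obtain the eventually correct rows. The following is a minimal sketch in plain Java under that assumption: RetractionMaterializer and materialize are hypothetical names, and the (Boolean, String) entries stand in for Flink's Tuple2<Boolean, Row>.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it stands in for a sink that honours retractions.
public class RetractionMaterializer {

    /** Applies add (true) and retract (false) messages in order and returns the surviving rows. */
    public static List<String> materialize(List<Map.Entry<Boolean, String>> changelog) {
        // Count how many copies of each row are currently valid.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<Boolean, String> message : changelog) {
            int delta = message.getKey() ? 1 : -1;
            counts.merge(message.getValue(), delta, Integer::sum);
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                rows.add(entry.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map.Entry<Boolean, String>> changelog = new ArrayList<>();
        changelog.add(new SimpleEntry<>(true, "123,a,300.0,0.0"));
        changelog.add(new SimpleEntry<>(false, "123,a,300.0,0.0"));
        changelog.add(new SimpleEntry<>(true, "123,a,300.0,200.0"));
        System.out.println(materialize(changelog)); // prints [123,a,300.0,200.0]
    }
}
```

Counting copies instead of storing a set keeps the sketch correct when identical rows are added more than once.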