Implementing an SQL merge join on Akka sources


I want to implement an SQL merge join for different Akka streams. For example, I have three classes:

case class A(id: String, as: String)
case class B(a_id: String, bs: String)
case class C(id: String, as: String, bs: String)
I have two sources (Source[A] and Source[B], sorted by id and a_id respectively), and I want to get a Sink[C] that joins them on id = a_id. I don't see how this can be implemented.

An example of the streams:

Source[A] contains: A(1, "a1"), A(2, "a2"), A(3, "a3_1"), A(3, "a3_2"), A(4, "a4")

Source[B] contains: B(2, "b2"), B(3, "b3")

Sink[C] must receive: C(2, "a2", "b2"), C(3, "a3_1", "b3"), C(3, "a3_2", "b3")
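
For clarity, here is a plain, non-streaming sketch of the join semantics I am after, applied to the example data above (ids are quoted because the classes above use String keys; this only pins down the expected output, it is not a streaming solution):

val as = List(A("1", "a1"), A("2", "a2"), A("3", "a3_1"), A("3", "a3_2"), A("4", "a4"))
val bs = List(B("2", "b2"), B("3", "b3"))

// Naive nested-loop formulation of the inner join on id = a_id; a merge join
// should produce the same rows while walking both sorted collections only once.
val cs = for {
  a <- as
  b <- bs
  if a.id == b.a_id
} yield C(a.id, a.as, b.bs)
// cs == List(C("2", "a2", "b2"), C("3", "a3_1", "b3"), C("3", "a3_2", "b3"))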

The example above is not valid for the OneToManyMergeJoin spec below. The correct setup is:

import akka.NotUsed
import akka.stream.scaladsl.Source

case class A(id: Int, as: String)
case class B(a_id: Int, bs: String)
case class C(id: Int, as: String, bs: String)

val source1: Source[A, NotUsed] = Source(
  List(A(1, "a1"), A(2, "a2"), A(3, "a3"), A(4, "a4"))
)
val source2: Source[B, NotUsed] = Source(
  List(B(2, "b2"), B(3, "b3_1"), B(3, "b3_2"))
)
One of the sources must be distinct (no repeated keys), and both must be sorted by the join attribute. Then we can use:

import akka.stream.{Attributes, FanInShape2}
import akka.stream.stage.{GraphStage, GraphStageLogic, StageLogging}

// Joins a stream of distinct, sorted keys (left) with a sorted stream that may
// repeat keys (right), emitting zipper(left, right) whenever comparator returns 0.
class OneToManyMergeJoin[Distinct, Duplicated, O](val zipper: (Distinct, Duplicated) => O, val comparator: (Distinct, Duplicated) => Int) extends GraphStage[FanInShape2[Distinct, Duplicated, O]] {
  override val shape: FanInShape2[Distinct, Duplicated, O] = new FanInShape2("OneToManyMergeJoin")

  private val left = shape.in0
  private val right = shape.in1
  private val out = shape.out

  override def createLogic(inheritedAttributes: Attributes) = new GraphStageLogic(shape) with StageLogging {
    setHandler(left, ignoreTerminateInput)
    setHandler(right, ignoreTerminateInput)
    setHandler(out, eagerTerminateOutput)

    // Only the current element of each side is kept, so memory use stays constant.
    var leftValue: Distinct = _
    var rightValue: Duplicated = _

    def dispatch(l: Distinct, r: Duplicated): Unit = {
      val c = comparator(l, r)
      if (c == 0) {
        // Keys match: emit the joined element and keep reading the duplicated side,
        // since the same left key may match further right elements.
        emit(out, zipper(l, r), readR)
      } else {
        // Keys differ: advance whichever side is behind.
        if (c < 0) readL() else readR()
      }
    }

    private val dispatchR = { v: Duplicated =>
      rightValue = v
      dispatch(leftValue, rightValue)
    }

    private val dispatchL = { v: Distinct =>
      leftValue = v
      dispatch(leftValue, rightValue)
    }

    // On upstream completion, drain the other side only while it could still match.
    lazy val readR: () => Unit = () => read(right)(dispatchR, () => if (comparator(leftValue, rightValue) < 0) readL() else completeStage())
    lazy val readL: () => Unit = () => read(left)(dispatchL, () => if (comparator(leftValue, rightValue) > 0) readR() else completeStage())

    override def preStart(): Unit = {
      // all fan-in stages need to eagerly pull all inputs to get cycles started
      pull(right)
      read(left)(
        l => {
          leftValue = l
          read(right)(dispatchR, () => completeStage())
        },
        () => completeStage()
      )
    }

  }

}
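
For completeness, here is a minimal sketch of how the stage can be wired up with GraphDSL (my own usage code, assuming the case classes and source1/source2 above and a classic pre-2.6 ActorMaterializer; the system name and the logging sink are only illustrative):

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink}

implicit val system: ActorSystem = ActorSystem("merge-join")
implicit val materializer: ActorMaterializer = ActorMaterializer()

// Build the one-to-many join: A is the distinct side, B the duplicated side.
val join = new OneToManyMergeJoin[A, B, C](
  zipper = (a, b) => C(a.id, a.as, b.bs),
  comparator = (a, b) => a.id.compare(b.a_id)
)

RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
  import GraphDSL.Implicits._
  val joinStage = builder.add(join)
  source1 ~> joinStage.in0
  source2 ~> joinStage.in1
  joinStage.out ~> Sink.foreach[C](c => system.log.debug("Sink: {}", c))
  ClosedShape
}).run()
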
Result:

DEBUG Sink: C(2,a2,b2)
DEBUG Sink: C(3,a3,b3_1)
DEBUG Sink: C(3,a3,b3_2)

Why do I need this? One of the use cases is merging Cassandra tables that are not fully denormalized (but are sorted).

I don't think a merge join is possible without holding data in memory. A left or right join might work. Maybe I'm wrong about that; I have written implementations of SQL inner/left joins the way they appear in a query plan.