Akka: How to group items of a sorted stream using substreams?


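Can you explain how to use the new groupBy in Akka Streams? It seems rather useless: groupBy used to return (T, Source), but it no longer does. Here is my example (modeled on one from the docs):

This just hangs. Maybe it hangs because the number of substreams is lower than the number of unique keys. But what should I do if my stream is infinite? I want to group until the key changes.

In my real stream, the data is always sorted by the value I group by. Maybe I don't need groupBy at all?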
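For comparison, here is a minimal sketch of how the newer groupBy is typically used. It assumes Akka 2.6 or later, and GroupByExample and groupAll are hypothetical names. groupBy(maxSubstreams, f) returns a SubFlow instead of (key, Source) pairs. Operators such as fold run inside every substream, and mergeSubstreams joins the results back together. fold only emits when its substream completes, and a groupBy substream normally completes only when the whole upstream does. For that reason this pattern cannot produce per-key results on an infinite stream. In this sketch, having more distinct keys than maxSubstreams makes the stream fail rather than produce results.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Try

object GroupByExample {
  // Hypothetical helper: collects all values per key with groupBy.
  // maxSubstreams caps how many substreams may be open at the same time.
  def groupAll(elems: List[(Int, String)], maxSubstreams: Int)(
      implicit system: ActorSystem): Try[Seq[(Int, List[String])]] = Try {
    val done = Source(elems)
      .groupBy(maxSubstreams, _._1)
      // runs separately in each substream and emits once that substream completes
      .fold(0 -> List.empty[String]) {
        case ((_, values), (key, value)) => key -> (value :: values)
      }
      .mergeSubstreams
      .runWith(Sink.seq)
    // merged substreams arrive in no guaranteed order, so sort by key
    Await.result(done, 5.seconds).sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    // With Akka 2.6+, an implicit ActorSystem also provides the Materializer
    implicit val system: ActorSystem = ActorSystem("group-by-example")
    val data = List(1 -> "1a", 1 -> "1b", 2 -> "2a", 3 -> "3a")
    println(groupAll(data, maxSubstreams = 16)) // room for all 3 keys: Success
    println(groupAll(data, maxSubstreams = 2))  // 3 keys but only 2 substreams: Failure
    system.terminate()
  }
}
```

groupBy does not need sorted input. Its cost is that every distinct key keeps a substream open until the upstream ends, and that is the limitation the answers below avoid.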

If your stream data is always sorted, you can take advantage of that to group it like this:

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

// With Akka 2.6+, an implicit ActorSystem is enough to run the stream;
// older versions also need an implicit ActorMaterializer
implicit val system: ActorSystem = ActorSystem()

val source = Source(List(
  1 -> "1a", 1 -> "1b", 1 -> "1c",
  2 -> "2a", 2 -> "2b",
  3 -> "3a", 3 -> "3b", 3 -> "3c",
  4 -> "4a",
  5 -> "5a", 5 -> "5b", 5 -> "5c",
  6 -> "6a", 6 -> "6b",
  7 -> "7a",
  8 -> "8a", 8 -> "8b",
  9 -> "9a", 9 -> "9b"
))

source
  // sliding(2, 1) emits no trailing single-element window, so append a
  // None sentinel to make the last real element the head of a pair too
  .map(Option(_))
  .concat(Source.single(Option.empty[(Int, String)]))
  // group elements into overlapping pairs: (e1, e2), (e2, e3), ..., (eN, None)
  .sliding(2, 1)
  // when the keys in a pair differ (or the pair ends with the sentinel),
  // close the current subflow after this pair
  .splitAfter(pair => pair.head.map(_._1) != pair.last.map(_._1))
  // keep only the first element of each pair to reconstruct the original
  // stream, now cut into one subflow per key
  .mapConcat(_.head.toList)
  // fold each subflow into a single (key, values) element
  .fold(0 -> List.empty[String]) {
    case ((_, values), (key, value)) => key -> (value +: values)
  }
  // merge the subflows back and print the results
  .mergeSubstreams
  .runWith(Sink.foreach(println))

Finally, you will get the following output:

(1,List(1c, 1b, 1a))
(2,List(2b, 2a))
(3,List(3c, 3b, 3a))
(4,List(4a))
(5,List(5c, 5b, 5a))
(6,List(6b, 6a))
(7,List(7a))
(8,List(8b, 8a))
(9,List(9b, 9a))

Unlike with groupBy, however, you are not limited by the number of distinct keys.

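You could also implement this with statefulMapConcat. That would be slightly cheaper, because it does not do any sub-materialization (but you have to live with the shame of using vars).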
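Here is a minimal sketch of the statefulMapConcat idea. It assumes Akka 2.6 or later and input that is already sorted by key. StatefulGrouping and groupSortedByKey are hypothetical names, and this is one possible way to do it rather than a reference implementation. statefulMapConcat has no hook for upstream completion, so the sketch appends a None sentinel that tells the stage to flush the final group.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.collection.mutable.ListBuffer
import scala.concurrent.Await
import scala.concurrent.duration._

object StatefulGrouping {
  // Hypothetical helper: emits each run of consecutive equal-key elements as
  // a List. Assumes the input is sorted by key; opens no substreams.
  def groupSortedByKey[K, T](keyOf: T => K): Flow[T, List[T], NotUsed] =
    Flow[T]
      .map[Option[T]](Some(_))
      // None is an end-of-stream sentinel that forces the last group out
      .concat(Source.single(Option.empty[T]))
      .statefulMapConcat { () =>
        // created once per materialization, so this state is never shared
        var currentKey: Option[K] = None
        val buffer = ListBuffer.empty[T]

        val step: Option[T] => List[List[T]] = {
          case Some(elem) =>
            val key = keyOf(elem)
            if (currentKey.contains(key)) {
              buffer += elem
              Nil
            } else {
              // key changed: emit the finished group (if any), start a new one
              val finished = buffer.toList
              buffer.clear()
              buffer += elem
              currentKey = Some(key)
              if (finished.isEmpty) Nil else List(finished)
            }
          case None =>
            // end of stream: flush whatever is still buffered
            if (buffer.isEmpty) Nil else List(buffer.toList)
        }
        step
      }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("stateful-grouping")
    val source = Source(List(1 -> "1a", 1 -> "1b", 2 -> "2a", 3 -> "3a", 3 -> "3b"))
    val groups = source.via(groupSortedByKey[Int, (Int, String)](_._1)).runWith(Sink.seq)
    Await.result(groups, 5.seconds).foreach(println)
    system.terminate()
  }
}
```

The trade-off is that a single very long run of equal keys is buffered entirely in memory. This sketch has no limit for that case, whereas the custom stage below guards against it with maxBufferSize.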

I ended up implementing a custom stage:

import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import scala.collection.mutable.ListBuffer

// Emits each run of consecutive elements sharing a key as one List;
// assumes the input is sorted by key
class GroupAfterKeyChangeStage[K, T](keyForItem: T => K, maxBufferSize: Int) extends GraphStage[FlowShape[T, List[T]]] {

  private val in = Inlet[T]("GroupAfterKeyChangeStage.in")
  private val out = Outlet[List[T]]("GroupAfterKeyChangeStage.out")

  override val shape: FlowShape[T, List[T]] =
    FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) with InHandler with OutHandler {

    private val buffer = new ListBuffer[T]
    private var currentKey: Option[K] = None

    // InHandler
    override def onPush(): Unit = {
      val nextItem = grab(in)
      val nextItemKey = keyForItem(nextItem)

      if (currentKey.forall(_ == nextItemKey)) {
        // first element or same key: keep accumulating
        if (currentKey.isEmpty)
          currentKey = Some(nextItemKey)

        if (buffer.size == maxBufferSize)
          failStage(new RuntimeException(s"Maximum buffer size is exceeded on key $nextItemKey"))
        else {
          buffer += nextItem
          pull(in)
        }
      } else {
        // key changed: emit the finished group and start a new one with nextItem
        val result = buffer.result()
        buffer.clear()
        buffer += nextItem
        currentKey = Some(nextItemKey)
        push(out, result)
      }
    }

    // OutHandler: the stage completes in onUpstreamFinish, so upstream
    // is never closed here and it is always safe to pull
    override def onPull(): Unit =
      pull(in)

    // InHandler
    override def onUpstreamFinish(): Unit = {
      // emit() waits for downstream demand if out has not been pulled yet,
      // and completes the stage only after the last group has been pushed
      if (buffer.nonEmpty)
        emit(out, buffer.result(), () => completeStage())
      else
        completeStage()
    }

    override def postStop(): Unit = {
      buffer.clear()
    }

    setHandlers(in, out, this)
  }
}

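If you don't want to copy and paste it, I have added it to a library I maintain. To use it, add

Resolver.bintrayRepo("cppexpert", "maven")

to your resolvers, and add the library to your dependencies:

"com.walkmind" %% "scala-tricks" % "2.15"

It is implemented as a Flow in com.walkmind.akkastream.FlowExt: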

def groupSortedByKey[K, T](keyForItem: T ⇒ K, maxBufferSize: Int): Flow[T, List[T], NotUsed]

In my example, it is used like this:

source
  .via(FlowExt.groupSortedByKey(_._1, 128))

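A year later, there is a class in akka-stream-contrib that does exactly this:

libraryDependencies += "com.typesafe.akka" %% "akka-stream-contrib" % "0.9"

and:

import akka.stream.contrib.AccumulateWhileUnchanged
source.via(new AccumulateWhileUnchanged(_._1))

Nice idea! Yesterday I also implemented it with splitWhen, but I had to use a var holding the last ID.

@shutty The last item is lost, so unfortunately the last group of items comes out incomplete.

Since emit switches the stage to a new behavior when it is called, it already handles the case where out has not been pulled, so there is no need to fail the stage for that.

Awesome. Exactly what I needed, in one line. Thanks!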
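One of the comments mentions implementing this with splitWhen and a var holding the last ID. Below is a minimal sketch of that idea. It assumes Akka 2.6 or later and input sorted by key, and SplitWhenGrouping and groupSortedByKey are hypothetical names. A var captured directly by the splitWhen predicate could end up shared by every materialization of the flow. To avoid that, this variant keeps the var inside statefulMapConcat, which tags each element with whether its key changed, and splitWhen then splits on that tag.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object SplitWhenGrouping {
  // Hypothetical helper: one substream per run of equal keys,
  // assuming the input is sorted by key.
  def groupSortedByKey[K, T](keyOf: T => K): Flow[T, List[T], NotUsed] =
    Flow[T]
      // tag each element with "key differs from the previous element";
      // the var lives per materialization inside statefulMapConcat
      .statefulMapConcat { () =>
        var lastKey: Option[K] = None
        elem => {
          val key = keyOf(elem)
          val keyChanged = !lastKey.contains(key)
          lastKey = Some(key)
          List(keyChanged -> elem)
        }
      }
      // start a new substream at every key change; when the very first
      // element matches, the first substream simply starts with it
      .splitWhen(_._1)
      .map(_._2)
      // prepend for cheap accumulation, then restore arrival order
      .fold(List.empty[T])((acc, elem) => elem :: acc)
      .map(_.reverse)
      // only one substream is open at a time, so concatenating keeps the order
      .concatSubstreams

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("split-when-grouping")
    val source = Source(List(1 -> "1a", 1 -> "1b", 2 -> "2a", 3 -> "3a"))
    val groups = source.via(groupSortedByKey[Int, (Int, String)](_._1)).runWith(Sink.seq)
    Await.result(groups, 5.seconds).foreach(println)
    system.terminate()
  }
}
```

Like the sliding/splitAfter answer, this still materializes one substream per group. The statefulMapConcat-only approach avoids that cost.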