Scala Akka Streams: grouping only the Right side of an Either

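I have a source that emits Either[String, MyClass].

I want to call an external service with batches of MyClass and then continue downstream with Either[String, ExternalServiceResponse], which is why I need to group the elements of the stream.

If the stream emitted only MyClass elements, this would be easy: just call grouped: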

val source: Source[MyClass, NotUsed] = <custom implementation>
source
  .grouped(10)                 // Seq[MyClass]
  .map(callExternalService(_)) // ExternalServiceResponse
But in my scenario, how can I group only the elements on the Right side of the Either?

val source: Source[Either[String, MyClass], NotUsed] = <custom implementation>
source
  .???                                                      // Either[String, Seq[MyClass]]
  .map {
    case Right(myClasses) => callExternalService(myClasses)
    case Left(string) => Left(string)
  }                                                         // Either[String, ExternalServiceResponse]

The following works, but is there a more idiomatic way?

val source: Source[Either[String, MyClass], NotUsed] = <custom implementation>
source
  .groupBy(2, either => either.isRight)
  .grouped(10)
  .map(input => input.headOption match {
    case Some(Right(_)) =>
      callExternalService(input.map(item => item.right.get))
    case _ =>
      input
  })
  .mapConcat(_.to[scala.collection.immutable.Iterable])
  .mergeSubstreams

This should transform a source of Either[L, R] into a source of Either[L, Seq[R]], with a configurable group size for the Rights:

import akka.NotUsed
import akka.stream.scaladsl.Source

def groupRights[L, R](groupSize: Int)(in: Source[Either[L, R], NotUsed]): Source[Either[L, Seq[R]], NotUsed] =
  in.map(Option(_))  // Yep, an Option[Either[L, R]]
    .concat(Source.single(None)) // to emit when `in` completes
    .statefulMapConcat { () =>
      val buffer = new scala.collection.mutable.ArrayBuffer[R](groupSize)

      // Emit the buffered Rights as one batch and reset the buffer;
      // emit nothing if the buffer is empty (avoids an empty final batch)
      def dumpBuffer(): List[Either[L, Seq[R]]] =
        if (buffer.isEmpty) Nil
        else {
          val out = List(Right(buffer.toList))
          buffer.clear()
          out
        }

      incoming: Option[Either[L,R]] => {
        incoming.map { _.fold(
            l => List(Left(l)),  // unfortunate that we have to re-wrap
            r => {
              buffer += r
              if (buffer.size == groupSize) {
                dumpBuffer()
              } else {
                Nil
              }
            }
          )
        }.getOrElse(dumpBuffer()) // End of stream
      }
    }
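
To see what this buffering emits without running a stream, here is a minimal sketch of the same logic applied to a plain List. The names GroupRightsSketch and groupRightsList are hypothetical and only for illustration; they are not part of Akka or of the answer above. Note that a Left passes through immediately, so it can overtake Rights that are still waiting in the buffer.

```scala
// Hypothetical helper: the same buffering idea as groupRights, on a plain List.
object GroupRightsSketch {
  def groupRightsList[L, R](groupSize: Int)(in: List[Either[L, R]]): List[Either[L, Seq[R]]] = {
    require(groupSize > 0, "groupSize must be positive")
    // Carry (elements emitted so far, Rights currently buffered) through the input.
    val (emitted, buffer) =
      in.foldLeft((Vector.empty[Either[L, Seq[R]]], Vector.empty[R])) {
        case ((out, buf), Left(l)) =>
          // Lefts are emitted right away, ahead of any buffered Rights.
          (out :+ Left(l), buf)
        case ((out, buf), Right(r)) =>
          val filled = buf :+ r
          // A full buffer becomes one Right batch; otherwise keep buffering.
          if (filled.size == groupSize) (out :+ Right(filled), Vector.empty[R])
          else (out, filled)
      }
    // End of input: flush a partial batch, if there is one.
    val flushed = if (buffer.isEmpty) emitted else emitted :+ Right(buffer)
    flushed.toList
  }
}
```

For example, with a group size of 2, the input Right(1), Left("a"), Right(2), Right(3), Left("b") yields Left("a"), a batch of 1 and 2, Left("b"), and finally a partial batch containing 3.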
Beyond that, note that the downstream code that calls the external service can be rewritten as:

.map(_.right.map(callExternalService))
If you can reliably call the external service with a parallelism of n, it is also worth using:

// requires an implicit ExecutionContext in scope for Future { ... }
.mapAsync(n) { _.fold(
    l => Future.successful(Left(l)),
    r => Future { Right(callExternalService(r)) }
  )
}
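
The function handed to mapAsync can be checked in isolation using only scala.concurrent. The following is a sketch under assumptions: AsyncEitherSketch, handle, and the Int-summing callExternalService are hypothetical stand-ins, since the real service and its types are not shown here. Declaring the result type explicitly keeps the inferred type of the fold from widening.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AsyncEitherSketch {
  // Hypothetical stand-in for the external service: sums a batch of Ints.
  def callExternalService(batch: Seq[Int]): Int = batch.sum

  // Lefts are wrapped in an already-completed Future;
  // Rights are sent to the service on the given ExecutionContext.
  def handle(e: Either[String, Seq[Int]])(implicit ec: ExecutionContext): Future[Either[String, Int]] =
    e.fold(
      l => Future.successful(Left(l)),
      r => Future(Right(callExternalService(r)))
    )
}
```

With an implicit ExecutionContext in scope, a function of this shape can be passed directly, as in .mapAsync(n)(AsyncEitherSketch.handle).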

If you want to maximize throughput at the cost of preserving order, you can even replace mapAsync with mapAsyncUnordered.

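You can split the source of Eithers into two branches, process the Rights in their own way, and then merge the two sub-streams back together: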

// case class MyClass(x: Int)
// case class ExternalServiceResponse(xs: Seq[MyClass])
// def callExternalService(xs: Seq[MyClass]): ExternalServiceResponse =
//    ExternalServiceResponse(xs)
// val source: Source[Either[String, MyClass], _] =
//   Source(List(Right(MyClass(1)), Left("2"), Right(MyClass(3)), Left("4"), Right(MyClass(5))))

val lefts: Source[Either[String, Nothing], _] =
  source
    .collect { case Left(l) => Left(l) }

val rights: Source[Either[Nothing, ExternalServiceResponse], _] =
  source
    .collect { case Right(x: MyClass) => x }
    .grouped(2)
    .map(callExternalService)
    .map(Right(_))

val out: Source[Either[String, ExternalServiceResponse], _] = rights.merge(lefts)

// out.runForeach(println)
// Left(2)
// Right(ExternalServiceResponse(Vector(MyClass(1), MyClass(3))))
// Left(4)
// Right(ExternalServiceResponse(Vector(MyClass(5))))
