Scala collection not materialized in a simple Akka Streams operation
Data from an unbounded stream source looks like this:
value1,
value3,
...,
START,
value155,
...,
value202,
END,
...,
value234,
value235,
...
START,
value298,
...,
value310,
END,
...,
value377,
...
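Based on [...], I came up with code that uses Akka Streams to accumulate messages between a fixed "start key" and "end key" (here "start" and "end"), filtering on coll.head.equals and coll.last.equals.
Alas, nothing gets through the filter! There is no output.
Replacing coll.head.equals and coll.last.equals with .contains, as in the version below, does return a result, but of course it is incorrect, because once "end" has been seen it is always contained in the collection: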
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._

val list = List("d1", "d2", "d3", "start", "d4", "d5", "d6", "d7", "end", "d9", "d10", "start", "d11", "d12", "d13", "d14", "end", "d15")
val source = Source(list) // actual Source is unbound, has many more items between "start" and "end"; also the cycle of "start" and "end" repeats

implicit val system = ActorSystem("collection-accumulator")
implicit val materializer = ActorMaterializer()

Source(list)
  .scan(Seq.empty[String]) { (coll, s) =>
    if (s.equals("start") || coll.contains("start"))
      coll :+ s
    else
      Seq.empty[String]
  }
  .filter(_.contains("end"))
  .to(Sink.foreach(println)).run()
As expected, the output is:
List(start, d4, d5, d6, d7, end)
List(start, d4, d5, d6, d7, end, d9)
List(start, d4, d5, d6, d7, end, d9, d10)
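Any suggestions on how to solve this? I suspect that some "materialization" needs to be forced somewhere along the way, or I may simply be running into a lazy eval/actor/async issue that I am not aware of. Thanks in advance!
(At the time of writing, there is a ready-made ScalaFiddle for quickly experimenting with Akka Streams.)
Edit:
To clarify "unbound": I mean that not only is the list of messages unbounded, but the "start" and "end" cycle also repeats. I have updated the example accordingly.

One approach is to use [...]:
The above code prints the following: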
d4
d5
d6
d7
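If you would rather accumulate the elements between "start" and "end" than print them one by one as they stream through, the snippet above can be adapted to do so. Alternatively, take a look at [...] from the [...] project.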
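The accumulation mentioned above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the answer's code: the names MarkerAccumulator, State, step and groups are hypothetical, and the sketch runs on a plain List with scanLeft so that it needs no Akka dependency. Akka Streams' scan takes a function of the same (state, element) => state shape, so step should be reusable there.

```scala
object MarkerAccumulator {
  // State carried from element to element (all names here are hypothetical):
  // whether we are inside a start/end window, what has been buffered so far,
  // and a finished group that is ready to be emitted, if any.
  final case class State(
      inside: Boolean = false,
      buffer: Vector[String] = Vector.empty,
      completed: Option[Vector[String]] = None
  )

  // One step of the state machine. A "start" always opens a fresh window,
  // an "end" inside a window closes it and publishes the buffer, and
  // everything outside a window is ignored.
  def step(state: State, elem: String): State = elem match {
    case "start"               => State(inside = true)
    case "end" if state.inside => State(completed = Some(state.buffer))
    case other if state.inside => state.copy(buffer = state.buffer :+ other, completed = None)
    case _                     => State()
  }

  // Runs the state machine over a finite sequence and keeps only the
  // states that carry a completed group.
  def groups(elems: Seq[String]): Seq[Vector[String]] =
    elems.scanLeft(State())(step).flatMap(_.completed)

  def main(args: Array[String]): Unit = {
    val list = List("d1", "d2", "d3", "start", "d4", "d5", "d6", "d7", "end",
      "d9", "d10", "start", "d11", "d12", "d13", "d14", "end", "d15")
    groups(list).foreach(println)
    // prints:
    // Vector(d4, d5, d6, d7)
    // Vector(d11, d12, d13, d14)
  }
}
```

With Akka Streams, the equivalent wiring would be roughly source.scan(State())(step).collect { case State(_, _, Some(group)) => group }. Unlike the scan in the question, the buffer is reset on every "start" and "end", so each repeated cycle produces its own group.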
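
Here is one approach: first turn the source elements into sliding 2-element lists, drop the lists that precede the first one starting with "start", then take the lists that precede the first one starting with "end", and finally capture the list elements conditionally using [...]: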
To capture the elements in a collection, simply replace runForeach(println) with runWith(Sink.seq[String]).
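The sliding steps described above can be sketched on a plain Scala List, which keeps the example runnable without Akka. This is an assumption-based illustration rather than the answer's original snippet: SlidingCapture and firstWindow are hypothetical names, and collect stands in for whatever operator the answer used for the conditional capture. Akka Streams offers operators with the same names (sliding, dropWhile, takeWhile, collect), so the chain should carry over with little change.

```scala
object SlidingCapture {
  // Returns the elements between the first "start" and the following "end".
  // (Hypothetical helper; works on a finite List instead of a stream.)
  def firstWindow(elems: List[String]): List[String] =
    elems
      .sliding(2)                   // overlapping pairs: (d1,d2), (d2,d3), ...
      .dropWhile(_.head != "start") // skip pairs until one begins with "start"
      .takeWhile(_.head != "end")   // stop at the pair that begins with "end"
      .collect { case List(_, next) if next != "end" => next } // keep the second element unless it is "end"
      .toList

  def main(args: Array[String]): Unit = {
    val list = List("d1", "d2", "d3", "start", "d4", "d5", "d6", "d7", "end",
      "d9", "d10", "start", "d11", "d12", "d13", "d14", "end", "d15")
    firstWindow(list).foreach(println)
    // prints d4, d5, d6, d7 (one per line)
  }
}
```

Because takeWhile stops at the first pair that begins with "end", this captures only the first "start" ... "end" cycle; later cycles in the list are not visited.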
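
Another approach is to group elements by time and by a weight function. The trick is to assign zero weight to every element except "end", and to make the "end" element heavy enough to equal or exceed maxWeight: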
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import scala.concurrent.Await
import scala.concurrent.duration.DurationInt

implicit val system = ActorSystem("collection-accumulator")
implicit val materializer = ActorMaterializer()

// `list` is the sample list defined in the question
val source = Source(list) // actual Source is unbound, has many more items between "start" and "end"
val maxDuration = 120.seconds // put an arbitrarily high duration here

val resultFuture = source
  // accumulates everything up to and including the "end" element,
  // essentially splitting at "end" elements
  .groupedWeightedWithin(1L, maxDuration) {
    case "end" => 1L
    case _     => 0L
  }
  .map(accumulated =>
    accumulated
      .dropWhile(_ != "start") // drop everything till "start" element
      .drop(1)                 // drop "start"
      .takeWhile(_ != "end")   // take everything until "end" is seen
  )
  // Run and accumulate into seq - result will be Seq[Seq[String]]
  .runWith(Sink.seq)

println(Await.result(resultFuture, 1.second))
// reported output: Vector(Vector(d4, d5, d6, d7), Vector(d11, d12, d13))
This makes it possible to capture multiple "start" ... "end" sequences without re-materializing the stream (and it works just as well with a single sequence).

Thanks @leo-c, using sliding is a very nice approach! Your solution works, but I did not explain things properly: my message list also repeats "start" and "end" many times! I have updated the original question accordingly.