Asynchronously iterate over a data source in batches and stop when the remote returns no data, in Scala
Suppose we have a fake data source that returns the data it holds in batches:
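import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Random, Success}

// Fake remote: each call returns the next batch (100 copies of the batch number),
// and an empty list once `size` batches have been served.
class DataSource(size: Int) {
  private var s = 0
  implicit val g: ExecutionContext = ExecutionContext.global
  def getData(): Future[List[Int]] = {
    s = s + 1
    val batch = s // capture the counter before the Future runs
    Future {
      Thread.sleep(Random.nextInt(batch * 100)) // simulated latency
      if (batch <= size) {
        List.fill(100)(batch)
      } else {
        List()
      }
    }
  }
}

object Test extends App {
  val source = new DataSource(100)
  implicit val g: ExecutionContext = ExecutionContext.global
  def process(v: List[Int]): Unit = {
    println(v)
  }
  // Fetch one batch, process it, and recurse until an empty batch arrives.
  def next(f: (List[Int]) => Unit): Unit = {
    val fut = source.getData()
    fut.onComplete {
      case Success(v) =>
        f(v)
        v match {
          case h :: t => next(f) // non-empty batch: request the next one
          case Nil => ()         // empty batch: the remote has no more data
        }
      case Failure(e) => e.printStackTrace()
    }
  }
  next(process)
  // Crude way to keep the JVM alive while the callbacks run
  Thread.sleep(1000000000)
}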
Could someone also compare Akka Stream with Play Iteratee? Is Iteratee worth trying as well?
Code snippet 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
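Code snippet 2: suppose getData depends on the output of another stream, which I'd like to express with the flow below. However, it produces a "too many open files" error. I'm not sure what causes it; if I understand correctly, mapAsync(1) should limit throughput to one request at a time.
// Note: Iterator.continually(...) is infinite, and .to[...] forces it into a strict
// collection, so mapConcat never finishes building it and keeps calling getData;
// mapAsync(1) never gets to limit anything. This is a likely cause of the errors.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)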
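The laziness issue behind snippet 2 can be reproduced without Akka. Iterator.continually only evaluates its argument when an element is pulled, so taking a few elements makes exactly that many calls; converting the infinite iterator into a strict collection, as .to[collection.immutable.Iterable] does, keeps pulling forever. A stdlib-only sketch, where the counter-based fetch is a hypothetical stand-in for getData:

```scala
object LazyVsStrict {
  // Counts how many times the stand-in for getData has been called.
  var calls = 0

  // Hypothetical stand-in for ds.getData(c): records the call and returns its number.
  def fetch(): Int = { calls += 1; calls }

  // Lazy: only the three elements actually pulled trigger a call.
  def pullThree(): List[Int] = {
    calls = 0
    Iterator.continually(fetch()).take(3).toList
  }

  // A strict conversion such as Iterator.continually(fetch()).toList would never
  // return, just like the .to[...] call in snippet 2.

  def main(args: Array[String]): Unit = {
    val firstThree = pullThree()
    println(s"$firstThree after $calls calls") // List(1, 2, 3) after 3 calls
  }
}
```

In a stream, the lazy form is what Source.fromIterator(() => Iterator.continually(...)) gives you: elements are pulled only on downstream demand.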
Ideally, I'd like to wrap the futures of all the batches into one big Future that succeeds once the last batch returns a list of size 0.
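I think you're looking for a Promise.
Set up a Promise before you start the first iteration. That gives you promise.future, a Future you can use to track the completion of everything.
In your onComplete, add a case _ => promise.success(()).
Something like: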
import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}

// Assumes `source`, `process` and an implicit ExecutionContext are in scope,
// as in the question's Test object.
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case h :: t => next()
        case _ => promise.success(())
      }
    case Failure(e) => promise.failure(e)
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
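To try the Promise approach outside the question's Test object, here is a self-contained sketch. FakeSource is a hypothetical stand-in for the question's DataSource (it serves `size` batches, then empty lists, without the random sleep); the loop and the Promise handling follow the answer above, with the emptiness check written as v.nonEmpty.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object PromiseLoop {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for the question's DataSource:
  // `size` batches of three copies of the batch number, then empty lists.
  final class FakeSource(size: Int) {
    private var s = 0
    def getData(): Future[List[Int]] = {
      s += 1
      val batch = s
      Future(if (batch <= size) List.fill(3)(batch) else Nil)
    }
  }

  // The returned Future completes after the first empty batch has been
  // processed, or fails with the first fetch error.
  def loopUntilDone(source: FakeSource)(f: List[Int] => Unit): Future[Unit] = {
    val promise = Promise[Unit]()
    def next(): Unit = source.getData().onComplete {
      case Success(v) =>
        f(v)
        if (v.nonEmpty) next() else promise.success(())
      case Failure(e) => promise.failure(e)
    }
    next()
    promise.future
  }

  // Runs the loop against a fresh FakeSource and returns every batch passed to f.
  def collect(size: Int): List[List[Int]] = {
    val seen = ListBuffer.empty[List[Int]]
    val done = loopUntilDone(new FakeSource(size))(v => seen.synchronized { seen += v })
    Await.result(done, 10.seconds)
    seen.synchronized { seen.toList }
  }

  def main(args: Array[String]): Unit =
    collect(4).foreach(println)
}
```

As in the answer, f also receives the final empty batch, because f(v) runs before the emptiness check. Fetches are strictly sequential: the next getData call is only made from the previous Future's callback.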
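You're probably looking for a reactive streams library. My personal favorite (and the one I know best) is Monix. This is how it works with the data source unchanged: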
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeat(Observable.fromFuture(source.getData()))
.flatten // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
Here's one way to implement the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
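Source.fromIterator(() => Iterator.continually(ds.getData())) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
.onComplete(_ => system.terminate())(system.dispatcher) // stop the ActorSystem when the stream is done
}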
class DataSource(size: Int) {
...
}
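Simplified line-by-line overview:
Line 1: creates a stream source that keeps calling ds.getData as long as there is downstream demand.
Line 2: mapAsync is a way to handle stream elements that are Futures; here the elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 because DataSource uses a mutable variable internally, and a parallelism greater than 1 could produce unexpected results. identity is shorthand for x => x, which means that for each Future we pass its result downstream without transforming it.
Line 3: ds.getData keeps being called as long as the Future's result is a non-empty List[Int]. As soon as an empty List is encountered, processing terminates.
Line 4: runForeach takes a function List[Int] => Unit and calls it for each stream element.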
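For intuition only, the four lines map onto a blocking, stdlib-only analogue: Iterator.continually plays the role of line 1, waiting on each Future stands in for mapAsync(1)(identity), takeWhile is line 3, and foreach is line 4. Unlike the stream, this sketch blocks a thread per element and has no backpressure; Batches is a hypothetical stand-in for DataSource.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingAnalogue {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for DataSource: `size` batches, then empty lists.
  final class Batches(size: Int) {
    private var s = 0
    def getData(): Future[List[Int]] = {
      s += 1
      val batch = s
      Future(if (batch <= size) List.fill(2)(batch) else Nil)
    }
  }

  def run(size: Int): List[List[Int]] = {
    val ds = new Batches(size)
    Iterator.continually(ds.getData())      // line 1: call getData only when pulled
      .map(f => Await.result(f, 5.seconds)) // line 2: wait for one Future at a time
      .takeWhile(_.nonEmpty)                // line 3: stop at the first empty batch
      .toList                               // collect so the result can be inspected
  }

  def main(args: Array[String]): Unit =
    run(3).foreach(println)                 // line 4: act on each batch
}
```

The stream version gives the same ordering and stopping behavior without tying up a thread while each Future is pending.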
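I just found out that flatMapConcat achieves what I wanted. Since I already have the answer, there's no need to ask another question; I'm posting my sample code here in case anyone is looking for something similar.
This kind of API is very common in integrations between legacy enterprise applications. DataSource mocks the API, and the App object demonstrates how client code can use Akka Stream to consume it.
In my small project the API was exposed over SOAP, and I used … to convert it into Scala async style. With client calls like the ones in the App object, the API can then be consumed with Akka Stream. Thanks everyone for the help.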
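import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Source}

class DataSource(size: Int) {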
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += 1 // increment so every transaction gets a fresh id
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator())
cId
}
}
def createIterator(): Iterator[List[Int]] = {
// `size` batches, each holding 100 copies of its batch number
(1 to size).iterator.map(i => List.fill(100)(i))
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
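def getData(cursorId: ReadCursorId): Future[List[Int]] = {
// Advance the cursor synchronously at call time, so batches are handed out in
// request order even with several requests in flight, and an exhausted cursor
// yields an empty list instead of throwing NoSuchElementException.
val batch = synchronized {
cursorIteratorMap.get(cursorId) match {
case Some(i) if i.hasNext => i.next()
case _ => List()
}
}
Future {
Thread.sleep(Random.nextInt(100)) // simulated network latency
batch
}
}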
}
object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
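val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
// For each cursor, keep requesting batches lazily, on demand
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
// Up to 5 requests in flight; results are still emitted in request order
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
// Shut the actor system down once the stream has finished
s.onComplete(_ => system.terminate())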
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}
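For comparison, the same sequence (start a transaction, open a cursor, read batches until one comes back empty) can be written with plain Future composition and no streams. This sketch runs against a simplified, hypothetical in-memory CursorApi (Long ids, one iterator per cursor), not the DataSource above, and fetches one batch at a time, like mapAsync(1).

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureChain {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical simplified API: cursors over `size` batches of two elements each.
  final class CursorApi(size: Int) {
    private val cursors = mutable.Map.empty[Long, Iterator[List[Int]]]
    private var nextId = 0L

    def startTransaction(): Future[Long] = Future.successful(1L)

    // The transaction id is ignored in this simplified version.
    def startRead(tx: Long): Future[Long] = Future {
      synchronized {
        nextId += 1
        cursors(nextId) = Iterator.range(1, size + 1).map(i => List.fill(2)(i))
        nextId
      }
    }

    // Empty list once the cursor is exhausted (or unknown).
    def getData(cursor: Long): Future[List[Int]] = Future {
      synchronized {
        cursors.get(cursor) match {
          case Some(it) if it.hasNext => it.next()
          case _                      => Nil
        }
      }
    }
  }

  // Fetches batches one at a time until an empty one arrives.
  def readAll(api: CursorApi, cursor: Long, acc: Vector[List[Int]] = Vector.empty): Future[Vector[List[Int]]] =
    api.getData(cursor).flatMap { batch =>
      if (batch.isEmpty) Future.successful(acc)
      else readAll(api, cursor, acc :+ batch)
    }

  def run(size: Int): Vector[List[Int]] = {
    val api = new CursorApi(size)
    val all = for {
      tx      <- api.startTransaction()
      cursor  <- api.startRead(tx)
      batches <- readAll(api, cursor)
    } yield batches
    Await.result(all, 10.seconds)
  }

  def main(args: Array[String]): Unit =
    run(3).foreach(println)
}
```

The recursion goes through flatMap, so each step is scheduled on the ExecutionContext rather than growing the call stack. What it gives up compared with the stream version is the easy parallelism of mapAsync(5) and backpressure toward a downstream consumer.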