Asynchronously iterate over a data source in batches and stop when the remote returns no data in Scala

Suppose we have a fake data source that returns the data it holds in batches:

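import scala.concurrent.{ExecutionContext, Future}
import scala.util.Random

class DataSource(size: Int) {
    private var s = 0
    implicit val g: ExecutionContext = scala.concurrent.ExecutionContext.global

    // Returns batch number s (a list of 100 copies of s) while s <= size,
    // and an empty list once the data is exhausted.
    def getData(): Future[List[Int]] = {
        s = s + 1
        val batch = s // capture the counter before going async
        Future {
            Thread.sleep(Random.nextInt(batch * 100)) // simulate network latency
            if (batch <= size) {
                List.fill(100)(batch)
            } else {
                List()
            }
        }
    }
}
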
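import scala.util.{Failure, Success}

object Test extends App {
    val source = new DataSource(100)
    implicit val g = scala.concurrent.ExecutionContext.global

    def process(v: List[Int]): Unit = {
        println(v)
    }

    // Fetch a batch, process it, and fetch the next one until an empty batch arrives.
    def next(f: (List[Int]) => Unit): Unit = {
        val fut = source.getData()
        fut.onComplete {
            case Success(v) => {
                f(v)
                v match {
                    case h :: t => next(f)
                    case Nil    => () // no more data: stop
                }
            }
            case Failure(e) => e.printStackTrace()
        }
    }

    next(process)

    // Crude way to keep the JVM alive while the callbacks run;
    // nothing signals when the loop has actually finished.
    Thread.sleep(1000000000)
}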
Also, could someone compare Akka Streams with Play Iteratees? Is it worth trying Iteratees as well?


Code snippet 1:

Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
    .mapAsync(1)(identity) // line 2
    .takeWhile(_.nonEmpty) // line 3
    .runForeach(println)   // line 4
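Code snippet 2: suppose getData depends on some output of another stream, and I want to describe that with the flow below. However, it produces a "too many open files" error. I'm not sure what causes it; if I understand correctly, mapAsync limits the throughput to 1.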

Flow[Int].mapConcat[Future[List[Int]]](c => {
  Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
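A likely culprit is the `.to[collection.immutable.Iterable]` call. `Iterator.continually` is lazy, but converting it to a strict collection must pull every element, and an infinite iterator never runs out. So the function passed to `mapConcat` never returns, and `getData(c)` is called without bound before `mapAsync` sees a single element. `mapAsync(1)` cannot throttle calls that happen upstream of it. If each call opens a connection or file handle (an assumption about the real client), that would explain the error. The small standard-library sketch below shows the laziness difference; `fetch` is a hypothetical stand-in for `ds.getData(c)`. Handing Akka a lazy `Source.fromIterator(() => Iterator.continually(...))` instead (for example via `flatMapConcat`) keeps the calls driven by downstream demand.

```scala
object LazyIteratorDemo {
  // Counts how many times the (hypothetical) remote call was made.
  private var calls = 0

  // Stand-in for ds.getData(c): in real code each call would start a request.
  def fetch(): Int = {
    calls += 1
    calls
  }

  // Returns (calls after building the iterator, calls after pulling 3 elements, the elements).
  def demo(): (Int, Int, List[Int]) = {
    calls = 0
    // Building the iterator evaluates nothing: continually takes its argument by name.
    val it = Iterator.continually(fetch())
    val before = calls
    // Only elements that are actually pulled trigger a call.
    // By contrast, it.toList (or .to[Iterable]) would never return,
    // because this iterator has no end.
    val firstThree = it.take(3).toList
    (before, calls, firstThree)
  }
}
```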
Ideally, I would like to wrap the futures of all the batches into one big future that succeeds when the last batch returns a list of size 0.
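One way to get that single wrapper future, without a Promise or `Thread.sleep`, is to chain the calls recursively with `Future.flatMap`. Each step fetches a batch, processes it, and then either recurses or finishes with `Future.unit`. This is a minimal sketch, not code from the thread. `BatchSource` is a hypothetical stand-in for `DataSource` that returns `size` small batches in order and then empty lists. A new request is made only after the previous future completes, so at most one request is ever in flight, which matters for a source with mutable state. Unlike the question's loop, `f` is not called for the final empty batch.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for DataSource: `size` non-empty batches, then empty lists.
class BatchSource(size: Int)(implicit ec: ExecutionContext) {
  private var s = 0
  def getData(): Future[List[Int]] = {
    s += 1
    val batch = s // capture before going async
    Future(if (batch <= size) List.fill(3)(batch) else Nil)
  }
}

object FlatMapLoop {
  import ExecutionContext.Implicits.global

  // Succeeds after the first empty batch; fails if a fetch or f itself fails.
  def loopUntilEmpty(source: BatchSource)(f: List[Int] => Unit): Future[Unit] =
    source.getData().flatMap {
      case Nil   => Future.unit
      case batch =>
        f(batch)
        loopUntilEmpty(source)(f) // fetch the next batch only after this one
    }

  def demo(): List[List[Int]] = {
    val seen = List.newBuilder[List[Int]]
    val done = loopUntilEmpty(new BatchSource(3)) { batch => seen += batch; () }
    Await.result(done, 10.seconds)
    seen.result()
  }
}
```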

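I think you are looking for a Promise.

Before you start the first iteration, set up a Promise. This gives you promise.future, a Future you can use to track the completion of everything.

In your onComplete, add a case that calls promise.success(()) once an empty batch arrives, and promise.failure(e) if a fetch fails.

Something like: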

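import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}

// Assumes `source` and an implicit ExecutionContext are in scope, as in the question's Test object.
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()

  def next(): Unit = source.getData().onComplete {
        case Success(v) =>
            f(v)
            v match {
                case h :: t => next()
                case _ => promise.success(()) // empty batch: everything is done
            }
        case Failure(e) => promise.failure(e)
  }

  // get going
  next()

  // return the Future for everything
  promise.future
}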


// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
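Here is a self-contained variant of the same idea that can be run as-is. `StubSource` is hypothetical: it stands in for `DataSource` and returns `size` one-element batches, then empty lists. `loopUntilDone` follows the answer's structure. It also fails the promise if `f` throws, because an exception thrown inside an `onComplete` callback would otherwise leave the returned future pending forever.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for DataSource: `size` non-empty batches, then empty lists.
class StubSource(size: Int)(implicit ec: ExecutionContext) {
  private var s = 0
  def getData(): Future[List[Int]] = {
    s += 1
    val batch = s
    Future(if (batch <= size) List(batch) else Nil)
  }
}

object PromiseLoop {
  import ExecutionContext.Implicits.global

  def loopUntilDone(source: StubSource)(f: List[Int] => Unit): Future[Unit] = {
    val promise = Promise[Unit]()

    def next(): Unit = source.getData().onComplete {
      case Success(Nil) =>
        promise.success(()) // empty batch: everything is done
      case Success(v) =>
        Try(f(v)) match {
          case Success(_) => next()            // processed: fetch the next batch
          case Failure(e) => promise.failure(e) // f threw: fail the whole loop
        }
      case Failure(e) =>
        promise.failure(e) // a failed fetch fails the whole loop
    }

    next()         // get going
    promise.future // the Future for everything
  }

  def demo(): List[Int] = {
    val seen = List.newBuilder[Int]
    val everything = loopUntilDone(new StubSource(4))(v => { seen ++= v; () })
    Await.result(everything, 10.seconds)
    seen.result()
  }
}
```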

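You might be looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it would work with the data source unchanged: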

import scala.concurrent.duration.Duration
import scala.concurrent.Await

import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global

object Test extends App {
    val source = new DataSource(100)
    val completed = // <- this is Future[Unit], completes when foreach is done
        Observable.repeat(Observable.fromFuture(source.getData()))
            .flatten // <- Here it's Observable[List[Int]], it has collection-like methods
            .takeWhile(_.nonEmpty)
            .foreach(println)

    Await.result(completed, Duration.Inf)
}

Here is one way to implement the same behavior with Akka Streams, using the DataSource class:

import scala.concurrent.Future
import scala.util.Random

import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._

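object StreamsExample extends App {
  implicit val system = ActorSystem("Sandbox")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher // ExecutionContext for the onComplete callback below

  val ds = new DataSource(100)

  val done = Source.fromIterator(() => Iterator.continually(ds.getData())) // line 1
        .mapAsync(1)(identity) // line 2
        .takeWhile(_.nonEmpty) // line 3
        .runForeach(println)   // line 4

  // runForeach returns a Future[Done] that completes once takeWhile sees the
  // first empty list, i.e. a single future for the whole run.
  done.onComplete(_ => system.terminate())
}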

class DataSource(size: Int) {
  ...
}
Simplified line-by-line overview:

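  • Line 1: creates a stream source that keeps calling ds.getData as long as there is downstream demand.
  • Line 2: mapAsync is a way to handle stream elements that are futures. Here the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism. We specify 1 because DataSource uses a mutable variable internally, and a parallelism greater than 1 could produce unexpected results. identity is shorthand for x => x, which means that for each future we pass its result downstream without transforming it.
  • Line 3: ds.getData keeps being called as long as the result of the Future is a non-empty List[Int]. As soon as an empty List is encountered, processing terminates.
  • Line 4: runForeach takes a function List[Int] => Unit and calls it for every stream element.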
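For intuition only, the four stages correspond roughly to the blocking, single-threaded iterator pipeline below, written with the standard library. It is not how Akka executes the stream: Akka never blocks a thread waiting on each future, and it applies backpressure. The element flow is the same, though. Futures are produced lazily on demand and unwrapped one at a time in order. The pipeline stops at the first empty list and consumes everything before it. `fakeGetData` is a hypothetical stand-in for `ds.getData`.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingAnalogue {
  import ExecutionContext.Implicits.global

  def run(size: Int): List[List[Int]] = {
    var s = 0
    // Hypothetical stand-in for ds.getData: `size` batches, then empty lists.
    def fakeGetData(): Future[List[Int]] = {
      s += 1
      val batch = s
      Future(if (batch <= size) List.fill(2)(batch) else Nil)
    }

    val out = List.newBuilder[List[Int]]
    Iterator.continually(fakeGetData())     // like line 1: produce futures only when pulled
      .map(f => Await.result(f, 5.seconds)) // like line 2 with parallelism 1: one future at a time, in order
      .takeWhile(_.nonEmpty)                // like line 3: stop at the first empty batch
      .foreach(b => out += b)               // like line 4: consume each batch
    out.result()
  }
}
```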

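I just found that I can achieve what I wanted with flatMapConcat. There is no need to ask another question since I already have the answer, so I'm posting my sample code here in case anyone is looking for something similar.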

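This kind of API is quite common in integrations between traditional enterprise applications. DataSource simulates the API, and the Test app demonstrates how client code can use Akka Streams to consume it.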

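In my small project the API is provided over SOAP, and I converted the SOAP interface into an asynchronous Scala API. With client calls like the ones in the Test app, we can consume that API using Akka Streams. Thanks, everyone, for your help.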
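The stream in the code below follows a three-step protocol. It starts a transaction, opens a read cursor inside that transaction, and then pulls batches from the cursor until one comes back empty. Without Akka, the same sequencing can be written as a `for` comprehension over futures. This is a minimal sketch under stated assumptions. `CursorApi`, `TxId` and `CursorId` are hypothetical stand-ins modelled on the DataSource below, not a real library. The batches are also collected into a list rather than streamed, which is only reasonable when the whole result fits in memory. In the Akka version, `Source.fromFuture` and `mapAsync(1)` cover the first two steps, and `flatMapConcat` turns the cursor into a stream of batches.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical API modelled on the DataSource below (not a real library).
final case class TxId(id: Long)
final case class CursorId(id: Long)

class CursorApi(batches: Int)(implicit ec: ExecutionContext) {
  private var cursors = Map.empty[CursorId, Iterator[List[Int]]]

  def startTransaction(): Future[TxId] = Future.successful(TxId(1))

  // The transaction id is ignored in this stub; each call opens a fresh cursor.
  def startRead(t: TxId): Future[CursorId] = Future {
    synchronized {
      val c = CursorId(cursors.size + 1L)
      cursors += c -> Iterator.tabulate(batches)(i => List(i + 1))
      c
    }
  }

  // Next batch for the cursor, or an empty list once it is exhausted.
  def getData(c: CursorId): Future[List[Int]] = Future {
    synchronized {
      cursors.get(c) match {
        case Some(it) if it.hasNext => it.next()
        case _                      => Nil
      }
    }
  }
}

object CursorProtocol {
  import ExecutionContext.Implicits.global

  // Reads batches one at a time until the first empty one.
  def readAll(api: CursorApi, c: CursorId, acc: Vector[List[Int]] = Vector.empty): Future[Vector[List[Int]]] =
    api.getData(c).flatMap {
      case Nil   => Future.successful(acc)
      case batch => readAll(api, c, acc :+ batch)
    }

  def demo(): Vector[List[Int]] = {
    val api = new CursorApi(3)
    val all = for {
      t       <- api.startTransaction() // step 1: transaction
      c       <- api.startRead(t)       // step 2: read cursor
      batches <- readAll(api, c)        // step 3: drain the cursor
    } yield batches
    Await.result(all, 10.seconds)
  }
}
```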

import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Random

class DataSource(size: Int) {
    private var transactionId: Long = 0
    private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
    // Note: cursor ids are only unique within a transaction, but this map is shared by all transactions.
    private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
    implicit val g: ExecutionContext = scala.concurrent.ExecutionContext.global

    case class TransactionId(id: Long)

    case class ReadCursorId(id: Long)

    def startTransaction(): Future[TransactionId] = {
        Future {
            synchronized {
                transactionId += 1 // `transactionId += transactionId` would stay 0 forever
                val t = TransactionId(transactionId)
                transactionCursorMap.update(t, Set(ReadCursorId(0)))
                t
            }
        }
    }

    def createCursorId(t: TransactionId): ReadCursorId = {
        synchronized {
            val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
            val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
            val cId = ReadCursorId(currentId + 1)
            transactionCursorMap.update(t, c + cId)
            cursorIteratorMap.put(cId, createIterator())
            cId
        }
    }

    // `size` batches of 100 identical numbers: List(1, 1, ...), List(2, 2, ...), ...
    def createIterator(): Iterator[List[Int]] = {
        (for {i <- 1 to size} yield List.fill(100)(i)).iterator
    }

    def startRead(t: TransactionId): Future[ReadCursorId] = {
        Future {
            createCursorId(t)
        }
    }

    // Next batch for the cursor, or an empty list once it is exhausted (or unknown);
    // the empty list is what ends the stream's takeWhile.
    def getData(cursorId: ReadCursorId): Future[List[Int]] = {
        Future {
            Thread.sleep(Random.nextInt(100)) // simulate SOAP latency
            // Lock around the iterator access itself: locking only while creating
            // the Future does not protect the code that later runs inside it.
            synchronized {
                cursorIteratorMap.get(cursorId) match {
                    case Some(i) if i.hasNext => i.next()
                    case _ => List()
                }
            }
        }
    }
}


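import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Source}

object Test extends App {
    val source = new DataSource(10)
    implicit val system = ActorSystem("Sandbox")
    implicit val materializer = ActorMaterializer()
    implicit val g = scala.concurrent.ExecutionContext.global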
    //
    //  def process(v: List[Int]): Unit = {
    //    println(v)
    //  }
    //
    //  def next(f: (List[Int]) => Unit): Unit = {
    //    val fut = source.getData()
    //    fut.onComplete {
    //      case Success(v) => {
    //        f(v)
    //        v match {
    //
    //          case h :: t => next(f)
    //
    //        }
    //      }
    //
    //    }
    //
    //  }
    //
    //  next(process)
    //
    //  Thread.sleep(1000000000)

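    // Start a transaction, open a read cursor in it, then let flatMapConcat
    // turn the cursor into a lazy stream of getData futures.
    val done = Source.fromFuture(source.startTransaction())
      .map { e =>
          source.startRead(e)
      }
      .mapAsync(1)(identity)
      .flatMapConcat(
          e => {
              Source.fromIterator(() => Iterator.continually(source.getData(e)))
          })
      // Up to 5 getData calls in flight. mapAsync keeps the order of the futures,
      // but concurrent calls race for the shared cursor, so batches may not
      // come out in iterator order.
      .mapAsync(5)(identity)
      .via(Flow[List[Int]].takeWhile(_.nonEmpty))
      .runForeach(println)

    // Shut down the actor system once the first empty batch ends the stream.
    done.onComplete(_ => system.terminate())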


    /*
      val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
        .via(Flow[List[Int]].takeWhile(_.nonEmpty))
        .runFold(List[List[Int]]()) { (acc, r) =>
          //      println("=======" + acc + r)
          r :: acc
        }

      done.onSuccess {

        case e => {
          e.foreach(println)
        }

      }
      done.onComplete(_ => system.terminate())
    */
}