Sequencing Scala Futures with bounded parallelism (without messing around with ExecutionContexts)

scala rx-java

Background: I have a function:

  def doWork(symbol: String): Future[Unit]

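It performs some side effects to fetch data and store it, and completes the Future when it is done. However, the back-end infrastructure has usage limits, so no more than 5 of these requests may be in flight in parallel. I have a list of N symbols that I need to get through: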

  var symbols = Array("MSFT",...)

But I want to sequence them so that no more than 5 are executing at the same time. Given:

  val allowableParallelism = 5

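My current solution is (assuming I'm working with async/await):

However, for obvious reasons, I'm not terribly happy with it. I feel this should be possible with a fold, but every time I try, I end up creating the Futures eagerly. I also tried a version using RxScala Observables with concatMap, but that seemed like overkill too.

Is there a better way to accomplish this?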

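I have an example of how to do this with scalaz-stream. It is quite a lot of code, because the Scala Future has to be converted to a scalaz Task (an abstraction for a deferred computation). However, that conversion only needs to be added to a project once. Another option is to define doWork in terms of Task to begin with. Personally, I prefer Task for building asynchronous programs.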

  import scala.concurrent.{Future => SFuture}
  import scala.util.Random
  import scala.concurrent.ExecutionContext.Implicits.global


  import scalaz.stream._
  import scalaz.concurrent._

  val P = scalaz.stream.Process

  val rnd = new Random()

  def doWork(symbol: String): SFuture[Unit] = SFuture {
    Thread.sleep(rnd.nextInt(1000))
    println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
  }

  val symbols = Seq("AAPL", "MSFT", "GOOGL", "CVX").
    flatMap(s => Seq.fill(5)(s).zipWithIndex.map(t => s"${t._1}${t._2}"))

  implicit class Transformer[+T](fut: => SFuture[T]) {
    def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
      import scala.util.{Failure, Success}
      import scalaz.syntax.either._
      Task.async {
        register =>
          fut.onComplete {
            case Success(v) => register(v.right)
            case Failure(ex) => register(ex.left)
          }
      }
    }
  }

  implicit class ConcurrentProcess[O](val process: Process[Task, O]) {
    def concurrently[O2](concurrencyLevel: Int)(f: Channel[Task, O, O2]): Process[Task, O2] = {
      val actions =
        process.
          zipWith(f)((data, f) => f(data))

      val nestedActions =
        actions.map(P.eval)

      merge.mergeN(concurrencyLevel)(nestedActions)
    }
  }

  val workChannel = io.channel((s: String) => doWork(s).toTask)

  val process = Process.emitAll(symbols).concurrently(5)(workChannel)

  process.run.run

Once all of these conversions are in scope, basically all you need is:

  val workChannel = io.channel((s: String) => doWork(s).toTask)

  val process = Process.emitAll(symbols).concurrently(5)(workChannel)

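Although you've already got an excellent answer, I thought I might still offer an opinion or two on these matters.

I remember seeing somewhere (on someone's blog) "use actors for state and use futures for concurrency".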

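So my first thought would be to make use of actors somehow. To be precise, I would have a master actor with a router launching multiple worker actors, with the number of workers limited according to allowableParallelism. So, assuming I have

def doWorkInternal(symbol: String): Unit

which does the work of your doWork taken "outside of the Future", I would have something along these lines (very rudimentary, not taking many details into consideration, and practically copying code from the Akka documentation):

doWork is now exactly like yours, returning Future[Unit]. The idea is that instead of using

val futures = symbols.map(doWork(_)).toSeq
val future = Future.sequence(futures)

which would start the futures without any regard for allowableParallelism, I would use

val futures = symbols.map(Guardian.doWorkGuarded(_)).toSeq
val future = Future.sequence(futures)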

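Think of some hypothetical database access driver with a non-blocking interface, i.e. one that returns futures on requests, and that is limited in concurrency because, for example, it is built on top of a connection pool. You wouldn't want it to return futures that ignore the parallelism level and then require you to juggle them to keep parallelism under control.

This example is more illustrative than practical, since I wouldn't normally expect an "outbound" interface to use futures like this (which is quite OK for an "inbound" interface).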
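
To make the doWorkGuarded idea concrete, here is a minimal sketch that swaps the Akka router for a plain queue and a counter. The Guardian class, its constructor parameters and GuardianDemo are hypothetical names introduced for illustration; this is not the actor code the answer refers to, only one way a guarded entry point could hand back futures immediately while keeping at most a fixed number of calls in flight.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical guard: returns a Future for every request right away, but
// starts at most `limit` underlying calls of `work` at any one time.
final class Guardian[A, B](limit: Int)(work: A => Future[B])(implicit ec: ExecutionContext) {
  require(limit > 0, "limit must be positive")

  private val pending = mutable.Queue.empty[(A, Promise[B])] // guarded by `this`
  private var running = 0                                     // guarded by `this`

  def doWorkGuarded(input: A): Future[B] = {
    val promise = Promise[B]()
    synchronized { pending.enqueue((input, promise)) }
    drain()
    promise.future
  }

  // Starts queued requests for as long as there is spare capacity.
  private def drain(): Unit = {
    val next = synchronized {
      if (running < limit && pending.nonEmpty) { running += 1; Some(pending.dequeue()) }
      else None
    }
    next.foreach { case (input, promise) =>
      // Turn a synchronous throw from `work` into a failed Future.
      val started = try work(input) catch { case NonFatal(e) => Future.failed(e) }
      started.onComplete { result =>
        synchronized { running -= 1 } // free the slot first
        promise.complete(result)      // then publish the result to the caller
        drain()                       // and let the next queued request in
      }
      drain() // keep filling any remaining free slots
    }
  }
}

object GuardianDemo {
  import ExecutionContext.Implicits.global

  // Runs every symbol through a Guardian and returns the highest number of
  // doWork bodies that were observed running at the same time.
  def maxInFlight(symbols: Seq[String], limit: Int): Int = {
    val inFlight = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    def doWork(symbol: String): Future[Unit] = Future {
      val now = inFlight.incrementAndGet()
      peak.accumulateAndGet(now, (a, b) => math.max(a, b))
      Thread.sleep(20) // stand-in for the real fetch-and-store
      inFlight.decrementAndGet()
      ()
    }
    val guardian = new Guardian[String, Unit](limit)(doWork)
    Await.result(Future.sequence(symbols.map(guardian.doWorkGuarded)), 30.seconds)
    peak.get()
  }
}
```

Calling GuardianDemo.maxInFlight(symbols, 5) should never report more than 5, because a slot is only released in the completion callback of the previous call.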

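First, some purely functional wrapper around Scala's Future is obviously needed, because a Future is side-effecting and starts running as soon as it can. Let's call it Deferred: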

import scala.concurrent.Future
import scala.util.control.Exception.nonFatalCatch

class Deferred[+T](f: () => Future[T]) {
  def run(): Future[T] = f()
}

object Deferred {
  def apply[T](future: => Future[T]): Deferred[T] =
    new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}
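
A small sketch of why the wrapper matters: building a Deferred does not start anything, and the work only begins when run() is called. The Deferred below is a simplified copy without the exception handling so that the snippet stands alone; DeferredDemo, startCounts and the counting doWork are made up for the illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Simplified copy of the wrapper: it only stores a thunk that creates the Future.
final class Deferred[+T](f: () => Future[T]) {
  def run(): Future[T] = f()
}

object Deferred {
  // The by-name parameter keeps `future` from being evaluated here.
  def apply[T](future: => Future[T]): Deferred[T] = new Deferred(() => future)
}

object DeferredDemo {
  import scala.concurrent.ExecutionContext.Implicits.global

  // Returns (calls started after building the Deferreds, calls started after run()).
  def startCounts(): (Int, Int) = {
    val started = new AtomicInteger(0)
    def doWork(symbol: String): Future[Unit] = Future { started.incrementAndGet(); () }

    val deferred = Seq("MSFT", "AAPL").map(s => Deferred(doWork(s)))
    val before = started.get() // doWork has not been called yet

    Await.result(Future.sequence(deferred.map(_.run())), 10.seconds)
    (before, started.get())
  }
}
```

Here DeferredDemo.startCounts() returns (0, 2): nothing is started while the Deferred values are built, and both calls happen once run() is invoked.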

And here is the routine:

import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicInteger

import scala.collection.immutable.Seq
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.Exception.nonFatalCatch
import scala.util.{Failure, Success}

trait ConcurrencyUtils {    
  def runWithBoundedParallelism[T](parallelism: Int = Runtime.getRuntime.availableProcessors())
                                  (operations: Seq[Deferred[T]])
                                  (implicit ec: ExecutionContext): Deferred[Seq[T]] =
    if (parallelism > 0) Deferred {
      val indexedOps = operations.toIndexedSeq // index for faster access

      val promise = Promise[Seq[T]]()

      val acc = new CopyOnWriteArrayList[(Int, T)] // concurrent acc
      val nextIndex = new AtomicInteger(parallelism) // keep track of the next index atomically

      def run(operation: Deferred[T], index: Int): Unit = {
        operation.run().onComplete {
          case Success(value) =>
            acc.add((index, value)) // accumulate result value

            if (acc.size == indexedOps.size) { // we're done
              import scala.collection.JavaConverters._ // explicit .asScala instead of the deprecated JavaConversions
              // in a concurrent setting the next line may be called multiple times, hence trySuccess instead of success
              promise.trySuccess(acc.asScala.view.sortBy(_._1).map(_._2).toList)
            } else {
              val next = nextIndex.getAndIncrement() // get and inc atomically
              if (next < indexedOps.size) { // run next operation if exists
                run(indexedOps(next), next)
              }
            }
          case Failure(t) =>
            promise.tryFailure(t) // same here (may be called multiple times, let's prevent stdout pollution)
        }
      }

      if (operations.nonEmpty) {
        indexedOps.view.take(parallelism).zipWithIndex.foreach((run _).tupled) // run as much as allowed
        promise.future
      } else {
        Future.successful(Seq.empty)
      }
    } else {
      throw new IllegalArgumentException("Parallelism must be positive")
    }
}

Use Monix Task. An example with parallelism = 10:

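import monix.eval.Task
import monix.execution.Scheduler.Implicits.global // Scheduler needed to evaluate the Task

val items = 0 until 1000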
// The list of all tasks needed for execution
val tasks = items.map(i => Task(i * 2))
// Building batches of 10 tasks to execute in parallel:
val batches = tasks.sliding(10,10).map(b => Task.gather(b))
// Sequencing batches, then flattening the final result
val aggregate = Task.sequence(batches).map(_.flatten.toList)

// Evaluation:
aggregate.foreach(println)
//=> List(0, 2, 4, 6, 8, 10, 12, 14, 16,...
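
For comparison, the same batching idea can be sketched with plain scala.concurrent.Future and no Monix. BatchedFutures, inBatches and demo are hypothetical names, and this is only an illustration of the technique: each batch is created lazily inside flatMap, so at most batchSize futures exist at once, but a batch does not start until the slowest future of the previous batch has finished.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BatchedFutures {
  // Runs `f` over `items` in consecutive batches of `batchSize`. A batch is
  // started only after every future of the previous batch has completed.
  def inBatches[A, B](items: Seq[A], batchSize: Int)(f: A => Future[B])
                     (implicit ec: ExecutionContext): Future[List[B]] =
    items.grouped(batchSize).foldLeft(Future.successful(List.empty[B])) { (acc, batch) =>
      acc.flatMap { done =>
        // `f` is only called here, so the futures of this batch are not
        // created until the accumulated result of earlier batches is ready.
        Future.sequence(batch.map(f)).map(results => done ++ results)
      }
    }

  // Mirrors the example above: double the numbers 0 until 1000, 10 at a time.
  def demo(): List[Int] = {
    import ExecutionContext.Implicits.global
    Await.result(inBatches(0 until 1000, 10)(i => Future(i * 2)), 30.seconds)
  }
}
```

The results come back in input order because each batch is appended to the results of the batches before it.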

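I should add that it would be better if a new one were started each time a future completes, rather than waiting for the whole batch/group to finish.

Is your IO blocking and wrapped in a Future {}, or is the IO asynchronous, not using a thread while it waits on the remote server? If it is blocking, a fixed thread pool of 5 threads seems like the simplest solution to me. But use that pool only for the blocking IO, of course, and for nothing else.

The IO that backs doWork() is non-blocking and runs on threads outside my control, which I've wrapped into Observables at various levels of abstraction.

Thanks, I've been carefully avoiding scalaz since I'm still fairly new to plain Scala, but this looks good...

Thanks. After sleeping on it, I think what I really need to do is abstract what I'm trying to do into a combinator and then come up with the right name for it. It is really a lot like Observable.concatMap, so maybe I should sit down and think about how to represent the types I need. What I'm trying to do is basically "given a lazy stream of functions that generate futures, chain the completion of one to the creation of the next, and return a future when they have all completed". Given something like that, I could generalize further to the concurrent case...

If the operations Seq is empty, this never finishes; in that case runWithBoundedParallelism should return Future.successful(Seq.empty).

@Somatik feel free to improve the answer.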
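
The combinator described in the comments above (start the next call only when an earlier one completes, with up to N such chains running side by side) can be sketched with plain Futures. This is a minimal illustration under assumptions, not code from any of the answers: BoundedTraverse, traverseBounded, the "lane" helper and the demo object are all hypothetical names.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedTraverse {
  // Starts `parallelism` lanes. Each lane takes the next item off a shared
  // queue, calls `f`, and only takes another item once that future completes.
  def traverseBounded[A](items: Seq[A], parallelism: Int)(f: A => Future[Unit])
                        (implicit ec: ExecutionContext): Future[Unit] = {
    require(parallelism > 0, "parallelism must be positive")
    val queue = new ConcurrentLinkedQueue[A]()
    items.foreach(item => queue.add(item))

    def lane(): Future[Unit] =
      Option(queue.poll()) match {
        case Some(item) => f(item).flatMap(_ => lane()) // completion triggers the next start
        case None       => Future.successful(())        // queue is empty, this lane is done
      }

    // The overall future completes when every lane has drained the queue.
    Future.sequence(List.fill(parallelism)(lane())).map(_ => ())
  }
}

object BoundedTraverseDemo {
  import ExecutionContext.Implicits.global

  // Returns (number of symbols processed, peak number of calls in flight).
  def run(symbols: Seq[String], parallelism: Int): (Int, Int) = {
    val inFlight = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    val processed = new AtomicInteger(0)
    def doWork(symbol: String): Future[Unit] = Future {
      val now = inFlight.incrementAndGet()
      peak.accumulateAndGet(now, (a, b) => math.max(a, b))
      Thread.sleep(10) // stand-in for the real side effects
      processed.incrementAndGet()
      inFlight.decrementAndGet()
      ()
    }
    Await.result(BoundedTraverse.traverseBounded(symbols, parallelism)(doWork), 30.seconds)
    (processed.get(), peak.get())
  }
}
```

Unlike batching, a lane picks up new work as soon as its own call finishes, so one slow symbol does not hold back the other lanes. In this sketch a failed call stops only its own lane, and the overall future fails.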

And here is the ScalaTest spec for runWithBoundedParallelism:

import org.scalatest.{Matchers, FlatSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Seconds, Span}

import scala.collection.immutable.Seq
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

class ConcurrencyUtilsSpec extends FlatSpec with Matchers with ScalaFutures with ConcurrencyUtils {

  "runWithBoundedParallelism" should "return results in correct order" in {
    val comp1 = mkDeferredComputation(1)
    val comp2 = mkDeferredComputation(2)
    val comp3 = mkDeferredComputation(3)
    val comp4 = mkDeferredComputation(4)
    val comp5 = mkDeferredComputation(5)

    val compoundComp = runWithBoundedParallelism(2)(Seq(comp1, comp2, comp3, comp4, comp5))

    whenReady(compoundComp.run()) { result =>
      result should be (Seq(1, 2, 3, 4, 5))
    }
  }

  // increase default ScalaTest patience
  implicit val defaultPatience = PatienceConfig(timeout = Span(10, Seconds))

  private def mkDeferredComputation[T](result: T, sleepDuration: FiniteDuration = 100.millis): Deferred[T] =
    Deferred {
      Future {
        Thread.sleep(sleepDuration.toMillis)
        result
      }
    }

}