Sequencing Scala Futures with bounded parallelism (without messing around with ExecutionContexts) [scala, future, rx-java]


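Background: I have a function:

  def doWork(symbol: String): Future[Unit]
It performs some side effects to fetch and store data, and completes the Future when it is done. However, the backend infrastructure has usage limits, so no more than 5 of these requests may be in flight in parallel. I have a list of N symbols that I need to get through:

  var symbols = Array("MSFT",...)
But I want to sequence them so that no more than 5 are executing at the same time. Given:

  val allowableParallelism = 5
My current solution is (assuming I am working with async/await):

However, for obvious reasons, I am not terribly happy with it. I feel this should be possible with folds, but every time I try, I end up creating the Futures eagerly. I also tried a version that uses concatMap on RxScala Observables, but that also seemed like overkill.


Is there a better way to accomplish this?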
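
One way to keep a fold from creating the Futures eagerly is to call doWork only inside flatMap, so each batch is created after the previous one has completed. The following is a minimal sketch under that assumption; the names BatchedFold and runInBatches are hypothetical and do not come from the question. Note that it works batch by batch, so a slow item delays the start of the next batch.

```scala
import scala.concurrent.{ExecutionContext, Future}

object BatchedFold {
  // Hypothetical helper: runs `doWork` over `symbols` in batches of `parallelism`.
  // `doWork` is only invoked inside flatMap, i.e. after the previous batch finished,
  // so at most `parallelism` Futures created here are in flight at once.
  def runInBatches(symbols: Seq[String], parallelism: Int)
                  (doWork: String => Future[Unit])
                  (implicit ec: ExecutionContext): Future[Unit] =
    symbols.grouped(parallelism).foldLeft(Future.unit) { (previous, batch) =>
      previous.flatMap { _ =>
        // Start the whole batch now and wait for all of it to complete.
        Future.traverse(batch)(doWork).map(_ => ())
      }
    }
}
```

With the question's names this would be called as BatchedFold.runInBatches(symbols.toSeq, allowableParallelism)(doWork).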

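I have an example of how to do this with scalaz-stream. It is quite a lot of code, because a scala Future has to be converted into a scalaz Task (an abstraction for deferred computation). However, that code only needs to be added to a project once. Another option is to define doWork in terms of Task. I personally prefer Task for building asynchronous programs.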

  import scala.concurrent.{Future => SFuture}
  import scala.util.Random
  import scala.concurrent.ExecutionContext.Implicits.global


  import scalaz.stream._
  import scalaz.concurrent._

  val P = scalaz.stream.Process

  val rnd = new Random()

  def doWork(symbol: String): SFuture[Unit] = SFuture {
    Thread.sleep(rnd.nextInt(1000))
    println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
  }

  val symbols = Seq("AAPL", "MSFT", "GOOGL", "CVX").
    flatMap(s => Seq.fill(5)(s).zipWithIndex.map(t => s"${t._1}${t._2}"))

  implicit class Transformer[+T](fut: => SFuture[T]) {
    def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
      import scala.util.{Failure, Success}
      import scalaz.syntax.either._
      Task.async {
        register =>
          fut.onComplete {
            case Success(v) => register(v.right)
            case Failure(ex) => register(ex.left)
          }
      }
    }
  }

  implicit class ConcurrentProcess[O](val process: Process[Task, O]) {
    def concurrently[O2](concurrencyLevel: Int)(f: Channel[Task, O, O2]): Process[Task, O2] = {
      val actions =
        process.
          zipWith(f)((data, f) => f(data))

      val nestedActions =
        actions.map(P.eval)

      merge.mergeN(concurrencyLevel)(nestedActions)
    }
  }

  val workChannel = io.channel((s: String) => doWork(s).toTask)

  val process = Process.emitAll(symbols).concurrently(5)(workChannel)

  process.run.run
Once you have all of these conversions in scope, all you basically need is:

  val workChannel = io.channel((s: String) => doWork(s).toTask)

  val process = Process.emitAll(symbols).concurrently(5)(workChannel)

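Although you already have a good answer, I thought I could still offer an opinion or two on the matter.

I remember seeing somewhere (on someone's blog) "use actors for state and futures for concurrency".

So my first thought would be to make use of actors somehow. To be precise, I would have a master actor with a router launching several worker actors, with the number of workers capped according to allowableParallelism. So, assuming I have

  def doWorkInternal (symbol: String): Unit
which does the work of your doWork taken "outside of the future", I would have something along these lines (very rudimentary, not taking many details into account, and practically copying code from the akka documentation):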

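doWork is now exactly like yours, returning Future[Unit], and the idea is that instead of using

  val futures = symbols.map (doWork (_)).toSeq
  val future = Future.sequence(futures)
which would start the futures with no regard for allowableParallelism at all, you would use

  val futures = symbols.map (Guardian.doWorkGuarded (_)).toSeq
  val future = Future.sequence(futures)
Consider some hypothetical database access driver with a non-blocking interface, i.e. one that returns futures on requests, and that is limited in concurrency, for example because it is built on top of a connection pool. You would not want it to return futures that ignore the parallelism level and then require you to juggle them yourself to keep parallelism under control.


This example is more illustrative than practical, since I would not normally expect an "outgoing" interface to use futures like this (which is perfectly fine for an "incoming" interface).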
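
To illustrate the idea of a guarded entry point that enforces the limit itself, here is a minimal sketch that does not use actors. It is not the actor-based design described above: the Guardian class, its guarded method and the internal queue are assumptions made for illustration only.

```scala
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.NonFatal

// Hypothetical guard: at most `maxParallelism` guarded operations run at once;
// the rest wait in a queue and are started as running ones complete.
final class Guardian(maxParallelism: Int)(implicit ec: ExecutionContext) {
  private val lock = new Object
  private var running = 0
  private val waiting = mutable.Queue.empty[() => Unit]

  def guarded[T](work: => Future[T]): Future[T] = {
    val promise = Promise[T]()
    def start(): Unit = {
      // `work` is by-name, so the underlying Future is only created here.
      val f: Future[T] = try work catch { case NonFatal(e) => Future.failed(e) }
      f.onComplete { result =>
        promise.complete(result)
        release()
      }
    }
    val startNow = lock.synchronized {
      if (running < maxParallelism) { running += 1; true }
      else { waiting.enqueue(() => start()); false }
    }
    if (startNow) start()
    promise.future
  }

  // Hand the freed slot to the next waiting operation, or give it back.
  private def release(): Unit = {
    val next = lock.synchronized {
      if (waiting.nonEmpty) Some(waiting.dequeue())
      else { running -= 1; None }
    }
    next.foreach(_.apply())
  }
}
```

A caller would then write something like symbols.map(s => guardian.guarded(doWork(s))) and pass the result to Future.sequence, where guardian is an instance created with the desired limit.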

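First, some purely functional wrapper around Scala's Future is obviously needed, because Future is side-effecting and starts running as soon as it can. Let's call it Deferred: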

import scala.concurrent.Future
import scala.util.control.Exception.nonFatalCatch

class Deferred[+T](f: () => Future[T]) {
  def run(): Future[T] = f()
}

object Deferred {
  def apply[T](future: => Future[T]): Deferred[T] =
    new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}
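
A small sketch of why such a wrapper is needed: calling a Future-returning function starts the work immediately, while wrapping the call in a thunk postpones it until the thunk is invoked. The names below (DeferredDemo, work, started) are hypothetical and only serve the illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future

object DeferredDemo {
  // Returns the number of started operations at three points in time.
  def demo(): (Int, Int, Int) = {
    val started = new AtomicInteger(0)

    // Stand-in for doWork: counts how many times work has been kicked off.
    def work(): Future[Unit] = {
      started.incrementAndGet()
      Future.successful(())
    }

    val eager = work()                                // starts right away
    val afterEager = started.get

    val deferred: () => Future[Unit] = () => work()   // nothing started yet
    val afterWrap = started.get

    deferred()                                        // starts only now
    val afterRun = started.get

    (afterEager, afterWrap, afterRun)
  }
}
```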
And here is the routine:

import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicInteger

import scala.collection.immutable.Seq
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.Exception.nonFatalCatch
import scala.util.{Failure, Success}

trait ConcurrencyUtils {    
  def runWithBoundedParallelism[T](parallelism: Int = Runtime.getRuntime.availableProcessors())
                                  (operations: Seq[Deferred[T]])
                                  (implicit ec: ExecutionContext): Deferred[Seq[T]] =
    if (parallelism > 0) Deferred {
      val indexedOps = operations.toIndexedSeq // index for faster access

      val promise = Promise[Seq[T]]()

      val acc = new CopyOnWriteArrayList[(Int, T)] // concurrent acc
      val nextIndex = new AtomicInteger(parallelism) // keep track of the next index atomically

      def run(operation: Deferred[T], index: Int): Unit = {
        operation.run().onComplete {
          case Success(value) =>
            acc.add((index, value)) // accumulate result value

            if (acc.size == indexedOps.size) { // we've done
              import scala.collection.JavaConverters._ // explicit asScala instead of the deprecated implicit JavaConversions
              // in concurrent setting next line may be called multiple times, that's why trySuccess instead of success
              promise.trySuccess(acc.asScala.toList.sortBy(_._1).map(_._2))
            } else {
              val next = nextIndex.getAndIncrement() // get and inc atomically
              if (next < indexedOps.size) { // run next operation if exists
                run(indexedOps(next), next)
              }
            }
          case Failure(t) =>
            promise.tryFailure(t) // same here (may be called multiple times, let's prevent stdout pollution)
        }
      }

      if (operations.nonEmpty) {
        indexedOps.view.take(parallelism).zipWithIndex.foreach((run _).tupled) // run as much as allowed
        promise.future
      } else {
        Future.successful(Seq.empty)
      }
    } else {
      throw new IllegalArgumentException("Parallelism must be positive")
    }
}

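Use Monix Task. An example with parallelism = 10:

import monix.eval.Task
import monix.execution.Scheduler.Implicits.global // Scheduler required by foreach below

val items = 0 until 1000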
// The list of all tasks needed for execution
val tasks = items.map(i => Task(i * 2))
// Building batches of 10 tasks to execute in parallel:
val batches = tasks.sliding(10,10).map(b => Task.gather(b))
// Sequencing batches, then flattening the final result
val aggregate = Task.sequence(batches).map(_.flatten.toList)

// Evaluation:
aggregate.foreach(println)
//=> List(0, 2, 4, 6, 8, 10, 12, 14, 16,...

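I should add that it would be better if a new future were started each time one completes, rather than waiting for the whole batch/group to finish.

Is your IO blocking and wrapped in a Future {}, or is the IO asynchronous, not using a thread while waiting on the remote server? If it is blocking, then a fixed thread pool with 5 threads seems like the simplest solution to me. But use that pool only for the blocking IO and nothing else, of course.

The IO backing doWork() is non-blocking and runs on threads I don't control; I have wrapped it in Observables at various levels of abstraction.

Thanks, I have been cautiously avoiding scalaz since I am still quite new to plain Scala, but this looks good...

Thanks. After sleeping on it, I think what I really need to do is abstract what I am doing into a combinator and then come up with the right name for it. It is actually very similar to Observable.concatMap, so maybe I should sit down and think about how to express the types I need. What I am trying to do is basically "given a lazy stream of functions that produce futures, chain the completion of one to the creation of the next, and return a future when they are all done". Given something like that, I can generalise further to the concurrent case...

If the operations Seq is empty, this never completes; in that case runWithBoundedParallelism should return Future.successful(Seq.empty).

@Somatik feel free to improve the answer.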
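
The fixed-thread-pool suggestion from the comments applies only when the IO is blocking. A minimal sketch of it, assuming a hypothetical blocking call passed in as blockingCall (the names BlockingIoPool, pool, blockingEc and doWorkBlocking are illustrative, not from the thread):

```scala
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.{ExecutionContext, Future}

object BlockingIoPool {
  // Dedicated pool: with 5 threads, at most 5 blocking calls run at once.
  // It should be used for the blocking IO only, and shut down when no longer needed.
  val pool: ExecutorService = Executors.newFixedThreadPool(5)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Runs the (assumed) blocking call on the dedicated pool, not on the global one.
  def doWorkBlocking(symbol: String)(blockingCall: String => Unit): Future[Unit] =
    Future(blockingCall(symbol))(blockingEc)
}
```

Here the pool size is what bounds the parallelism, so this does not help when the underlying IO is already non-blocking, as in the question.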

A ScalaTest spec for runWithBoundedParallelism:

import org.scalatest.{Matchers, FlatSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Seconds, Span}

import scala.collection.immutable.Seq
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

class ConcurrencyUtilsSpec extends FlatSpec with Matchers with ScalaFutures with ConcurrencyUtils {

  "runWithBoundedParallelism" should "return results in correct order" in {
    val comp1 = mkDeferredComputation(1)
    val comp2 = mkDeferredComputation(2)
    val comp3 = mkDeferredComputation(3)
    val comp4 = mkDeferredComputation(4)
    val comp5 = mkDeferredComputation(5)

    val compoundComp = runWithBoundedParallelism(2)(Seq(comp1, comp2, comp3, comp4, comp5))

    whenReady(compoundComp.run()) { result =>
      result should be (Seq(1, 2, 3, 4, 5))
    }
  }

  // increase default ScalaTest patience
  implicit val defaultPatience = PatienceConfig(timeout = Span(10, Seconds))

  private def mkDeferredComputation[T](result: T, sleepDuration: FiniteDuration = 100.millis): Deferred[T] =
    Deferred {
      Future {
        Thread.sleep(sleepDuration.toMillis)
        result
      }
    }

}