Sequencing Scala Futures with bounded parallelism (without messing around with ExecutionContexts)
Tags: scala, future, rx-java
Background: I have a function:
def doWork(symbol: String): Future[Unit]
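It does some side-effecting work to fetch data and store it, and completes the Future when it is done. However, the back-end infrastructure has usage limits, so no more than 5 of these requests may be in flight in parallel. I have a list of N symbols that I need to get through: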
var symbols = Array("MSFT",...)
but I want to sequence them so that no more than 5 execute simultaneously. Given:
val allowableParallelism = 5
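my current solution is (assuming I'm using async/await):
However, for obvious reasons, I'm not terribly happy with it. I feel this should be possible with a fold, but every time I try, I end up creating the Futures eagerly. I also tried a version based on RxScala Observables, using concatMap, but that seemed like overkill as well.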
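Is there a better way to accomplish this?

I have an example of how to do this with scalaz-stream. It is quite a lot of code, because the Scala Future has to be converted to a scalaz Task (an abstraction for a deferred computation). However, it only needs to be added to a project once. Another option is to define doWork in terms of Task. Personally, I prefer Task for building asynchronous programs.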
import scala.concurrent.{Future => SFuture}
import scala.util.Random
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz.stream._
import scalaz.concurrent._
val P = scalaz.stream.Process
val rnd = new Random()

def doWork(symbol: String): SFuture[Unit] = SFuture {
  Thread.sleep(rnd.nextInt(1000))
  println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
}

val symbols = Seq("AAPL", "MSFT", "GOOGL", "CVX").
  flatMap(s => Seq.fill(5)(s).zipWithIndex.map(t => s"${t._1}${t._2}"))

implicit class Transformer[+T](fut: => SFuture[T]) {
  def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
    import scala.util.{Failure, Success}
    import scalaz.syntax.either._
    Task.async {
      register =>
        fut.onComplete {
          case Success(v) => register(v.right)
          case Failure(ex) => register(ex.left)
        }
    }
  }
}

implicit class ConcurrentProcess[O](val process: Process[Task, O]) {
  def concurrently[O2](concurrencyLevel: Int)(f: Channel[Task, O, O2]): Process[Task, O2] = {
    val actions =
      process.
        zipWith(f)((data, f) => f(data))
    val nestedActions =
      actions.map(P.eval)
    merge.mergeN(concurrencyLevel)(nestedActions)
  }
}

val workChannel = io.channel((s: String) => doWork(s).toTask)
val process = Process.emitAll(symbols).concurrently(5)(workChannel)
process.run.run
Once all of these conversions are in scope, basically all you need is:
val workChannel = io.channel((s: String) => doWork(s).toTask)
val process = Process.emitAll(symbols).concurrently(5)(workChannel)
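
Although you have already got an excellent answer, I thought I might still offer an opinion or two on the matter. I remember seeing somewhere (on someone's blog) the advice "use actors for state and futures for concurrency". So my first thought would be to make use of actors somehow. To be precise, I would have a master actor with a router that launches multiple worker actors, with the number of workers restricted according to allowableParallelism. So, assuming I have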
def doWorkInternal (symbol: String): Unit
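where your doWork is taken "outside of Future", I would have something along these lines (very rudimentary, ignoring many details, and practically copying code from the Akka documentation):
doWork is now exactly like yours, returning Future[Unit]. The idea is that instead of using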
val futures = symbols.map (doWork (_)).toSeq
val future = Future.sequence(futures)
which would start the futures with no regard for allowableParallelism at all, one would use
val futures = symbols.map (Guardian.doWorkGuarded (_)).toSeq
val future = Future.sequence(futures)
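Guardian itself is not shown here. As a rough stand-in, and explicitly not the actor-based version this answer describes, the sketch below gets a similar "guarded" call using only scala.concurrent: a counter of occupied slots plus a queue of work waiting for a slot. The names ParallelismGuard, guarded and GuardDemo are assumptions made for this illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical, non-actor stand-in for the Guardian mentioned above.
final class ParallelismGuard(maxParallel: Int)(implicit ec: ExecutionContext) {
  require(maxParallel > 0, "maxParallel must be positive")

  private val lock = new Object
  private var running = 0                                // slots currently in use
  private val waiting = mutable.Queue.empty[() => Unit]  // starters waiting for a slot

  /** Starts `work` right away if a slot is free, otherwise queues it. */
  def guarded[T](work: => Future[T]): Future[T] = {
    val result = Promise[T]()

    def start(): Unit = {
      // A synchronous exception thrown by `work` becomes a failed Future.
      val f: Future[T] = try work catch { case NonFatal(e) => Future.failed(e) }
      f.onComplete { r =>
        release()            // free the slot, or hand it to the next waiter
        result.complete(r)
      }
    }

    val startNow = lock.synchronized {
      if (running < maxParallel) { running += 1; true }
      else { waiting.enqueue(() => start()); false }
    }
    if (startNow) start()
    result.future
  }

  private def release(): Unit = {
    val next = lock.synchronized {
      if (waiting.nonEmpty) Some(waiting.dequeue()) // the slot passes straight to the next waiter
      else { running -= 1; None }
    }
    next.foreach(starter => starter())
  }
}

object GuardDemo {
  // Pushes `jobs` sleeping tasks through a guard and reports the highest
  // number of tasks that were observed running at the same time.
  def peakConcurrency(jobs: Int, limit: Int): Int = {
    import ExecutionContext.Implicits.global
    val guard = new ParallelismGuard(limit)
    val active = new AtomicInteger(0)
    val peak = new AtomicInteger(0)

    def doWork(symbol: String): Future[Unit] = Future {
      val now = active.incrementAndGet()
      peak.synchronized { if (now > peak.get) peak.set(now) }
      Thread.sleep(20) // simulated work; `symbol` is unused in this demo
      active.decrementAndGet()
      ()
    }

    val all = Future.sequence((1 to jobs).map(i => guard.guarded(doWork(s"SYM$i"))))
    Await.result(all, 30.seconds)
    peak.get
  }
}
```

Because guarded takes its argument by name, the underlying call is not made until a slot is free, so callers can still map over all symbols and pass the result to Future.sequence.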
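Think of a hypothetical database access driver with a non-blocking interface, i.e. one that returns a future for each request, and that is limited in concurrency because, for example, it is built on top of a connection pool. You would not want it to hand back futures that ignore the parallelism level and then leave it to you to juggle them to keep parallelism under control.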
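This example is more illustrative than practical, since I would not normally expect an "outgoing" interface to use futures like this (it is fine for an "incoming" interface).

First, some purely functional wrapper around Scala's Future is obviously needed, because Future is side-effecting and starts running as soon as it can. Let's call it Deferred: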
import scala.concurrent.Future
import scala.util.control.Exception.nonFatalCatch

class Deferred[+T](f: () => Future[T]) {
  def run(): Future[T] = f()
}

object Deferred {
  def apply[T](future: => Future[T]): Deferred[T] =
    new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}
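To make the laziness concrete, here is a small sketch that checks that building a Deferred does not start anything until run() is called. The Deferred definition is repeated so the snippet compiles on its own; DeferredDemo, observe and the counter are assumptions added for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.Exception.nonFatalCatch

// Same wrapper as above, repeated so this snippet stands alone.
class Deferred[+T](f: () => Future[T]) {
  def run(): Future[T] = f()
}

object Deferred {
  def apply[T](future: => Future[T]): Deferred[T] =
    new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}

// Hypothetical demo object, not part of the original answer.
object DeferredDemo {
  // Returns (starts seen before run(), value produced, starts seen after run()).
  def observe(): (Int, Int, Int) = {
    val started = new AtomicInteger(0)

    // The block is passed by name, so nothing in it is evaluated here.
    val deferred = Deferred {
      started.incrementAndGet() // marks the moment the Future is actually created
      Future(42)
    }

    val before = started.get()
    val value = Await.result(deferred.run(), 5.seconds)
    (before, value, started.get())
  }
}
```

Note that each call to run() evaluates the by-name block again, so a Deferred describes a computation rather than caching one result.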
And here is the routine:
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.immutable.Seq
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.Exception.nonFatalCatch
import scala.util.{Failure, Success}

trait ConcurrencyUtils {
  def runWithBoundedParallelism[T](parallelism: Int = Runtime.getRuntime.availableProcessors())
                                  (operations: Seq[Deferred[T]])
                                  (implicit ec: ExecutionContext): Deferred[Seq[T]] =
    if (parallelism > 0) Deferred {
      val indexedOps = operations.toIndexedSeq // index for faster access
      val promise = Promise[Seq[T]]()
      val acc = new CopyOnWriteArrayList[(Int, T)] // concurrent accumulator
      val nextIndex = new AtomicInteger(parallelism) // keep track of the next index atomically

      def run(operation: Deferred[T], index: Int): Unit = {
        operation.run().onComplete {
          case Success(value) =>
            acc.add((index, value)) // accumulate result value
            if (acc.size == indexedOps.size) { // we are done
              // JavaConverters replaces the deprecated JavaConversions (removed in Scala 2.13)
              import scala.collection.JavaConverters._
              // in a concurrent setting the next line may be called multiple times, hence trySuccess instead of success
              promise.trySuccess(acc.asScala.toList.sortBy(_._1).map(_._2))
            } else {
              val next = nextIndex.getAndIncrement() // get and increment atomically
              if (next < indexedOps.size) { // run the next operation if there is one
                run(indexedOps(next), next)
              }
            }
          case Failure(t) =>
            promise.tryFailure(t) // same here (may be called multiple times, let's prevent stdout pollution)
        }
      }

      if (operations.nonEmpty) {
        indexedOps.view.take(parallelism).zipWithIndex.foreach((run _).tupled) // run as many as allowed
        promise.future
      } else {
        Future.successful(Seq.empty)
      }
    } else {
      throw new IllegalArgumentException("Parallelism must be positive")
    }
}
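
Use Monix Task. An example with parallelism = 10:
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global // foreach needs an implicit Scheduler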
val items = 0 until 1000
// The list of all tasks needed for execution
val tasks = items.map(i => Task(i * 2))
// Building batches of 10 tasks to execute in parallel:
val batches = tasks.sliding(10,10).map(b => Task.gather(b))
// Sequencing batches, then flattening the final result
val aggregate = Task.sequence(batches).map(_.flatten.toList)
// Evaluation:
aggregate.foreach(println)
//=> List(0, 2, 4, 6, 8, 10, 12, 14, 16,...
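The batch-at-a-time idea from the Monix example can also be sketched with plain scala.concurrent Futures, provided each batch is created only after the previous one has finished. The fold below does this by creating the Futures inside flatMap instead of up front. BatchedFutures, inBatches and demo are names assumed for this sketch.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical helper, shown only to illustrate batching with plain Futures.
object BatchedFutures {
  // Runs `f` over `items` in batches of `batchSize`. The Futures of a batch are
  // created inside flatMap, so they start only once the previous batch is done.
  def inBatches[A, B](items: Seq[A], batchSize: Int)(f: A => Future[B])
                     (implicit ec: ExecutionContext): Future[List[B]] = {
    require(batchSize > 0, "batchSize must be positive")
    items.grouped(batchSize).foldLeft(Future.successful(List.empty[B])) { (done, batch) =>
      done.flatMap { acc =>
        Future.sequence(batch.toList.map(f)).map(results => acc ++ results)
      }
    }
  }

  // Doubles 0 until 25 in batches of 10 and waits for the combined result.
  def demo(): List[Int] = {
    import ExecutionContext.Implicits.global
    Await.result(inBatches(0 until 25, 10)(i => Future(i * 2)), 10.seconds)
  }
}
```

As with the Monix version, a slow item holds up its whole batch: the next batch does not start until every Future in the current one has completed.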
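
I should add that it would be better if a new future were started every time one completes, rather than waiting for the whole batch/group to finish.
Is your IO blocking and wrapped in a Future {}, or is the IO asynchronous, using no thread while it waits on the remote server? If it is blocking, a fixed thread pool of 5 threads seems like the simplest solution to me. But use that pool only for the blocking IO, of course, and for nothing else.
The IO backing doWork() is non-blocking and runs on threads I have no control over; I have wrapped it in Observables at various levels of abstraction.
Thanks. I have been carefully avoiding scalaz because I am still fairly new to plain Scala, but this looks nice...
Thanks. After sleeping on it, I think what I really need to do is abstract what I'm trying to do into a combinator, and then come up with the right name for it. It is actually very similar to Observable.concatMap, so maybe I should sit down and think about how to express the type I need. What I'm trying to do is basically "given a lazy stream of functions that produce futures, chain the completion of one to the creation of the next, and return a future that completes when they are all done". Given something like that, I can then generalize to the concurrent case...
This never finishes if the operations Seq is empty; in that case runWithBoundedParallelism should return Future.successful(Seq.empty).
@Somatik Feel free to improve the answer.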
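The combinator described in the comments above can be sketched with the standard library alone. This is one possible shape under stated assumptions, not code from any of the answers: sequentially chains each completion to the creation of the next Future, and bounded generalizes it by running several such chains ("workers") over a shared queue, so a new item starts as soon as any running one finishes. BoundedRun and its method names are hypothetical.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical helper object, shown only as an illustration.
object BoundedRun {
  // concatMap-like: the next Future is created only after the previous one completes.
  def sequentially[A](items: Seq[A])(f: A => Future[Unit])
                     (implicit ec: ExecutionContext): Future[Unit] =
    items.foldLeft(Future.successful(())) { (prev, item) => prev.flatMap(_ => f(item)) }

  // Concurrent generalization: `parallelism` workers each pull the next item
  // from a shared queue as soon as their current Future has completed.
  def bounded[A](items: Seq[A], parallelism: Int)(f: A => Future[Unit])
                (implicit ec: ExecutionContext): Future[Unit] = {
    require(parallelism > 0, "parallelism must be positive")
    val queue = new ConcurrentLinkedQueue[A]()
    items.foreach(item => queue.add(item))

    def worker(): Future[Unit] = Option(queue.poll()) match {
      case Some(item) => f(item).flatMap(_ => worker())
      case None       => Future.successful(()) // queue drained, this worker stops
    }

    Future.sequence(List.fill(parallelism)(worker())).map(_ => ())
  }

  // Demo helper: returns (number of completed jobs, highest observed concurrency).
  def demo(jobs: Int, parallelism: Int): (Int, Int) = {
    import ExecutionContext.Implicits.global
    val active = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    val completed = new AtomicInteger(0)

    def doWork(symbol: String): Future[Unit] = Future {
      val now = active.incrementAndGet()
      peak.synchronized { if (now > peak.get) peak.set(now) }
      Thread.sleep(10) // simulated work; `symbol` is unused in this demo
      active.decrementAndGet()
      completed.incrementAndGet()
      ()
    }

    Await.result(bounded((1 to jobs).map(i => s"SYM$i"), parallelism)(doWork), 30.seconds)
    (completed.get, peak.get)
  }
}
```

In this sketch a failed Future stops only the worker that hit it; the remaining workers keep draining the queue, and the combined Future reports a failure. Whether that is the desired failure policy depends on the use case.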

A ScalaTest spec for runWithBoundedParallelism:
import org.scalatest.{Matchers, FlatSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Seconds, Span}
import scala.collection.immutable.Seq
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

class ConcurrencyUtilsSpec extends FlatSpec with Matchers with ScalaFutures with ConcurrencyUtils {

  "runWithBoundedParallelism" should "return results in correct order" in {
    val comp1 = mkDeferredComputation(1)
    val comp2 = mkDeferredComputation(2)
    val comp3 = mkDeferredComputation(3)
    val comp4 = mkDeferredComputation(4)
    val comp5 = mkDeferredComputation(5)

    val compoundComp = runWithBoundedParallelism(2)(Seq(comp1, comp2, comp3, comp4, comp5))

    whenReady(compoundComp.run()) { result =>
      result should be (Seq(1, 2, 3, 4, 5))
    }
  }

  // increase default ScalaTest patience
  implicit val defaultPatience = PatienceConfig(timeout = Span(10, Seconds))

  private def mkDeferredComputation[T](result: T, sleepDuration: FiniteDuration = 100.millis): Deferred[T] =
    Deferred {
      Future {
        Thread.sleep(sleepDuration.toMillis)
        result
      }
    }
}