Scala: how to merge adjacent similar entries in a sorted 'Stream' or 'List'
Given a collection that is
- large (>1000000 entries, don't expect it to fit in memory)
- sorted (by the first value of the tuple)
- stream-like
val ss = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0")).toStream
// just for demo
val xs = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0"))
I want to concatenate adjacent entries so that the output of the transformation becomes
List( (1, "2.5 5.0"), (2, "3.0 4.0 6.0"), (3, "1.0") )
The second tuple value is to be merged by some monoid function (here, string concatenation).
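One way to meet these requirements is to walk the input once with a buffered iterator and emit each group as soon as the key changes, so only the current group is held in memory. The following is a minimal sketch, not code from the thread: the helper name mergeAdjacent, its combine parameter and the demo value are assumptions made for illustration.

```scala
// Hypothetical helper (not from the original post): lazily merges runs of
// adjacent entries that share a key, combining their values with `combine`.
def mergeAdjacent[K, V](entries: Iterator[(K, V)])(combine: (V, V) => V): Iterator[(K, V)] = {
  val in = entries.buffered // BufferedIterator lets us peek at the next key via `head`
  new Iterator[(K, V)] {
    def hasNext: Boolean = in.hasNext
    def next(): (K, V) = {
      val (key, first) = in.next()
      var merged = first
      // Consume entries only while the upcoming key equals the current one.
      while (in.hasNext && in.head._1 == key) merged = combine(merged, in.next()._2)
      (key, merged)
    }
  }
}

// Demo data copied from the question.
val demo = List((1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0"))
val merged = mergeAdjacent(demo.iterator)(_ + " " + _).toList
// merged == List((1, "2.5 5.0"), (2, "3.0 4.0 6.0"), (3, "1.0"))
```

Because the result is itself an Iterator, nothing is computed until it is consumed; the toList call is only there for the demo.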
Ideas / attempts
groupBy
does not seem to be a viable option, because the entries are collected into a map in memory.
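For comparison, here is what the groupBy route looks like on the small demo list. It is only a sketch to show the shape of the result (the value names are made up): groupBy returns a fully built Map, so every entry has to be read and stored before the first merged value is available.

```scala
// Demo data copied from the question.
val demo = List((1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0"))

// groupBy materializes the whole input as a Map from key to its entries.
val grouped: Map[Int, List[(Int, String)]] = demo.groupBy(_._1)

// Merge the values of each group; element order inside a group is preserved.
val joined: Map[Int, String] =
  grouped.map { case (key, entries) => key -> entries.map(_._2).mkString(" ") }
```

This is fine for a short List, but it does not fit the "large, stream-like" constraint stated in the question.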
scanLeft
The result is
List(Joiner(0,a), Joiner(1,2.5), Joiner(1,2.5 5.0), Joiner(2,3.0))
(please ignore the Joiner wrapper)
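The scanLeft attempt is not shown in the question, so the following is a guess at what it may have looked like. The Joiner case class, its field names and the three-element input are assumptions chosen to reproduce the result quoted above.

```scala
// Hypothetical reconstruction: the real Joiner class is not shown in the post.
case class Joiner(key: Int, value: String)

val input = List((1, "2.5"), (1, "5.0"), (2, "3.0"))

// scanLeft emits every intermediate state, including the seed Joiner(0, "a").
val scanned = input.scanLeft(Joiner(0, "a")) {
  case (Joiner(k, v), (key, value)) if k == key => Joiner(k, v + " " + value) // same key: extend
  case (_, (key, value))                        => Joiner(key, value)         // new key: restart
}
// scanned == List(Joiner(0,a), Joiner(1,2.5), Joiner(1,2.5 5.0), Joiner(2,3.0))
```

The seed Joiner(0,a) and the partial state Joiner(1,2.5) both appear in the output, because scanLeft has no way of knowing that a later element will extend the current group.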
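But I did not find a way to eliminate the "incomplete" entries.
Emit true to mark the initial element of a group (when the key switches) rather than the last one; that is easy, right? You can then collect the entries that are immediately followed by an initial entry.
Maybe something like this: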
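// Each scan state is (key, mergedValue, startsNewGroup).
// Note: for an empty input this still emits the seed (0, "").
(ss.scanLeft((0, "", true)) {
  case ((_, "", _), (b, c)) => (b, c, false) // very first entry: avoid a leading space
  case ((a, str, _), (b, c)) if a == b => (b, str + " " + c, false)
  case (_, (b, c)) => (b, c, true)
} :+ ((0, "", true))) // dummy entry so that the last real group is emitted too
  .sliding(2)
  .collect { case Seq(a, (_, _, true)) => (a._1, a._2) }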
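(Note the :+ thingy: it appends a "dummy" entry to the end of the stream, so that the last "real" element is also followed by a "true" entry and does not get filtered out.)
This seems to work: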
def makeEm(s: Stream[(Int, String)]) = {
  import scala.annotation.tailrec
  import Stream._
  // curr carries the key and the values (most recent first) of the group being built
  @tailrec
  def z(source: Stream[(Int, String)], curr: (Int, List[String]), acc: Stream[(Int, String)]): Stream[(Int, String)] = source match {
    case Empty =>
      Empty
    case x #:: Empty =>
      // last entry: close the current group, using the entry's own key
      acc :+ (x._1 -> (x._2 :: curr._2).mkString(","))
    case x #:: y #:: etc if x._1 != y._1 =>
      // key changes after x: close the current group and start a new one at y
      val c = x._1 -> (x._2 :: curr._2).mkString(",")
      z(y #:: etc, (y._1, List[String]()), acc :+ c)
    case x #:: etc =>
      // same key as the next entry: keep accumulating
      z(etc, (x._1, x._2 :: curr._2), acc)
  }
  z(s, (0, List()), Stream())
}
Test:
val ss = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0")).toStream
makeEm(ss).toList.mkString(",")
val s = List().toStream
makeEm(s).toList.mkString(",")
val ss2 = List( (1, "2.5"), (1, "5.0")).toStream
makeEm(ss2).toList.mkString(",")
val s3 = List((1, "2.5"),(2, "4.0"),(3, "1.0")).toStream
makeEm(s3).toList.mkString(",")
Output (values within a group appear in reverse order because they are prepended):
ss: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res0: String = (1,5.0,2.5),(2,6.0,4.0,3.0),(3,1.0)
s: scala.collection.immutable.Stream[Nothing] = Stream()
res1: String =
ss2: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res2: String = (1,5.0,2.5)
s3: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res3: String = (1,2.5),(2,4.0),(3,1.0)
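Wrt the second approach: I want List(Joiner(1,2.5 5.0), Joiner(2,3.0)). The entry Joiner(1,2.5) is what I call incomplete, and Joiner(0,a) is just the starting point.
Consider returning a tuple such as (Joiner, Boolean), where the second element indicates whether this is the "final" entry. Then .collect { case (j, true) => j }
@Dima: Nice try (I thought of that too, although I wanted to use a second case class rather than a flag). Still, this approach fails because I found no way to tell whether an entry is complete. Could you please sketch the code...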