Rolling time-window data in Scala

Please see the following simplified Scala snippet. It builds a sample day -> data map and tries to compute 3-day rolling time-window data:

import scala.collection.immutable.TreeMap

// Day1 -> Data-1, ..., Day7 -> Data-7, keys kept in sorted order
val dataByDay: Map[String, String] = TreeMap((1 to 7).map(i => (s"Day$i" -> s"Data-$i")): _*)

// sliding(3) yields windows of 3 consecutive entries; windows are numbered from 1
val groupedIterator: Iterator[(Int, Map[String, String])] =
  dataByDay.sliding(3).zipWithIndex.map(e => ((e._2 + 1) -> e._1))

for ((day, lastFiveDaysDataOnEveryDay) <- groupedIterator) {
  println(s"On Day${day} data for days " + lastFiveDaysDataOnEveryDay.keys.mkString(",") + " will be used")
}
The requirement is to process the data as shown below:

On Day1 data for days will be used
On Day2 data for days Day1 will be used
On Day3 data for days Day1,Day2 will be used
On Day4 data for days Day1,Day2,Day3 will be used
On Day5 data for days Day2,Day3,Day4 will be used
On Day6 data for days Day3,Day4,Day5 will be used
On Day7 data for days Day4,Day5,Day6 will be used

Please advise.

Your requirement is a bit vague. If you only need this output, a simple solution would be something like this:

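(1 to 7).foreach { day =>
  // the (up to) three days before `day`, dropping non-existent days <= 0
  val prior = Seq(day - 3, day - 2, day - 1).filter(_ > 0).map("Day" + _)
  // add the separating space only when there is at least one prior day
  val days = if (prior.isEmpty) "" else " " + prior.mkString(",")
  println(s"On Day$day data for days$days will be used")
}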

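If the requirement is a data representation for a configurable rolling window, you will need to be more precise about what you want.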
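If a configurable window is what is needed, a minimal sketch (not part of the original answer) could work directly on the question's `dataByDay` map: pair each key with the at most `window` keys that precede it. The helper name `priorWindows` is hypothetical. The sketch assumes the map's key order is the day order, which holds for `Day1` to `Day7` in a `TreeMap`, but not once string keys reach `Day10` (lexicographically it sorts before `Day2`).

```scala
import scala.collection.immutable.TreeMap

// Hypothetical helper: pair every key with the (at most) `window` keys before it.
def priorWindows[K](keys: IndexedSeq[K], window: Int): IndexedSeq[(K, IndexedSeq[K])] =
  keys.indices.map(i => keys(i) -> keys.slice(math.max(0, i - window), i))

// Same sample data as the question; TreeMap iterates its keys in sorted order.
val dataByDay: Map[String, String] = TreeMap((1 to 7).map(i => s"Day$i" -> s"Data-$i"): _*)

val windows = priorWindows(dataByDay.keys.toVector, 3)

for ((day, prior) <- windows) {
  // Only add the separating space when there is at least one prior day.
  val days = if (prior.isEmpty) "" else prior.mkString(" ", ",", "")
  println(s"On $day data for days$days will be used")
}
```

Changing the `3` passed to `priorWindows` changes the window size; the values in the map are still reachable via `dataByDay(key)` for each key in a window.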

I assume this code is only for the purposes of this question and your actual requirement is somewhat different.

I have a solution for streams; you can use something like the following to get this special window implementation for your use case:

import scala.collection.mutable

// Infinite, lazily evaluated stream of pairs: (Day1,Data1), (Day2,Data2), ...
// (Stream is deprecated since Scala 2.13 in favour of LazyList.)
val stream = {
  def loop(i: Int): Stream[(String, String)] = (s"Day$i", s"Data$i") #:: loop(i + 1)
  loop(1)
}

// For each element, emit the last `window` elements seen so far (including it).
// Relies on Stream evaluating each head eagerly, so the shared queue is updated in order.
def specialWindowedStream[T](source: Stream[T], window: Int): Stream[List[T]] = {
  val queue = mutable.Queue.empty[T]
  def loop(source: Stream[T]): Stream[List[T]] = {
    queue.enqueue(source.head)
    if (queue.size > window) {
      queue.dequeue() // drop the oldest element once the window is full
    }
    queue.toList #:: loop(source.tail)
  }

  loop(source)
}

val windowedStream = specialWindowedStream(stream, 5)

windowedStream.zipWithIndex.take(6).foreach(println)
// (List((Day1,Data1)),0)
// (List((Day1,Data1), (Day2,Data2)),1)
// (List((Day1,Data1), (Day2,Data2), (Day3,Data3)),2)
// (List((Day1,Data1), (Day2,Data2), (Day3,Data3), (Day4,Data4)),3)
// (List((Day1,Data1), (Day2,Data2), (Day3,Data3), (Day4,Data4), (Day5,Data5)),4)
// (List((Day2,Data2), (Day3,Data3), (Day4,Data4), (Day5,Data5), (Day6,Data6)),5)
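A variant without the shared mutable queue can be sketched with `Iterator.scanLeft`, which threads the current window through the fold as an immutable `Vector` (this swaps in a different technique from the answer's queue; `windowsOf` is a hypothetical name). The first value `scanLeft` emits is the empty seed, so keeping it makes window number i hold the days *before* day i + 1, which is the question's expected output; calling `.drop(1)` instead gives windows that include the current element, like `specialWindowedStream`.

```scala
// Hypothetical helper: each emitted Vector holds the last `window` elements seen so far.
// scanLeft emits the empty seed first, then one window per consumed element.
def windowsOf[T](source: Iterator[T], window: Int): Iterator[Vector[T]] =
  source.scanLeft(Vector.empty[T])((acc, x) => (acc :+ x).takeRight(window))

// Infinite, lazy source of (day, data) pairs, like the stream in the answer above.
val days = Iterator.from(1).map(i => (s"Day$i", s"Data$i"))

// Keeping the empty seed: window number i holds the days before Day(i + 1).
windowsOf(days, 3).take(7).zipWithIndex.foreach { case (prior, i) =>
  val names = if (prior.isEmpty) "" else prior.map(_._1).mkString(" ", ",", "")
  println(s"On Day${i + 1} data for days$names will be used")
}
```

Because nothing outside the fold is mutated, the result does not depend on when each element is evaluated.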