Scala: Finding the top K elements in a WindowedStream (Flink)

scala stream streaming apache-flink

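I'm new to the world of streams and ran into some problems on my first attempt.

What I'm trying to do is find the top K elements in the window (a WindowedStream) below. I tried to implement my own function, but I'm not sure how it actually works.

Nothing seems to be printed.

Do you have any hints?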

val parsedStream: DataStream[(String, Response)] = stream
      .mapWith(_.decodeOption[Response])
      .filter(_.isDefined)
      .map { record =>
        (
          s"${record.get.group.group_country}, ${record.get.group.group_city}",
          record.get
        )
      }

val topLocations = parsedStream
      .keyBy(_._1)
      .timeWindow(Time.days(7))
      .process(new SortByCountFunction)

SortByCountFunction:

class SortByCountFunction
    extends ProcessWindowFunction[(String, Response), MeetUpLocationWindow, String, TimeWindow] {

    override def process(key: String,
                         context: Context,
                         elements: Iterable[(String, Response)],
                         out: Collector[MeetUpLocationWindow]): Unit = {

      val count: Map[String, Iterable[(String, Response)]] = elements.groupBy(_._1)

      val locAndCount: Seq[MeetUpLocation] = count.toList.map(tmp => {
        val location: String = tmp._1
        val meetUpList: Iterable[(String, Response)] = tmp._2
        MeetUpLocation(location, tmp._2.size, meetUpList.map(_._2).toList)
      })

      val output: List[MeetUpLocation] = locAndCount.sortBy(tup => tup.count).take(20).toList

      val windowEnd = context.window.getEnd

      out.collect(MeetUpLocationWindow(windowEnd, output))
    }
  }

case class MeetUpLocationWindow(endTs: Long, locations: List[MeetUpLocation])

case class MeetUpLocation(location: String, count: Int, meetUps: List[Response])

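When a Flink DataStream job fails to produce any output, the usual suspects are:

- the job never calls execute() on the StreamExecutionEnvironment (e.g., `env.execute()`)
- the job has no sink attached (e.g., `topLocations.print()`)
- the job is meant to use event time, but watermarking isn't set up correctly, or an idle source is preventing the watermarks from advancing
- the job is writing to the taskmanager logs, but no one noticed
- the serializer for the output type produces no output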

Without more information, it's hard to guess which of these is the problem in this case.
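To make the first three suspects concrete, here is a minimal sketch of a job that has a watermark strategy, a sink, and an execute() call. It is not the asker's pipeline: the `Event` type, the sample data, the 10-second window, and the job name are all made up for illustration, and it assumes a Flink 1.x release (1.11 or later) with the Scala DataStream API on the classpath.

```scala
import java.time.Duration

import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object OutputChecklist {
  // Hypothetical event type: (location, event-time timestamp in milliseconds).
  type Event = (String, Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Made-up sample data standing in for the real source.
    val events: DataStream[Event] = env.fromElements(
      ("DE, Berlin", 1000L), ("DE, Berlin", 2000L), ("FR, Paris", 3000L))

    // Suspect 3: event-time windows only fire once watermarks pass the window end.
    val strategy: WatermarkStrategy[Event] = WatermarkStrategy
      .forBoundedOutOfOrderness[Event](Duration.ofSeconds(5))
      .withTimestampAssigner(new SerializableTimestampAssigner[Event] {
        override def extractTimestamp(e: Event, recordTs: Long): Long = e._2
      })
      // Lets watermarks advance even if some source partition goes quiet.
      .withIdleness(Duration.ofMinutes(1))

    val counts = events
      .assignTimestampsAndWatermarks(strategy)
      .map(e => (e._1, 1))
      .keyBy(_._1)
      .window(TumblingEventTimeWindows.of(Time.seconds(10)))
      .sum(1)

    // Suspect 2: without a sink, nothing is emitted anywhere.
    counts.print()

    // Suspect 1: without execute(), the job graph is built but never run.
    env.execute("output-checklist")
  }
}
```

Keep in mind that a window emits results only when it fires. With a window of `Time.days(7)`, as in the question, that means nothing reaches the sink until a full 7-day window has closed, whether measured by processing time or by watermarks.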

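I think the problem is in SortByCountFunction, but I can't pin it down.

As far as I can see, SortByCountFunction will always collect something to the output, even if the result is wrong. If the job prints nothing at all, then something else must be going on.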
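On the "even if the result is wrong" point, two things in the question's function look suspicious, though neither would explain missing output. First, `sortBy(tup => tup.count)` sorts in ascending order, so `take(20)` keeps the 20 smallest counts. Second, because the stream is keyed by location before the window, each call to `process` should see elements for a single key only, so the `groupBy` yields just one group. The plain-Scala sketch below illustrates only the first point; `LocationCount` is a hypothetical stand-in for `MeetUpLocation` without the `Response` payload.

```scala
// Hypothetical stand-in for the question's MeetUpLocation, minus the Response list.
final case class LocationCount(location: String, count: Int)

object TopK {
  // Returns the k entries with the highest counts, largest first.
  // sortBy(_.count) alone would sort ascending and keep the smallest counts instead.
  def topK(counts: Seq[LocationCount], k: Int): List[LocationCount] =
    counts.sortBy(_.count)(Ordering[Int].reverse).take(k).toList
}
```

A top K across all locations would presumably need a second, non-keyed stage (for example one built on `windowAll`) that collects the per-location counts for each window and applies something like `topK` to them; that stage is a suggestion and is not shown in the original thread.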