Apache Spark: Spark Streaming caching and transformations


I am new to Spark, and I am using Spark Streaming with Kafka.

My streaming batch interval is 1 second.

if (resultCp != null) {
    resultCp.print();
    result = resultCp.union(words.mapValues(new Sum()));
} else {
    result = words.mapValues(new Sum());
}

resultCp = result.cache();

Suppose the first batch has 100 records, the second batch has 120 records, and the third batch has 80 records:

--> {sec 1   1,2,...100} --> {sec 2 1,2..120} --> {sec 3 1,2,..80}

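I apply my logic to the first batch, which gives result1.

While processing the second batch I want to use result1, and combine result1 with the result for the second batch's 120 records to get result2.

I tried to cache the result, but I cannot get the cached result1 during second 2. Is this possible? If not, how can I achieve this?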

JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topicMap, StorageLevel.MEMORY_AND_DISK_SER_2());

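During the second batch, resultCp should not be null, yet it is null. As a result, at any given time I only have the data for that particular second, while I want a cumulative result. Does anyone know how to do this?

As I understand it, once Spark Streaming has been started with jssc.start(), is it possible to pass the result of the first batch to the second batch in order to compute a cumulative value?
Any help is much appreciated. Thanks in advance.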
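I think you are looking for updateStateByKey, which creates a new DStream by applying a cumulative function to the provided DStream and some state.
An example in the Spark examples package covers the case in the question.
First, you need an update function that takes the new values and the previously known value: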
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  val currentCount = values.sum

  val previousCount = state.getOrElse(0)

  Some(currentCount + previousCount)
}
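
Since the question uses the Java API, the behaviour of this update function can be sketched in plain Java without a Spark dependency. This is only an illustration of the semantics: the class name UpdateFuncDemo is hypothetical, and java.util.Optional stands in for whichever Optional type the Spark Java API of your version expects, which is an assumption to check against your Spark release.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class UpdateFuncDemo {
    // Plain-Java mirror of updateFunc: the new values seen for a key in the
    // current batch, plus the previous state (empty the first time a key appears).
    public static Optional<Integer> updateFunc(List<Integer> values, Optional<Integer> state) {
        int currentCount = 0;
        for (int v : values) {
            currentCount += v;
        }
        int previousCount = state.orElse(0);
        return Optional.of(currentCount + previousCount);
    }

    public static void main(String[] args) {
        // First batch: there is no previous state for this key.
        Optional<Integer> afterBatch1 = updateFunc(Arrays.asList(1, 1, 1), Optional.empty());
        // Second batch: the previous result comes back in as the state.
        Optional<Integer> afterBatch2 = updateFunc(Arrays.asList(1, 1), afterBatch1);
        // A batch with no new values for the key keeps the running total.
        Optional<Integer> afterBatch3 = updateFunc(Collections.emptyList(), afterBatch2);
        System.out.println(afterBatch1.get() + " " + afterBatch2.get() + " " + afterBatch3.get());
    }
}
```

The point of the sketch is that the result of one call is handed back as the state argument of the next, which is the "use result1 while processing the second batch" behaviour the question asks for.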

That function is then used to create a DStream that accumulates the values from the source DStream, like this:
// Create a NetworkInputDStream on target ip:port and count the
// words in input stream of \n delimited text (e.g. generated by 'nc')
val lines = ssc.socketTextStream(args(0), args(1).toInt)
val words = lines.flatMap(_.split(" "))
val wordDstream = words.map(x => (x, 1))

// Update the cumulative count using updateStateByKey
// This will give a Dstream made of state (which is the cumulative count of the words)
val stateDstream = wordDstream.updateStateByKey[Int](updateFunc) 
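
To make the batch-to-batch carry-over concrete, the following is a minimal plain-Java simulation of what the state stream holds after each batch interval. It does not use Spark: the class CumulativeStateDemo, the method applyBatch, and the sample words are all hypothetical and exist only for illustration. In Spark itself, stateful operations such as updateStateByKey also need a checkpoint directory configured on the streaming context.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CumulativeStateDemo {
    // Fold one batch of words into the running per-key counts, returning a new
    // map so that the state of the previous batch is left untouched.
    public static Map<String, Integer> applyBatch(Map<String, Integer> state, List<String> batch) {
        Map<String, Integer> next = new TreeMap<>(state);
        for (String word : batch) {
            next.merge(word, 1, Integer::sum);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new TreeMap<>();
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("a", "b", "a"),  // records arriving in second 1
            Arrays.asList("b", "c"),       // records arriving in second 2
            Arrays.asList("a")             // records arriving in second 3
        );
        for (List<String> batch : batches) {
            // The result of the previous batch is the input state of the next.
            state = applyBatch(state, batch);
            System.out.println(state);
        }
    }
}
```

Running it prints the cumulative counts after each simulated second: {a=2, b=1}, then {a=2, b=2, c=1}, then {a=3, b=2, c=1}.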

Source: the Spark examples package.