Apache Spark reduceByKey doesn't work in Spark Streaming

Tags: apache-spark, apache-kafka, spark-streaming

I have the following code snippet, where reduceByKey doesn't seem to work:

val myKafkaMessageStream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topicsSet, kafkaParams)
)

myKafkaMessageStream
  .foreachRDD { rdd => 
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
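    // The i-th partition of the Kafka direct stream's RDD corresponds to offsetRanges(i)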
    val myIter = rdd.mapPartitionsWithIndex { (i, iter) =>
      val offset = offsetRanges(i)
      iter.map(item => {
        (offset.fromOffset, offset.untilOffset, offset.topic, offset.partition, item)
      })
    }

    val myRDD = myIter.filter( (<filter_condition>) ).map(row => {
      //Process row

      ((field1, field2, field3) , (field4, field5))
    })

    val result = myRDD.reduceByKey((a,b) => (a._1+b._1, a._2+b._2))

    result.foreachPartition { partitionOfRecords =>
      //I don't get the reduced result here
      val connection = createNewConnection()
      partitionOfRecords.foreach(record => connection.send(record))
      connection.close()
    }        
  }

Am I missing something?

In the streaming case, reduceByKeyAndWindow makes more sense to me; it does what you need, but aggregated over a specific time window:

// Reduce last 30 seconds of data, every 10 seconds
val windowedWordCounts = pairs.reduceByKeyAndWindow((a:Int,b:Int) => (a + b), Seconds(30), Seconds(10))
From the Spark Streaming documentation: "When called on a DStream of (K, V) pairs, returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function func over batches in a sliding window. Note: by default, this uses Spark's default number of parallel tasks (2 for local mode, and in cluster mode the number is determined by the config property spark.default.parallelism) to do the grouping. You can pass an optional numTasks argument to set a different number of tasks."
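A minimal sketch of how this could look with the question's tuple keys. The stream name keyedPairs, the value types, and the 30-second window / 10-second slide are assumptions for illustration; note that reduceByKeyAndWindow is a DStream operation, so the key/value pairs have to be produced with stream transformations (map, filter on the DStream) rather than inside foreachRDD:

import org.apache.spark.streaming.Seconds

// Sketch only: assume keyedPairs is a DStream[((String, String, String), (Long, Long))]
// built with the same key/value layout as myRDD in the question.
val windowedResult = keyedPairs.reduceByKeyAndWindow(
  (a: (Long, Long), b: (Long, Long)) => (a._1 + b._1, a._2 + b._2), // same reduce function as before
  Seconds(30), // window length: aggregate the last 30 seconds of data
  Seconds(10)  // slide interval: emit a result every 10 seconds
)

windowedResult.foreachRDD { rdd =>
  rdd.foreach(println) // or write each partition to a sink, as in the original code
}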


There is not much detail here. Can you reduce this to a core example (with a constant stream or a queueStream)? And what does "doesn't seem to work" mean: does it throw an exception? Does it fail to group the records? If the former, are you sure the records can be compared this way (that they provide a useful hash/equality)? Why do you expect fromOffset, untilOffset and topic to make a proper key to reduce by?

@LostInOverflow Sorry for not being clear. When I do myRDD.foreach(println) I see the content, but when I do result.foreach(println) I don't see anything. So there is no error, but I get empty results.

@YuvalItzchakov So, I am trying to store fromOffset and untilOffset in a database after processing the records, so that I know the latest processed offsets.
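Following the suggestion in the comments, one way to check whether reduceByKey itself is the problem is to run it against a queueStream of hand-built pairs instead of Kafka. This is only a sketch; the app name and the key/value literals below are made up to illustrate the test:

import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReduceByKeyRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("reduceByKey-repro")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hand-built pairs shaped like the question's ((field1, field2, field3), (field4, field5))
    val rddQueue = mutable.Queue(ssc.sparkContext.parallelize(Seq(
      (("topicA", "k1", "k2"), (1L, 10L)),
      (("topicA", "k1", "k2"), (2L, 20L)),
      (("topicB", "k3", "k4"), (3L, 30L))
    )))

    val pairs = ssc.queueStream(rddQueue)

    pairs.foreachRDD { rdd =>
      val result = rdd.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      // If grouping works, ("topicA", "k1", "k2") collapses to (3, 30)
      result.foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

If the reduced values show up here but not with the real Kafka input, the problem is more likely in the filter condition or in how the key tuple is built than in reduceByKey itself.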