
Scala Spark Streaming empty RDD problem


I am trying to create a custom streaming receiver from an RDBMS:

val dataDStream = ssc.receiverStream(new inputReceiver ())
  dataDStream.foreachRDD((rdd:RDD[String],time:Time)=> {
    val newdata=rdd.flatMap(x=>x.split(","))
    newdata.foreach(println)  // ******* this line is the problem: newdata has no records
  })

ssc.start()
ssc.awaitTermination()
}

import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class inputReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {
  def onStart() {
    // Start the thread that receives data over a connection
    new Thread("RDBMS data Receiver") {
      override def run() {
        receive()
      }
    }.start()
  }

  def onStop() {
  }

  def receive() {
    val sqlcontext = SQLContextSingleton.getInstance()

    // **** I am assuming something wrong in following code
    val DF = sqlcontext.read.json("/home/cloudera/data/s.json")
    for (data <- DF.rdd) {
      store(data.toString())
    }
    logInfo("Stopped receiving")
    restart("Trying to connect again")
  }
}
To make the code work, you should change the following. Calling store inside the RDD loop executes on the executors, where the receiver cannot hand records back to Spark Streaming, so nothing is ever stored; collect the rows back to the receiver thread and store them from there:

def receive() {
  val sqlcontext = SQLContextSingleton.getInstance()
  val DF = sqlcontext.read.json("/home/cloudera/data/s.json")

  // **** this: collect the rows to the receiver and store them here
  DF.rdd.collect().foreach(data => store(data.toString()))

  logInfo("Stopped receiving")
  restart("Trying to connect again")
}
However, this is not advisable: all of the data in the JSON file will be processed by the driver, and the receiver makes no proper provision for reliability.
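
If the source really is an RDBMS, a more conventional shape is to poll the database directly over JDBC on the receiver thread and store rows in small batches. Below is a minimal sketch under that assumption; the class name JdbcPollingReceiver, the url and query parameters, the single string column, and the batch size of 100 are hypothetical placeholders, not part of the original post.

import java.sql.DriverManager
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical sketch: polls the database over plain JDBC from the
// receiver thread, so store() runs where Spark Streaming can see it
class JdbcPollingReceiver(url: String, query: String)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart() {
    // Same threading pattern as the original receiver
    new Thread("RDBMS data Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() {
  }

  private def receive() {
    try {
      val conn = DriverManager.getConnection(url)
      try {
        val rs = conn.createStatement().executeQuery(query)
        var batch = new ArrayBuffer[String]()
        while (rs.next()) {
          batch += rs.getString(1)   // assumes a single string column
          if (batch.size >= 100) {   // hand rows to Spark in small batches
            store(batch)
            batch = new ArrayBuffer[String]()
          }
        }
        if (batch.nonEmpty) store(batch)
      } finally {
        conn.close()
      }
      restart("Polling for new data")
    } catch {
      case t: Throwable => restart("Error receiving data, retrying", t)
    }
  }
}

Such a receiver would be plugged in the same way as the original one: ssc.receiverStream(new JdbcPollingReceiver(url, query)).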


I suspect Spark Streaming is not a good fit for your use case. Reading between the lines, it seems that either you really are streaming, in which case you need a proper producer, or you are dumping data from the RDBMS into JSON, in which case you do not need Spark Streaming at all.
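
In the latter case, here is a minimal sketch of the batch alternative, assuming the data sits in a JDBC-accessible database; the connection options and table name are hypothetical placeholders:

import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext named sc
val sqlcontext = new SQLContext(sc)

val jdbcDF = sqlcontext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "my_table")
  .option("user", "user")
  .option("password", "password")
  .load()

// Same per-record processing as the streaming version, but as a single batch job
jdbcDF.rdd
  .flatMap(row => row.mkString(",").split(","))
  .foreach(println)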
