Apache Spark: getting exception "No output operations registered, so nothing to execute" from Spark Streaming


I ran it, and it throws an exception.

package com.scala.sparkStreaming

import org.apache.spark._
import org.apache.spark.streaming._

object Demo1 {
  def main(assdf:Array[String]){

     val sc=new SparkContext("local","Stream")

     val stream=new StreamingContext(sc,Seconds(2))

     val rdd1=stream.textFileStream("D:/My Documents/Desktop/inbound/sse/ssd/").cache()

     val mp1= rdd1.flatMap(_.split(","))
     print(mp1.count())

     stream.start()
     stream.awaitTermination()
  }
}
The error message "No output operations registered, so nothing to execute" hints that something is missing.

Your DStreams rdd1 and mp1 do not have any output operation registered. A flatMap is just a transformation, which Spark evaluates lazily. That is why the stream.start() method throws this exception.
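The lazy-evaluation behaviour is loosely analogous to Scala's own lazy collections: a flatMap on an Iterator merely describes work and performs none of it until something consumes the result. A minimal sketch, plain Scala with no Spark involved:

```scala
// Iterator.flatMap is lazy, in the same spirit as a DStream transformation:
// nothing runs until a terminal operation (the "action") consumes the result.
object LazyDemo {
  var evaluated = 0

  val words: Iterator[String] = Iterator("a,b", "c,d").flatMap { line =>
    evaluated += 1 // side effect so we can observe when the work happens
    line.split(",")
  }

  def main(args: Array[String]): Unit = {
    // flatMap has been described, not executed: evaluated is still 0 here
    println(s"before consuming: $evaluated lines evaluated")
    val result = words.toList // consuming the iterator triggers the work
    println(s"after consuming: $evaluated lines evaluated -> $result")
  }
}
```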

According to the documentation, you can iterate through the RDDs of a DStream and print them out; the complete code example further down runs fine with Spark version 2.4.5.

The documentation of textFileStream says that it "monitors a Hadoop-compatible filesystem for new files and reads them as text files", so make sure you add or modify the files to be read while the job is running.

Also, although I am not too familiar with Spark on Windows, you may need to change the directory string to the file:// form shown further below.

For reference, running the original program prints the DStream object returned by mp1.count() and then fails on stream.start():

org.apache.spark.streaming.dstream.MappedDStream@63429932
20/05/22 18:14:16 ERROR StreamingContext: Error starting the context, marking it as stopped
java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
    at scala.Predef$.require(Predef.scala:277)
    at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:169)
    at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:517)
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:577)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:576)
    at com.scala.sparkStreaming.Demo1$.main(Demo1.scala:18)
    at com.scala.sparkStreaming.Demo1.main(Demo1.scala)
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
    at scala.Predef$.require(Predef.scala:277)
    at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:169)
    at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:517)
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:577)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:576)
    at com.scala.sparkStreaming.Demo1$.main(Demo1.scala:18)
    at com.scala.sparkStreaming.Demo1.main(Demo1.scala)

On Windows, the directory string for textFileStream would look like:

file://D:\\My Documents\\Desktop\\inbound\\sse\\ssd
As Spark Streaming is deprecated with Spark version 2.4.5, I recommend getting familiar with Spark Structured Streaming instead. The complete, corrected Spark Streaming code looks like this:

import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Main extends App {
  val sc = new SparkContext("local[1]", "Stream")
  val stream = new StreamingContext(sc, Seconds(2))

  val rdd1 = stream.textFileStream("file:///path/to/src/main/resources")
  val mp1 = rdd1.flatMap(_.split(" "))

  // foreachRDD is an output operation, so the streaming graph now has
  // something to execute and stream.start() no longer throws.
  mp1.foreachRDD(rdd => rdd.collect().foreach(println))

  stream.start()
  stream.awaitTermination()
}
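For reference, a Structured Streaming equivalent might look roughly like the following. This is only a sketch: it assumes Spark 2.4.x with spark-sql on the classpath, and the input path and application name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object StructuredMain extends App {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("StructuredStream") // placeholder name
    .getOrCreate()
  import spark.implicits._

  // Monitor a directory for new text files; each line becomes one row.
  val lines = spark.readStream.textFile("file:///path/to/src/main/resources")

  val words = lines.flatMap(_.split(" "))

  // The sink plays the role of the DStream output operation: without a
  // writeStream.start() there is nothing to execute here either.
  val query = words.writeStream
    .format("console")
    .start()

  query.awaitTermination()
}
```

Structured Streaming enforces the same rule as the DStream API: calling an action such as collect() directly on a streaming Dataset fails with "Queries with streaming sources must be executed with writeStream.start()".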

I tried the above. It ran, but nothing was printed in the console. Thanks @mike for the effort, I will check it.
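If the job starts but nothing shows up in the console, the usual cause is that no new files appeared in the watched directory after start-up: textFileStream only picks up files created in (or moved into) the directory while the job is running. A sketch of feeding the watched directory from a second process, with placeholder paths:

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

// Run this while the streaming job is already running. Writing to a staging
// file first and then moving it in atomically is the usual pattern, so the
// stream never observes a half-written file. Paths are placeholders.
object FeedWatchedDir {
  def main(args: Array[String]): Unit = {
    val staging = Paths.get("/tmp/staging.txt")
    val watched = Paths.get("/tmp/watched")
    Files.createDirectories(watched)
    Files.write(staging, "hello streaming world".getBytes("UTF-8"))
    // Atomic move within the same filesystem: appears as a complete new file.
    Files.move(staging, watched.resolve("batch-1.txt"), StandardCopyOption.ATOMIC_MOVE)
  }
}
```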