Apache Spark shows the error "Queries with streaming sources must be executed with writeStream.start()" in Spark Structured Streaming
Tags: apache-spark, apache-spark-sql, spark-structured-streaming

I'm running into a problem when executing Spark SQL on Spark Structured Streaming. The error is below. Here is my code:
object sparkSqlIntegration {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("StructuredStreaming")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
      .config("spark.sql.streaming.checkpointLocation", "file:///C:/checkpoint")
      .getOrCreate()

    setupLogging()

    val userSchema = new StructType().add("name", "string").add("age", "integer")

    // Create a stream of text files dumped into the logs directory
    val rawData = spark.readStream.option("sep", ",").schema(userSchema).csv("file:///C:/Users/R/Documents/spark-poc-centri/csvFolder")

    // Must import spark.implicits for conversion to DataSet to work!
    import spark.implicits._

    rawData.createOrReplaceTempView("updates")
    val sqlResult = spark.sql("select * from updates")
    println("sql results here")
    sqlResult.show()

    println("Otheres")

    val query = rawData.writeStream.outputMode("append").format("console").start()

    // Keep going until we're stopped.
    query.awaitTermination()

    spark.stop()
  }
}
During execution, I get the following error. Since I'm new to streaming, can anyone tell me how to execute Spark SQL queries on Spark Structured Streaming?
2018-12-27 16:02:40 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, LAPTOP-5IHPFLOD, 6829, None)
2018-12-27 16:02:41 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6731787b{/metrics/json,null,AVAILABLE,@Spark}
sql results here
Exception in thread "main" org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
FileSource[file:///C:/Users/R/Documents/spark-poc-centri/csvFolder]
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:374)
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:37)
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:35)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:392)
You don't need these lines:

import spark.implicits._
rawData.createOrReplaceTempView("updates")
val sqlResult = spark.sql("select * from updates")
println("sql results here")
sqlResult.show()
println("Otheres")

Most importantly, select * isn't necessary. When the DataFrame is printed, you'll already see all of its columns, so you also don't need to register a temporary view just to give it a name. And once you format("console"), that removes the need for .show().

For reading from a network socket and outputting to the console:
val words = // omitted ... some Streaming DataFrame
// Generating a running word count
val wordCounts = words.groupBy("value").count()
// Start running the query that prints the running counts to the console
val query = wordCounts.writeStream
.outputMode("complete")
.format("console")
.start()
query.awaitTermination()
Take-away: use DataFrame operations like .select() and .groupBy() rather than raw SQL.
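Applied to the CSV stream from the question, that might look like the sketch below. The schema and folder path are taken from the question; the particular projection and aggregation are illustrative, not something the question asked for:

```scala
// Sketch: the question's temp view + raw SQL replaced by DataFrame operations.
// Assumes the same SparkSession (`spark`) as in the question.
val userSchema = new StructType().add("name", "string").add("age", "integer")

val rawData = spark.readStream
  .option("sep", ",")
  .schema(userSchema)
  .csv("file:///C:/Users/R/Documents/spark-poc-centri/csvFolder")

// DataFrame operations take the place of "select * from updates";
// format("console") already prints every column, so no .show() is needed.
val query = rawData
  .select("name", "age")     // projection instead of raw SQL
  .groupBy("age").count()    // example aggregation
  .writeStream
  .outputMode("complete")    // streaming aggregations need complete/update mode
  .format("console")
  .start()

query.awaitTermination()
```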
Alternatively, you can use Spark Streaming (DStreams). There you call foreachRDD over each stream batch, convert each RDD to a DataFrame, and then query that DataFrame:
/** Case class for converting an RDD to a DataFrame */
case class Record(word: String)

val words = // omitted ... some DStream

// Convert RDDs of the words DStream to DataFrames and run a SQL query
words.foreachRDD { (rdd: RDD[String], time: Time) =>
  // Get the singleton instance of SparkSession
  val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
  import spark.implicits._

  // Convert RDD[String] to RDD[case class] to DataFrame
  val wordsDataFrame = rdd.map(w => Record(w)).toDF()

  // Create a temporary view using the DataFrame
  wordsDataFrame.createOrReplaceTempView("words")

  // Do word count on the table using SQL and print it
  val wordCountsDataFrame =
    spark.sql("select word, count(*) as total from words group by word")
  println(s"========= $time =========")
  wordCountsDataFrame.show()
}

ssc.start()
ssc.awaitTermination()
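The SparkSessionSingleton helper used above isn't defined in this snippet. In Spark's own streaming examples (e.g. SqlNetworkWordCount) it is a small lazily initialized holder, roughly like the following sketch:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/** Lazily instantiated singleton instance of SparkSession,
  * so each batch in foreachRDD reuses the same session. */
object SparkSessionSingleton {

  @transient private var instance: SparkSession = _

  def getInstance(sparkConf: SparkConf): SparkSession = {
    if (instance == null) {
      instance = SparkSession
        .builder
        .config(sparkConf)
        .getOrCreate()
    }
    instance
  }
}
```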