Apache Spark: org.apache.spark.sql.AnalysisException: 'write' can not be called on streaming Dataset/DataFrame

Tags: apache-spark, spark-streaming, spark-structured-streaming

I am trying to write a Spark Structured Streaming (2.3) Dataset to ScyllaDB (Cassandra).

My code for writing the Dataset:

def saveStreamSinkProvider(ds: Dataset[InvoiceItemKafka]) = {
  ds
    .writeStream
    .format("cassandra.ScyllaSinkProvider")
    .outputMode(OutputMode.Append)
    .queryName("KafkaToCassandraStreamSinkProvider")
    .options(
      Map(
        "keyspace" -> namespace,
        "table" -> StreamProviderTableLink,
        "checkpointLocation" -> "/tmp/checkpoints"
      )
    )
    .start()
}
My Scylla sink:

class ScyllaSinkProvider extends StreamSinkProvider {
  override def createSink(sqlContext: SQLContext,
                          parameters: Map[String, String],
                          partitionColumns: Seq[String],
                          outputMode: OutputMode): ScyllaSink =
    new ScyllaSink(parameters)
}

class ScyllaSink(parameters: Map[String, String]) extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit =
    data.write
      .cassandraFormat(
        parameters("table"),
        parameters("keyspace")
        // parameters("cluster")
      )
      .mode(SaveMode.Append)
      .save()
}
However, when I run this code I get an exception:

...
[error]    +- StreamingExecutionRelation KafkaSource[Subscribe[transactions_load]], [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]
[error]   at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
[error]   at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
[error] Caused by: org.apache.spark.sql.AnalysisException: 'write' can not be called on streaming Dataset/DataFrame;
[error]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
[error]   at org.apache.spark.sql.Dataset.write(Dataset.scala:3103)
[error]   at cassandra.ScyllaSink.addBatch(CassandraDriver.scala:113)
[error]   at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$3$$anonfun$apply$16.apply(MicroBatchExecution.scala:477)
...


I have seen a similar question, but for CosmosDB -

You can convert it to an RDD first and then write that:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.execution.streaming.Sink

class ScyllaSink(parameters: Map[String, String]) extends Sink {

  override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
    val schema = data.schema
    // Go through the already-analyzed internal RDD so the same query plan is reused
    // and Dataset.write is never called on the streaming DataFrame.
    val rdd: RDD[Row] = data.queryExecution.toRdd.mapPartitions { rows =>
      val converter = CatalystTypeConverters.createToScalaConverter(schema)
      rows.map(converter(_).asInstanceOf[Row])
    }

    // write the RDD to Cassandra
  }
}
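The answer leaves the final write as a comment. As a minimal sketch of one way to finish it (not part of the original answer), assuming the DataStax spark-cassandra-connector is on the classpath: rebuild a plain, non-streaming DataFrame from the converted RDD and the captured schema, then go through the normal batch write path that Dataset.write refuses to take on the streaming DataFrame itself.

// Sketch only; goes inside addBatch, after `rdd` has been built as above.
// Assumes the DataStax spark-cassandra-connector is available.
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.cassandra._

val batchDf = data.sparkSession.createDataFrame(rdd, schema) // plain, non-streaming DataFrame
batchDf.write
  .cassandraFormat(parameters("table"), parameters("keyspace"))
  .mode(SaveMode.Append)
  .save()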

I don't think you can mix batch and streaming like that; you may end up having to build a genuinely "streaming" sink for ScyllaDB (Cassandra).
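As a side note not from the original thread: since Spark 2.4, foreachBatch does this batch-inside-streaming handover for you, passing each micro-batch as a regular (non-streaming) Dataset, so no custom Sink is needed. A minimal sketch reusing the names from the question (ds, namespace, StreamProviderTableLink, InvoiceItemKafka), assuming Spark 2.4+ and the DataStax connector:

// Spark 2.4+ alternative (not available in 2.3, the version used in the question).
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.{Dataset, SaveMode}
import org.apache.spark.sql.streaming.OutputMode

ds.writeStream
  .outputMode(OutputMode.Append)
  .queryName("KafkaToCassandraForeachBatch")
  .option("checkpointLocation", "/tmp/checkpoints")
  .foreachBatch { (batch: Dataset[InvoiceItemKafka], batchId: Long) =>
    // Each micro-batch arrives as a plain Dataset, so the batch write path works.
    batch.write
      .cassandraFormat(StreamProviderTableLink, namespace)
      .mode(SaveMode.Append)
      .save()
  }
  .start()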