Apache Spark: how to convert a streaming Dataset to a JavaRDD in Spark Structured Streaming
I have a Structured Streaming application that reads data from an HDFS path:
Dataset<Row> structStream = spark.readStream().format("text").load(parameters.get("input"));
JavaRDD<String> transformedstructStreamRDD = structStream.as(Encoders.STRING()).toJavaRDD();
Dataset<Row> df = spark.createDataFrame(transformedstructStreamRDD, String.class);
// Dataset<Row> df = structStream.as("dummy");
StreamingQuery streamingQuery = df.writeStream()
        .format("csv")
        .option("checkpointLocation", "/user/hadoop/chkpointpath/")
        .option("path", "/user/hadoop/output/")
        .start();
try {
    streamingQuery.awaitTermination();
} catch (StreamingQueryException e) {
    e.printStackTrace();
}
Running this fails with:

org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
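The error occurs because toJavaRDD() forces Spark to plan and execute the query immediately, which is not allowed on an unbounded streaming Dataset: a streaming query may only be materialized through writeStream().start(). A common way to get RDD-style access anyway (available since Spark 2.4) is writeStream().foreachBatch(...), which hands each micro-batch to your code as a bounded Dataset on which toJavaRDD() is legal. The sketch below assumes the same text source; the input and checkpoint paths are placeholders, not values from the original question:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ForeachBatchRddSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("foreachBatch-rdd-sketch")
                .master("local[2]")
                .getOrCreate();

        // Hypothetical input directory; substitute your own HDFS path.
        Dataset<Row> structStream = spark.readStream()
                .format("text")
                .load("/user/hadoop/input/");

        // foreachBatch delivers each micro-batch as a *bounded* Dataset,
        // so the RDD conversion that fails on the streaming Dataset works here.
        StreamingQuery query = structStream
                .as(Encoders.STRING())
                .writeStream()
                .foreachBatch((batch, batchId) -> {
                    JavaRDD<String> rdd = batch.toJavaRDD();
                    // ... arbitrary RDD transformations on this micro-batch ...
                    System.out.println("batch " + batchId + " rows: " + rdd.count());
                })
                .option("checkpointLocation", "/user/hadoop/chkpointpath/")
                .start();

        query.awaitTermination();
    }
}
```

Inside foreachBatch you can also write the per-batch result out with the batch DataFrameWriter API (batch.write()...), which is often simpler than converting to an RDD at all.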