Apache Spark: how to convert a DStream to Avro format and save the files in HDFS

I have a DStream of type [String, ArrayList[String]], and I want to convert this DStream to Avro format and save it to HDFS. How can I achieve this?

You can convert the stream to a JavaRDD, or convert it to a DataFrame, then write it to a file and specify Avro as the format.

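// Apply a schema to an RDD. Here `books` is a JavaRDD<Books> and Books is a
// JavaBean class, so Spark can infer the columns from its getters.
DataFrame booksDF = sqlContext.createDataFrame(books, Books.class);
// Write the DataFrame out as Avro files using the spark-avro data source.
booksDF.write()
    .format("com.databricks.spark.avro")
    .save("/output");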
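
The second argument to `createDataFrame` above is the JavaBean class of the RDD's elements, not the class of the stream itself. For the [String, ArrayList[String]] records in the question, such a bean might look like the sketch below. The class name `KeyedValues` and the field names `key` and `values` are hypothetical and chosen only for illustration, and whether a `List` field is accepted by schema inference depends on the Spark version in use, so treat this as a starting point rather than a definitive implementation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical JavaBean for one [String, ArrayList[String]] record.
// Schema inference from a bean class relies on a public class, a public
// no-argument constructor, and getter/setter pairs for each field.
public class KeyedValues implements Serializable {
    private static final long serialVersionUID = 1L;

    private String key;
    private List<String> values = new ArrayList<>();

    // No-argument constructor required by the JavaBean convention.
    public KeyedValues() {
    }

    public KeyedValues(String key, List<String> values) {
        this.key = key;
        // Copy defensively so the bean does not share the caller's list.
        this.values = new ArrayList<>(values);
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public List<String> getValues() {
        return values;
    }

    public void setValues(List<String> values) {
        this.values = values;
    }
}
```

With a bean like this, each record of the stream would first be mapped to a `KeyedValues` instance, and `KeyedValues.class` would take the place of `Books.class` in the call above.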

Please visit the linked page for more examples.

Hope this helps.

DataFrame booksDF = sqlContext.createDataFrame(books, Books.class); throws a NullPointerException here. For the class argument I passed dstream.getClass(). I have since converted my DStream to type [GenericData.Record], so it would be very helpful if you could answer in that context.