Apache Spark: writing streaming data to an S3 bucket partitioned by day, month, and year
I have a requirement very similar to the one posted in this thread (), but it doesn't work. Can someone tell me what's missing here?
```
val formatDf = df.selectExpr("CAST(value AS STRING)")
  .select(from_json($"value", schema).as("sInput")) // parse the JSON value and give it an alias
  .select("sInput.*")                               // flatten the struct field
  .withColumn("triggeringModels", explode($"triggeringModels")) // explode the array field
  .withColumn("year", date_format($"date", "yyyy")) // calendar year, not week-based "YYYY"
  .withColumn("month", date_format($"date", "MM"))
  .withColumn("day", date_format($"date", "dd"))

formatDf.writeStream
  .format("parquet")
  .option("path", "path")
  .option("checkpointLocation", "checkpointpath")
  .partitionBy("year", "month", "day")
  .start()
```
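One subtle trap when deriving partition columns from a date is the format pattern: `"yyyy"` is the calendar year, while `"YYYY"` is the week-based year, which shifts at year boundaries and silently misfiles rows into the wrong partition. The sketch below (plain `java.time`, not Spark, and with a hypothetical `partitionParts` helper) demonstrates the same pattern semantics that Spark's `date_format` follows:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object PartitionColumns {
  // Builds the (year, month, day) strings that partitionBy("year", "month", "day")
  // would write as directory names for a given date.
  def partitionParts(date: LocalDate): (String, String, String) = (
    date.format(DateTimeFormatter.ofPattern("yyyy")), // calendar year
    date.format(DateTimeFormatter.ofPattern("MM")),   // zero-padded month
    date.format(DateTimeFormatter.ofPattern("dd"))    // zero-padded day
  )

  // Same date formatted with the week-based-year pattern "YYYY".
  // Dec 31, 2019 falls in week 1 of 2020, so this returns "2020".
  def weekBasedYear(date: LocalDate): String =
    date.format(DateTimeFormatter.ofPattern("YYYY"))
}
```

So a row dated 2019-12-31 lands under `year=2019/month=12/day=31` with `"yyyy"`, but would be bucketed under `year=2020` with `"YYYY"`.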