Apache Spark: replacing a column's values in Spark Structured Streaming
I have the following DataFrame:
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes the SparkSession is named spark; needed for $"..."

val tDataJsonDF = kafkaStreamingDFParquet
  .filter($"value".contains("tUse"))
  .filter($"value".isNotNull)
  .selectExpr("cast (value as string) as tdatajson", "cast (topic as string) as env")
  .select(from_json($"tdatajson", schema = ParquetSchema.tSchema).as("data"), $"env".as("env"))
  .select("data.*", "env")
  .select($"date", // string in YYYY/MM/dd format
    $"time",
    $"event",
    $"serviceGroupId",
    $"userId",
    $"env")
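I assume you are using Spark 2.2+, which is where the to_date(column, format) overload was introduced. Parse with yyyy (calendar year) rather than YYYY (week-based year, which can produce wrong dates), and keep the result, because withColumn returns a new DataFrame instead of modifying tDataJsonDF:
val formattedDF = tDataJsonDF.withColumn("formatted_date", date_format(to_date(col("date"), "yyyy/MM/dd"), "yyyy-MM-dd"))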
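If the goal is to replace the values of the existing date column rather than add formatted_date, pass "date" as the column name: withColumn with an existing name overwrites that column. Below is a minimal sketch that runs the same to_date/date_format expression on a small static DataFrame so it can be tried locally; it assumes spark-sql is on the classpath, and the object name ReplaceDateColumn, the helper reformatDate, and the sample rows are hypothetical stand-ins for the parsed Kafka JSON. Because the expression is a per-row transformation, the same call should work unchanged on the streaming tDataJsonDF.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, to_date}

object ReplaceDateColumn {
  // Hypothetical helper: overwrites "date" in place, converting
  // strings like "2019/12/30" into strings like "2019-12-30".
  // Using "date" as the withColumn name replaces the existing column.
  def reformatDate(df: DataFrame): DataFrame =
    df.withColumn("date", date_format(to_date(col("date"), "yyyy/MM/dd"), "yyyy-MM-dd"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("replace-date-column")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the parsed Kafka messages.
    val sample = Seq(("2019/12/30", "dev"), ("2020/01/05", "prod")).toDF("date", "env")

    reformatDate(sample).show()
    spark.stop()
  }
}
```

Since date_format returns a string, the replaced date column stays a string column; drop the outer date_format if a DateType column is preferred.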