Apache spark 以非字符串化格式向kafka发送json事件
我创建了一个如下所示的数据框架,其中我使用了_json()方法来创建json数组值Apache spark 以非字符串化格式向kafka发送json事件,apache-spark,spark-structured-streaming,Apache Spark,Spark Structured Streaming,我创建了一个如下所示的数据框架,其中我使用了_json()方法来创建json数组值 +---------------------------------------------------------------------------------------------------- |json_data
+----------------------------------------------------------------------------------------------------
|json_data |
+-----------------------------------------------------------------------------------------------------------+
|{"name":"sensor1","value-array":[{"time":"2020-11-27T01:01:00.000Z","sensorvalue":11.0,"tag1":"tagvalue"}]}|
+-----------------------------------------------------------------------------------------------------------+
我使用下面的方法将数据帧发送到卡夫卡主题。
但是当我使用发送到kafka主题的数据时,我可以看到json数据被字符串化了
将数据推送到卡夫卡的代码:
outgoingDF.selectExpr("CAST(Key as STRING) as key", "to_json(struct(*)) AS value")
.write
.format("kafka")
.option("topic", "topic_test")
.option("kafka.bootstrap.servers", "localhost:9093")
.option("checkpointLocation", checkpointPath)
.option("kafka.sasl.mechanism", "PLAIN")
.option("kafka.security.protocol", "SASL_SSL")
.option("truncate", false)
.save()
卡夫卡接收到的字符串化数据:
{
"name": "sensor1",
"value-array": "[{\"time\":\"2020-11-27T01:01:00.000Z\",\"sensorvalue\":11.0,\"tag1\":\"tagvalue\"}]"
}
我们如何将数据发送到kafka topic,这样我们就不会看到字符串化的json作为输出?
json\u数据
的类型是string
&您再次将json\u数据
传递给
to_json(struct(“*”)
函数
选中要转到卡夫卡的value
列
df.withColumn("value",to_json(struct($"*"))).show(false)
+-----------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
|json_data |value |
+-----------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
|{"name":"sensor1","value-array":[{"time":"2020-11-27T01:01:00.000Z","sensorvalue":11.0,"tag1":"tagvalue"}]}|{"json_data":"{\"name\":\"sensor1\",\"value-array\":[{\"time\":\"2020-11-27T01:01:00.000Z\",\"sensorvalue\":11.0,\"tag1\":\"tagvalue\"}]}"}|
+-----------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
试试下面的代码
df
.withColumn("value-array",array(struct($"time",$"sensorvalue",$"tag1")))
.selectExpr("CAST(Key as STRING) as key",to_json(struct($"name",$"value-array")).as("value"))
.write
.format("kafka")
.option("topic", "topic_test")
.option("kafka.bootstrap.servers", "localhost:9093")
.option("checkpointLocation", checkpointPath)
.option("kafka.sasl.mechanism", "PLAIN")
.option("kafka.security.protocol", "SASL_SSL")
.option("truncate", false)
.save()