Unifying different JSON with Apache Spark (apache-spark, apache-spark-sql, databricks)

Example:

Below is a sample of the JSON data, in which the records have different sets of attributes:

{"id": 1, "label": "tube", "length": "50m", "diameter": "5cm"}
{"id": 2, "label": "brick", "width": "10cm", "length": "25cm"}
{"id": 3, "label": "sand", "weight": "25kg"}
Question:

Is it possible in Apache Spark to transform this JSON into a structured dataset, like this:

+--+-----+------+--------+-----+------+
|id|label|length|diameter|width|weight|
+--+-----+------+--------+-----+------+
|1 |tube |50m   |5cm     |     |      |
|2 |brick|25cm  |        |10cm |      |
|3 |sand |      |        |     |25kg  |
+--+-----+------+--------+-----+------+
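Conceptually, the transformation is a column union: collect the set of all keys across records, then project every record onto that set, padding missing fields with nulls. For readers without a Spark shell handy, here is a minimal plain-Python sketch of that idea (standard library only, using the three sample records; this only illustrates the concept, not Spark's actual inference machinery):

```python
import json

# The three sample records from the question.
lines = [
    '{"id": 1, "label": "tube", "length": "50m", "diameter": "5cm"}',
    '{"id": 2, "label": "brick", "width": "10cm", "length": "25cm"}',
    '{"id": 3, "label": "sand", "weight": "25kg"}',
]

records = [json.loads(line) for line in lines]

# Unified column set: the union of all keys, sorted alphabetically
# (Spark also orders inferred JSON columns alphabetically).
columns = sorted({key for record in records for key in record})

# Project every record onto the unified columns, padding with None.
rows = [[record.get(col) for col in columns] for record in records]

print(columns)
for row in rows:
    print(row)
```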

No problem. Just read it and let Spark infer the schema:

val ds = Seq(
  """{"id": 1, "label": "tube", "length": "50m", "diameter": "5cm"}""",
  """{"id": 2, "label": "brick", "width": "10cm", "length": "25cm"}""",
  """{"id": 3, "label": "sand", "weight": "25kg"}"""
).toDS

spark.read.json(ds).show
// +--------+---+-----+------+------+-----+
// |diameter| id|label|length|weight|width|
// +--------+---+-----+------+------+-----+
// |     5cm|  1| tube|   50m|  null| null|
// |    null|  2|brick|  25cm|  null| 10cm|
// |    null|  3| sand|  null|  25kg| null|
// +--------+---+-----+------+------+-----+
Or provide the expected schema when reading:

import org.apache.spark.sql.types._

val fields = Seq("label", "length", "weight", "width")
val schema = StructType(
  StructField("id", LongType) +: fields.map {
    StructField(_, StringType)
  }
)

spark.read.schema(schema).json(ds).show
// +---+-----+------+------+-----+
// | id|label|length|weight|width|
// +---+-----+------+------+-----+
// |  1| tube|   50m|  null| null|
// |  2|brick|  25cm|  null| 10cm|
// |  3| sand|  null|  25kg| null|
// +---+-----+------+------+-----+
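The explicit-schema variant can also be mimicked outside Spark: predeclare the column list and project each record onto it, so any field absent from the schema (here `diameter`) is simply dropped, as in the output above. A minimal plain-Python sketch under that assumption:

```python
import json

lines = [
    '{"id": 1, "label": "tube", "length": "50m", "diameter": "5cm"}',
    '{"id": 2, "label": "brick", "width": "10cm", "length": "25cm"}',
    '{"id": 3, "label": "sand", "weight": "25kg"}',
]

# Predeclared column list, mirroring the StructType in the answer:
# "id" first, then the string fields. "diameter" is not listed,
# so it is dropped from the result.
schema = ["id", "label", "length", "weight", "width"]

rows = [[json.loads(line).get(col) for col in schema] for line in lines]

for row in rows:
    print(row)
```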

Which version of Spark are you using?