How to iterate over JSON objects in Scala Spark (Eclipse)
I have an input JSON file that contains two objects. When I try to read the file with a schema, I only get the values of the first object. Here is my code:

// sample JSON
{
name: jack,
age: 30,
joinDate: 12-12-2018,
id: 01123
}
{
name: bob,
age: 25,
joinDate: 12-01-2019,
id: 02354
}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object readjson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("readjson").getOrCreate()

    val Schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", StringType),
      StructField("joinDate", StringType),
      StructField("id", StringType)
    ))

    val json_file_path = "C:\\employee"

    val dataframe = spark
      .read
      .option("multiLine", true)
      .schema(Schema)
      .json(json_file_path)
      .show()
  }
}
The output I get:
name age joinDate id
jack 30 12-12-2018 01123
Expected output:
name age joinDate id
jack 30 12-12-2018 01123
bob 25 12-01-2019 02354
I tried your code with Spark 2.4.4 and it works fine. The only change I made was to wrap the strings in your JSON in double quotes:
[
{
"name": "jack",
"age": 30,
"joinDate": "12-12-2018",
"id": 1123
},
{
"name": "bob",
"age": 25,
"joinDate": "12-01-2019",
"id": 2354
}
]
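As an aside, Spark's default JSON source expects JSON Lines: one complete, minified object per line, in which case the multiLine option is not needed at all. A minimal sketch under that assumption (the file name employee.jsonl and the local master setting are illustrative, not from the original question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object ReadJsonLines {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ReadJsonLines").getOrCreate()

    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", StringType),
      StructField("joinDate", StringType),
      StructField("id", StringType)
    ))

    // employee.jsonl (hypothetical path) would contain one object per line:
    //   {"name":"jack","age":"30","joinDate":"12-12-2018","id":"1123"}
    //   {"name":"bob","age":"25","joinDate":"12-01-2019","id":"2354"}
    // Each line is parsed as one row, so no multiLine option is required.
    spark.read.schema(schema).json("employee.jsonl").show()

    spark.stop()
  }
}
```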
You did this on an array of objects; otherwise your JSON would not be valid. Can I push the objects into an array?
$ spark-shell
Spark context available as 'sc' (master = local[*], app id = local-1598014153867).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.sql.types.{ StructType, StructField, StringType }
import org.apache.spark.sql.types.{StructType, StructField, StringType}
scala> object readjson {
| val Schema = StructType(Seq(
| StructField("name", StringType),
| StructField("age", StringType),
| StructField("joinDate", StringType),
| StructField("id", StringType)
| ));
|
| val json_file_path = "<path-to-json-file>"
|
| val dataframe = spark
| .read
| .option("multiLine", true)
| .schema(Schema)
| .json(json_file_path)
| .show()
| }
defined object readjson
scala> readjson
+----+---+----------+----+
|name|age| joinDate| id|
+----+---+----------+----+
|jack| 30|12-12-2018|1123|
| bob| 25|12-01-2019|2354|
+----+---+----------+----+
res0: readjson.type = readjson$@4759b196
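Regarding the comment above, one way to "push the objects into an array" without editing the file by hand is to read it as plain text, join the top-level objects with commas, wrap the result in brackets, and parse the in-memory string with the `json(Dataset[String])` overload. This is only a sketch: it assumes the file fits in driver memory, that the objects are valid JSON once the strings are double-quoted, and the regex is a simplistic way of finding the boundary between adjacent objects.

```scala
import org.apache.spark.sql.SparkSession

object WrapInArray {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("WrapInArray").getOrCreate()
    import spark.implicits._

    // Read the whole file back as one string (assumes it fits in driver memory).
    val raw = spark.read.textFile("C:\\employee").collect().mkString("\n")

    // Insert a comma between adjacent top-level objects and wrap in [ ... ].
    // The regex assumes "}" followed by "{" only occurs at object boundaries.
    val asArray = "[" + raw.replaceAll("\\}\\s*\\{", "},{") + "]"

    // Parse the in-memory JSON string; each array element becomes a row.
    val df = spark.read.json(Seq(asArray).toDS())
    df.show()

    spark.stop()
  }
}
```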