How to iterate over JSON objects in Scala Spark (Eclipse)
I have an input JSON file that contains two objects. When I try to read the file with a schema, I only get the values of the first object. Here is my code:

// sample JSON
{
name: jack,
age: 30,
joinDate: 12-12-2018,
id: 01123
}
{
name: bob,
age: 25,
joinDate: 12-01-2019,
id: 02354
}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object readjson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("readjson").getOrCreate()

    val Schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", StringType),
      StructField("joinDate", StringType),
      StructField("id", StringType)
    ))

    val json_file_path = "C:\\employee"

    val dataframe = spark
      .read
      .option("multiLine", true)
      .schema(Schema)
      .json(json_file_path)
      .show()
  }
}
The output I get:
name age joinDate id
jack 30 12-12-2018 01123
Expected output:
name age joinDate id
jack 30 12-12-2018 01123
bob 25 12-01-2019 02354
I tried your code with Spark 2.4.4 and it works fine. The only change I made was to wrap the strings in your JSON in double quotes:
[
{
"name": "jack",
"age": 30,
"joinDate": "12-12-2018",
"id": 1123
},
{
"name": "bob",
"age": 25,
"joinDate": "12-01-2019",
"id": 2354
}
]
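As an aside, Spark's default JSON source expects JSON Lines: one complete, minified object per line, in which case the multiLine option is not needed at all. A minimal sketch under that assumption (the file name employee.jsonl and the local master setting are illustrative, not from the original question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object ReadJsonLines {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ReadJsonLines").getOrCreate()

    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", StringType),
      StructField("joinDate", StringType),
      StructField("id", StringType)
    ))

    // employee.jsonl (hypothetical path) would contain one object per line:
    //   {"name":"jack","age":"30","joinDate":"12-12-2018","id":"1123"}
    //   {"name":"bob","age":"25","joinDate":"12-01-2019","id":"2354"}
    // Each line is parsed as one row, so no multiLine option is required.
    spark.read.schema(schema).json("employee.jsonl").show()

    spark.stop()
  }
}
```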
You did this on an array of objects; otherwise your JSON would not be valid. Can I push the objects into an array?
$ spark-shell
Spark context available as 'sc' (master = local[*], app id = local-1598014153867).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.sql.types.{ StructType, StructField, StringType }
import org.apache.spark.sql.types.{StructType, StructField, StringType}
scala> object readjson {
| val Schema = StructType(Seq(
| StructField("name", StringType),
| StructField("age", StringType),
| StructField("joinDate", StringType),
| StructField("id", StringType)
| ));
|
| val json_file_path = "<path-to-json-file>"
|
| val dataframe = spark
| .read
| .option("multiLine", true)
| .schema(Schema)
| .json(json_file_path)
| .show()
| }
defined object readjson
scala> readjson
+----+---+----------+----+
|name|age| joinDate| id|
+----+---+----------+----+
|jack| 30|12-12-2018|1123|
| bob| 25|12-01-2019|2354|
+----+---+----------+----+
res0: readjson.type = readjson$@4759b196
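Regarding the comment above, one way to "push the objects into an array" without editing the file by hand is to read it as plain text, join the top-level objects with commas, wrap the result in brackets, and parse the in-memory string with the `json(Dataset[String])` overload. This is only a sketch: it assumes the file fits in driver memory, that the objects are valid JSON once the strings are double-quoted, and the regex is a simplistic way of finding the boundary between adjacent objects.

```scala
import org.apache.spark.sql.SparkSession

object WrapInArray {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("WrapInArray").getOrCreate()
    import spark.implicits._

    // Read the whole file back as one string (assumes it fits in driver memory).
    val raw = spark.read.textFile("C:\\employee").collect().mkString("\n")

    // Insert a comma between adjacent top-level objects and wrap in [ ... ].
    // The regex assumes "}" followed by "{" only occurs at object boundaries.
    val asArray = "[" + raw.replaceAll("\\}\\s*\\{", "},{") + "]"

    // Parse the in-memory JSON string; each array element becomes a row.
    val df = spark.read.json(Seq(asArray).toDS())
    df.show()

    spark.stop()
  }
}
```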