Parsing & flattening JSON objects in a text file into a DataFrame using Spark & Scala


I have a text file with the following structure:

(employeeID: Int, Name: String, ProjectDetails: JsonObject{[{ProjectName, Description, Duration, Role}]})

For example:
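The original sample input appears to have been lost (it was likely an image in the source post). A hypothetical input record, reconstructed from the schema above and the expected output below, might look like this:

```json
{"employeeID": 123456, "Name": "Employee1", "ProjectDetails": [
  {"ProjectName": "Web Development", "Description": "Online Sales website", "Duration": "6 Months", "Role": "Developer"},
  {"ProjectName": "Spark Development", "Description": "Online Sales Analysis", "Duration": "6 Months", "Role": "Data Engineer"},
  {"ProjectName": "Scala Training", "Description": "Training", "Duration": "1 Month", "Role": null}
]}
```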

Could someone help me parse and flatten records like this into the DataFrame below using Scala?

employeeID, Name, ProjectName, Description, Duration, Role
123456, Employee1, Web Development, Online Sales website, 6 Months, Developer
123456, Employee1, Spark Development, Online Sales Analysis, 6 Months, Data Engineer
123456, Employee1, Scala Training, Training, 1 Month, null

You can try this, but I have slightly modified the input structure, since the first two columns are not in JSON format.

scala> import org.apache.spark.SparkConf
scala> import org.apache.spark.SparkContext
scala> import org.apache.spark.sql.SQLContext
scala> import org.apache.spark.sql._
scala> val sqlSC = new org.apache.spark.sql.SQLContext(sc)
scala> import sqlSC.implicits._
scala> val emp_DF = sqlSC.jsonFile("file:///C:/Users/ABCD/Desktop/Examples/Spark/Mailing List/Employee_Nested_Projects.json")
scala> case class ProjectInfo(ProjectName: String, Description: String, Duration: String, Role: String)
scala> case class Projects(employeeID: Int, Name: String, ProjectDetails: Seq[ProjectInfo])
scala> val emp_projects_DF = emp_DF.explode(emp_DF("ProjectDetails")) {
         case Row(x: Seq[Row]) => x.map(x => ProjectInfo(x(0).asInstanceOf[String], x(1).asInstanceOf[String], x(2).asInstanceOf[String], x(3).asInstanceOf[String]))
       }
scala> emp_projects_DF.select($"employeeID", $"Name", $"ProjectName", $"Description", $"Duration", $"Role").show()
employeeID, Name, ProjectName, Description, Duration, Role
123456, Employee1, Web Development, Online Sales website, 6 Months, Developer
123456, Employee1, Spark Development, Online Sales Analysis, 6 Months, Data Engineer
123456, Employee1, Scala Training, Training, 1 Month, null
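The flattening itself is just a flatMap over the nested collection: one output row per project, with the employee columns repeated. As a plain-Scala sketch without Spark (the case class and field names here are hypothetical, chosen to mirror the schema in the question):

```scala
// Hypothetical case classes mirroring the nested schema from the question.
case class ProjectInfo(name: String, description: String, duration: String, role: String)
case class Employee(employeeID: Int, name: String, projects: Seq[ProjectInfo])

val emp = Employee(123456, "Employee1", Seq(
  ProjectInfo("Web Development", "Online Sales website", "6 Months", "Developer"),
  ProjectInfo("Scala Training", "Training", "1 Month", null)))

// Flatten: emit one (employeeID, Name, ProjectName, Description, Duration, Role)
// tuple per project, repeating the employee-level columns for each one.
val rows = Seq(emp).flatMap(e =>
  e.projects.map(p => (e.employeeID, e.name, p.name, p.description, p.duration, p.role)))

rows.foreach(println)
```

This is the same idea the `DataFrame.explode` call above expresses on Spark 1.x; on Spark 2.x and later, `DataFrame.explode` is deprecated and the equivalent is the `explode` function from `org.apache.spark.sql.functions` combined with `select`.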