Java: I am parsing JSON files with Apache Spark. How do I get nested keys from a JSON file, whether they are arrays or nested keys?
I have multiple JSON files that hold JSON data. The JSON structure looks like this:
{
"Name":"Vipin Suman",
"Email":"vpn2330@gmail.com",
"Designation":"Trainee Programmer",
"Age":22 ,
"location":
{"City":
{
"Pin":324009,
"City Name":"Ahmedabad"
},
"State":"Gujarat"
},
"Company":
{
"Company Name":"Elegant",
"Domain":"Java"
},
"Test":["Test1","Test2"]
}
I tried this:
String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-03.json";
String[] jsonFiles = jsonFilePath.split(",");
Dataset<Row> people = sparkSession.read().json(jsonFiles);
I got this view of the table:
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
|Age| Company| Designation| Email| Name| Test| location|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
| 22|[Elegant,Java]|Trainee Programmer|vpn2330@gmail.com|Vipin Suman|[Test1, Test2]|[[Ahmedabad,32400...|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
I want the result to look like this:
Age | Company Name | Domain| Designation | Email | Name | Test | City Name | Pin | State |
22 | Elegant MicroWeb | Java | Programmer | vpn2330@gmail.com | Vipin Suman | Test1 | Ahmedabad | 324009 | Gujarat
22 | Elegant MicroWeb | Java | Programmer | vpn2330@gmail.com | Vipin Suman | Test2 | Ahmedabad | 324009 |
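How can I get the table above? I have tried everything. I am new to Apache Spark; can anyone help me?

I suggest you work in Scala, which Spark supports better. To get what you want, you can use the "select" API to pick specific columns and use alias to rename them. Guides on selecting from complex data formats show how to reach nested fields.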
Given the result you want, you also need the "explode" API. In Scala it looks like this:
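// Needs: import org.apache.spark.sql.functions.explode
// and:   import sparkSession.implicits._  (for the $"..." syntax)
people.select(
  $"Age",
  $"Company.*",
  $"Designation",
  $"Email",
  $"Name",
  explode($"Test"),
  $"location.City.*",
  $"location.State")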
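To see why explode turns the single sample record into two rows, here is a minimal plain-Java model of that behaviour. It does not use Spark and is not how Spark implements explode. The class name ExplodeSketch and its explode method are hypothetical names chosen for illustration, and a row is assumed to be a java.util.Map whose array field is a java.util.List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models the row-multiplying effect of explode.
class ExplodeSketch {

    // Returns one copy of `row` per element of the list stored under `arrayKey`,
    // with that key's value replaced by the single element.
    static List<Map<String, Object>> explode(Map<String, Object> row, String arrayKey) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object value = row.get(arrayKey);
        if (!(value instanceof List)) {
            // No list under the key: this sketch emits no rows.
            // (Spark's explode likewise drops rows whose array is null or empty.)
            return out;
        }
        for (Object element : (List<?>) value) {
            Map<String, Object> copy = new LinkedHashMap<>(row); // keep the other columns
            copy.put(arrayKey, element);                         // one element per output row
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("Name", "Vipin Suman");
        row.put("Test", Arrays.asList("Test1", "Test2"));
        for (Map<String, Object> r : explode(row, "Test")) {
            System.out.println(r);
        }
    }
}
```

Every other column is repeated on each output row, which is why the desired table shows the same Name, Email and so on next to both Test1 and Test2.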
Unfortunately, the following Java code will fail:
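// Needs: import static org.apache.spark.sql.functions.explode;
people.select(
  people.col("Age"),
  people.col("Company.*"),
  people.col("Designation"),
  people.col("Email"),
  people.col("Name"),
  explode(people.col("Test")),
  people.col("location.City.*"),
  people.col("location.State"));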
However, you can use selectExpr instead:
people.selectExpr(
  "Age",
  "Company.*",
  "Designation",
  "Email",
  "Name",
  "EXPLODE(Test) AS Test",
  "location.City.*",
  "location.State");
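The expression list above is written by hand for one known structure. If the nested keys are not known in advance, one option is to derive the expression strings by walking a sample record. The following is a rough plain-Java sketch of that idea, not part of Spark's API. It assumes the record has already been parsed into nested java.util.Map and java.util.List values, and NestedKeySketch, quote and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a nested record into selectExpr-style strings.
class NestedKeySketch {

    // Wraps a name in backticks when it is not a plain identifier (e.g. "City Name").
    // Assumption: names never contain a backtick themselves.
    static String quote(String name) {
        return name.matches("[A-Za-z_][A-Za-z0-9_]*") ? name : "`" + name + "`";
    }

    // Depth-first walk: nested maps extend the dotted path, lists become an
    // explode expression, anything else is emitted as a leaf path.
    static void collect(String prefix, Map<String, Object> node, List<String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String name = quote(e.getKey());
            String path = prefix.isEmpty() ? name : prefix + "." + name;
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                collect(path, child, out);
            } else if (value instanceof List) {
                out.add("explode(" + path + ") AS " + name);
            } else {
                out.add(path);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> city = new LinkedHashMap<>();
        city.put("Pin", 324009);
        city.put("City Name", "Ahmedabad");
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("City", city);
        location.put("State", "Gujarat");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("Name", "Vipin Suman");
        record.put("location", location);
        record.put("Test", Arrays.asList("Test1", "Test2"));

        List<String> expressions = new ArrayList<>();
        collect("", record, expressions);
        for (String expression : expressions) {
            System.out.println(expression);
        }
    }
}
```

The resulting strings could then be passed to selectExpr as an array. One caveat: Spark accepts only one generator such as explode per select clause, so a record with several array fields would need the explodes applied in separate steps.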
PS:
Instead of a list of JSON files, you can pass the path of one or more directories to sparkSession.read().json(jsonFiles).