Apache spark 使用PYSPARK从JSON数据创建数据帧
我正在尝试使用pyspark模块从json数据创建一个数据帧,但无法做到,尝试使用sqlContext.read.json创建数据帧,但没有得到正确的结果 示例json数据:Apache spark 使用PYSPARK从JSON数据创建数据帧,apache-spark,dataframe,pyspark,apache-spark-sql,Apache Spark,Dataframe,Pyspark,Apache Spark Sql,我正在尝试使用pyspark模块从json数据创建一个数据帧,但无法做到,尝试使用sqlContext.read.json创建数据帧,但没有得到正确的结果 示例json数据: { "userId":"rirani", "jobTitleName":"Developer", "firstName":"Romin", "lastName":"Irani", "preferredFullName":"Romin Irani", "employeeCode":"E1", "region":"CA", "
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani@gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani@gmail.com"
}
{
"userId":"thanks",
"jobTitleName":"Program Directory",
"firstName":"Tom",
"lastName":"Hanks",
"preferredFullName":"Tom Hanks",
"employeeCode":"E3",
"region":"CA",
"phoneNumber":"408-2222222",
"emailAddress":"tomhanks@gmail.com"
}
预期o/p:为表格格式。有人能帮我吗。您可以使用SparkSession:
my_json = [{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani@gmail.com"
},
{ "userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2", "region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani@gmail.com"
},
{ "userId":"thanks",
"jobTitleName":"Program Directory",
"firstName":"Tom",
"lastName":"Hanks",
"preferredFullName":"Tom Hanks", "employeeCode":"E3", "region":"CA", "phoneNumber":"408-2222222",
"emailAddress":"tomhanks@gmail.com"
}]
json_df = spark.read.json(my_json)
json_df.show()
你得到了什么结果?o/p:的意思是什么?