Java: creating a nested schema in Apache Spark SQL
I want to load a simple JSON schema into my SparkSession. It has an employee with an array of addresses. Below is the sample JSON:
{"firstName":"Neil","lastName":"Irani", "addresses" : [ { "city" : "Brindavan", "state" : "NJ" }, { "city" : "Subala", "state" : "DT" }]}
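I am trying to create the schema for loading this JSON, and I believe something is wrong with the way I build it below. Please advise. The code is written in Java; I could not find a reasonable sample.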
List<StructField> employeeFields = new ArrayList<>();
employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
List<StructField> addressFields = new ArrayList<>();
addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
// Problem: "addresses" holds a JSON array, but here it is declared as a single struct.
employeeFields.add(DataTypes.createStructField("addresses", DataTypes.createStructType(addressFields), true));
StructType employeeSchema = DataTypes.createStructType(employeeFields);
Dataset<Employee> rowDataset = sparkSession.read()
// "inferSchema" is a CSV option; for JSON, supplying .schema(...) is what skips inference.
.option("inferSchema", "false")
.schema(employeeSchema)
.json("simple_employees.json").as(employeeEncoder);
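A quick way to catch this kind of mismatch (a suggestion, not something from the original post) is to let Spark infer the schema from the sample record and compare it with the hand-built one. The sketch assumes Spark 2.2 or later, for `DataFrameReader.json(Dataset<String>)`, running in local mode; the class `InferEmployeeSchema` and its method are hypothetical names. In the printed tree, `addresses` appears as an array of structs rather than as a single struct.

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class InferEmployeeSchema {

    // The sample record from the question, as a single JSON line.
    static final String SAMPLE =
        "{\"firstName\":\"Neil\",\"lastName\":\"Irani\", \"addresses\" : ["
        + " { \"city\" : \"Brindavan\", \"state\" : \"NJ\" },"
        + " { \"city\" : \"Subala\", \"state\" : \"DT\" }]}";

    // Lets Spark infer the schema from the sample instead of declaring it by hand.
    static StructType inferSchema(SparkSession spark) {
        Dataset<String> lines =
            spark.createDataset(Collections.singletonList(SAMPLE), Encoders.STRING());
        Dataset<Row> df = spark.read().json(lines);
        return df.schema();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("infer-employee-schema")
            .getOrCreate();
        try {
            // Prints the inferred schema as a tree for comparison with the manual one.
            inferSchema(spark).printTreeString();
        } finally {
            spark.stop();
        }
    }
}
```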
Update
I had not created an array type. The code below works:
List<StructField> employeeFields = new ArrayList<>();
employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
List<StructField> addressFields = new ArrayList<>();
addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
// "addresses" is a JSON array, so the address struct is wrapped in an ArrayType.
ArrayType addressStruct = DataTypes.createArrayType(DataTypes.createStructType(addressFields));
employeeFields.add(DataTypes.createStructField("addresses", addressStruct, true));
StructType employeeSchema = DataTypes.createStructType(employeeFields);
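To check the corrected schema end to end, the sketch below reads the sample record with it and uses `functions.explode` to produce one row per address. It assumes Spark 2.2 or later in local mode and reads the JSON from an in-memory `Dataset<String>` instead of `simple_employees.json`; the class `ReadNestedEmployees` and its methods are hypothetical names. Fields absent from the record (`email`, `zip`) come back as `null` rather than causing an error.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ReadNestedEmployees {

    static final String SAMPLE =
        "{\"firstName\":\"Neil\",\"lastName\":\"Irani\", \"addresses\" : ["
        + " { \"city\" : \"Brindavan\", \"state\" : \"NJ\" },"
        + " { \"city\" : \"Subala\", \"state\" : \"DT\" }]}";

    // Same schema as the working version: "addresses" is an ARRAY of address structs.
    static StructType employeeSchema() {
        List<StructField> addressFields = new ArrayList<>();
        addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
        addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
        addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
        ArrayType addressArray =
            DataTypes.createArrayType(DataTypes.createStructType(addressFields));

        List<StructField> employeeFields = new ArrayList<>();
        employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
        employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
        employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
        employeeFields.add(DataTypes.createStructField("addresses", addressArray, true));
        return DataTypes.createStructType(employeeFields);
    }

    // Reads the sample with the explicit schema and returns one row per address.
    static Dataset<Row> addressesPerEmployee(SparkSession spark) {
        Dataset<String> lines =
            spark.createDataset(Collections.singletonList(SAMPLE), Encoders.STRING());
        Dataset<Row> employees = spark.read().schema(employeeSchema()).json(lines);
        return employees
            .select(col("firstName"), explode(col("addresses")).as("address"))
            .select(col("firstName"), col("address.city"), col("address.state"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("read-nested-employees")
            .getOrCreate();
        try {
            addressesPerEmployee(spark).show(false);
        } finally {
            spark.stop();
        }
    }
}
```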