Creating Scala test cases for an array column
Tags: scala, apache-spark

I want to create a sequence of rows for the schema below, for generating test cases, and would appreciate suggestions. This is the schema I tried to build rows for:
 |-- column1: integer (nullable = true)
 |-- column2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- column21: string (nullable = true)
 |    |    |-- column22: string (nullable = true)
 |    |    |-- column23: integer (nullable = true)
Code and its output:
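import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructType}
import spark.implicits._

// Build the sample data from tuples; Spark names the columns _1, _2 and
// the struct fields _1, _2, _3 at this point.
val df = Seq(
  (1, Seq(("a", "b", 1))),
  (2, Seq(("c", "d", 2)))
).toDF()

// The schema with the desired column and field names.
val schema = new StructType()
  .add("column1", IntegerType)
  .add("column2", ArrayType(new StructType()
    .add("column2_1", StringType)
    .add("column2_2", StringType)
    .add("column2_3", IntegerType)
  )
)

// Re-create the DataFrame from the underlying rows, applying the named schema.
val df2 = spark.createDataFrame(df.rdd, schema)
df2.printSchema()
df2.show()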
Can you answer this question? I am new to Scala, so please help.
Basically, the above implementation is wrong.
root
 |-- column1: integer (nullable = true)
 |-- column2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- column2_1: string (nullable = true)
 |    |    |-- column2_2: string (nullable = true)
 |    |    |-- column2_3: integer (nullable = true)

+-------+-----------+
|column1|    column2|
+-------+-----------+
|      1|[[a, b, 1]]|
|      2|[[c, d, 2]]|
+-------+-----------+
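
Since the question asks for a sequence of rows, another option is to build the test data directly as `Row` objects and pass the schema explicitly, skipping the tuple step. The following is only a minimal sketch under some assumptions: the field names `column21`, `column22`, `column23` are taken from the schema in the question (not the `column2_1` style used above), the local `SparkSession` settings and the names `elementType`, `expectedSchema`, `rows` and `testDf` are made up for illustration, and the third row is added only to show a null array.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructType}

// Assumption: a local session created just for the test.
// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("array-of-struct-test-rows")
  .getOrCreate()

// Struct type of each array element, named as in the question's schema.
val elementType = new StructType()
  .add("column21", StringType)
  .add("column22", StringType)
  .add("column23", IntegerType)

// Full schema: an integer column plus an array-of-struct column.
val expectedSchema = new StructType()
  .add("column1", IntegerType)
  .add("column2", ArrayType(elementType))

// Each struct inside the array is itself a Row.
val rows = Seq(
  Row(1, Seq(Row("a", "b", 1))),
  Row(2, Seq(Row("c", "d", 2), Row("e", "f", 3))),
  Row(3, null) // column2 is nullable, so a null array is allowed
)

val testDf = spark.createDataFrame(spark.sparkContext.parallelize(rows), expectedSchema)

testDf.printSchema()
testDf.show(truncate = false)
```

In a test, `testDf.schema` can then be compared against `expectedSchema`, and individual rows can be checked after filtering on `column1`. Writing the rows by hand makes it easy to cover edge cases such as empty or null arrays, at the cost of losing the compile-time type checking that tuples or case classes would give.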