Apache Spark: how to convert a text file into a multi-column DataFrame/Dataset
I am trying to read a text file and convert it into a DataFrame:
val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map((row) => row.toString().split(","))
  .map(attributes => {
    Row(attributes(0), attributes(1), attributes(2), attributes(3), attributes(4))
  }).as[Row]
When I call inputDf.printSchema, I get a single column:
root
|-- value: binary (nullable = true)
How can I convert this text file into a multi-column DataFrame/Dataset?

Solved: define the target schema as a StructType, build a RowEncoder from it, and pass that encoder explicitly to map so the result carries the five-column schema instead of the single value column:
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Five nullable string columns, named "1" through "5"
val inputSchema: StructType = StructType(
  List(
    StructField("1", StringType, true),
    StructField("2", StringType, true),
    StructField("3", StringType, true),
    StructField("4", StringType, true),
    StructField("5", StringType, true)
  )
)

// Encoder built from the schema; passed explicitly to map below
val encoder = RowEncoder(inputSchema)

val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map((row) => row.toString().split(","))
  .map(attributes => {
    // The fifth column is filled with the literal "BUY" rather than attributes(4)
    Row(attributes(0), attributes(1), attributes(2), attributes(3), "BUY")
  })(encoder)
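If the file is plain comma-separated text, an alternative worth considering is the built-in CSV source, which parses each line on the delimiter and applies the schema directly, so no manual split or RowEncoder is needed. A sketch under the same path and schema as above (not a drop-in replacement for the accepted answer, since it keeps the file's real fifth field rather than the "BUY" literal):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Same five nullable string columns as the accepted answer
val inputSchema: StructType = StructType(
  (1 to 5).map(i => StructField(i.toString, StringType, nullable = true))
)

// spark.read.csv splits each line on the delimiter and applies the schema
val inputDf: DataFrame = spark.read
  .schema(inputSchema)
  .option("delimiter", ",")
  .csv(filePath.get.concat("/").concat(fileName.get))
```

This assumes a running SparkSession named spark, as in the question.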