
Apache Spark - How to convert a text file into a multi-column schema DataFrame/Dataset


I am trying to read a text file and convert it into a DataFrame:

val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map((row) => row.toString().split(","))
  .map(attributes => {
    Row(attributes(0), attributes(1), attributes(2), attributes(3), attributes(4))
  }).as[Row]
When I run inputDf.printSchema, I get only a single column:

root
 |-- value: binary (nullable = true)
How can I convert this text file into a DataFrame/Dataset with a multi-column schema?
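
For context, spark.read.text always returns a single string column named value, one row per line; the subsequent .as[Row] step typically falls back to a generic binary encoder, which is consistent with the single binary value column shown above, so the five-column structure is lost. A minimal sketch of the default schema, using a hypothetical path:

val rawDf = spark.read.text("/path/to/input.txt")   // hypothetical path, for illustration only
rawDf.printSchema()
// root
//  |-- value: string (nullable = true)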

Solved:

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import spark.implicits._  // supplies the implicit encoder for the intermediate Array[String]

// Target schema: five string columns.
val inputSchema: StructType = StructType(
  List(
    StructField("1", StringType, true),
    StructField("2", StringType, true),
    StructField("3", StringType, true),
    StructField("4", StringType, true),
    StructField("5", StringType, true)
  )
)

// Row encoder built from the schema; passed explicitly to map() below.
val encoder = RowEncoder(inputSchema)

val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map(row => row.getString(0).split(","))  // read the raw line from the "value" column (avoids the brackets added by Row.toString)
  .map(attributes => Row(attributes(0), attributes(1), attributes(2), attributes(3), "BUY"))(encoder)
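
If the file really is plain comma-separated text, the same multi-column DataFrame can usually be produced without the manual split by handing the schema to Spark's built-in CSV reader. The sketch below is an alternative under that assumption (same filePath/fileName values and inputSchema as above, comma delimiter, no header row); unlike the accepted code it keeps the fifth field from the file instead of overwriting it with "BUY".

import org.apache.spark.sql.DataFrame

// Reuses the inputSchema, filePath and fileName defined in the solution above.
val csvDf: DataFrame = spark.read
  .schema(inputSchema)          // apply the five-column schema up front
  .option("sep", ",")           // comma delimiter (assumption)
  .option("header", "false")    // no header row (assumption)
  .csv(filePath.get.concat("/").concat(fileName.get))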