Apache Spark: how to convert a text file into a multi-column DataFrame/Dataset
I am trying to read a text file and convert it into a DataFrame:
val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map((row) => row.toString().split(","))
  .map(attributes => {
    Row(attributes(0), attributes(1), attributes(2), attributes(3), attributes(4))
  }).as[Row]
When I call inputDf.printSchema, I get a single column:
root
|-- value: binary (nullable = true)
How can I convert this text file into a multi-column DataFrame/Dataset?

Solved: define the target schema as a StructType, build a RowEncoder from it, and pass that encoder explicitly to map so the result carries the five-column schema instead of the single value column:
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Five nullable string columns, named "1" through "5"
val inputSchema: StructType = StructType(
  List(
    StructField("1", StringType, true),
    StructField("2", StringType, true),
    StructField("3", StringType, true),
    StructField("4", StringType, true),
    StructField("5", StringType, true)
  )
)

// Encoder built from the schema; passed explicitly to map below
val encoder = RowEncoder(inputSchema)

val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map((row) => row.toString().split(","))
  .map(attributes => {
    // The fifth column is filled with the literal "BUY" rather than attributes(4)
    Row(attributes(0), attributes(1), attributes(2), attributes(3), "BUY")
  })(encoder)
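If the file is plain comma-separated text, an alternative worth considering is the built-in CSV source, which parses each line on the delimiter and applies the schema directly, so no manual split or RowEncoder is needed. A sketch under the same path and schema as above (not a drop-in replacement for the accepted answer, since it keeps the file's real fifth field rather than the "BUY" literal):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Same five nullable string columns as the accepted answer
val inputSchema: StructType = StructType(
  (1 to 5).map(i => StructField(i.toString, StringType, nullable = true))
)

// spark.read.csv splits each line on the delimiter and applies the schema
val inputDf: DataFrame = spark.read
  .schema(inputSchema)
  .option("delimiter", ",")
  .csv(filePath.get.concat("/").concat(fileName.get))
```

This assumes a running SparkSession named spark, as in the question.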