Scala Apache Spark: reading multiple text files in a single run
I can successfully load a text file into a DataFrame with the following Apache Spark Scala code:
import org.apache.spark.sql.functions.{input_file_name, monotonically_increasing_id}

val df = spark.read.text("first.txt")
  .withColumn("fileName", input_file_name())
  .withColumn("unique_id", monotonically_increasing_id())
Is there a way to supply multiple files in a single run? Something like this:
val df = spark.read.text("first.txt,second.txt,someother.txt")
.withColumn("fileName", input_file_name())
.withColumn("unique_id", monotonically_increasing_id())
Currently, the code above fails with the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:first.txt,second.txt,someother.txt;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
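How do I load multiple text files correctly?

The function spark.read.text() takes a varargs parameter, as its signature shows:
def text(paths: String*): DataFrame
This means that to read multiple files you pass each path as its own argument, separated by commas, rather than as one comma-joined string, i.e.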
val df = spark.read.text("first.txt", "second.txt", "someother.txt")
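If the paths are only known at runtime, for example as a collection or as a single comma-separated string like the one in the question, a Scala sequence can be expanded into the varargs parameter with `: _*`. The following is a minimal sketch, not a definitive implementation: the object name `ReadMultipleTextFiles` and the helpers `splitPaths` and `readAll` are hypothetical names introduced for illustration, and the local master setting assumes a test run on a single machine.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{input_file_name, monotonically_increasing_id}

object ReadMultipleTextFiles {
  // Hypothetical helper: split a comma-separated string into individual paths,
  // trimming whitespace and dropping empty entries.
  def splitPaths(commaSeparated: String): Seq[String] =
    commaSeparated.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Hypothetical helper: read all paths into one DataFrame.
  // `paths: _*` expands the Seq into the varargs parameter of text(paths: String*).
  def readAll(spark: SparkSession, paths: Seq[String]): DataFrame =
    spark.read
      .text(paths: _*)
      .withColumn("fileName", input_file_name())
      .withColumn("unique_id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation; adjust for a real cluster.
    val spark = SparkSession.builder()
      .appName("read-multiple-text-files")
      .master("local[*]")
      .getOrCreate()

    val paths = splitPaths("first.txt,second.txt,someother.txt")
    val df = readAll(spark, paths)
    df.show(false)

    spark.stop()
  }
}
```

Because every row keeps its source in the fileName column, the lines from the different files remain distinguishable after they are combined into a single DataFrame.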