Scala / Apache Spark: reading multiple text files in a single run


I can successfully load a text file into a DataFrame with the following Apache Spark Scala code:

import org.apache.spark.sql.functions.{input_file_name, monotonically_increasing_id}

val df = spark.read.text("first.txt")
  .withColumn("fileName", input_file_name())
  .withColumn("unique_id", monotonically_increasing_id())

Is there a way to supply multiple files in a single run? Something like this:

val df = spark.read.text("first.txt,second.txt,someother.txt")
  .withColumn("fileName", input_file_name())
  .withColumn("unique_id", monotonically_increasing_id())

At the moment, this code fails with the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:first.txt,second.txt,someother.txt;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)

How do I load multiple text files correctly?

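The spark.read.text() function takes a varargs parameter. From the documentation:

def text(paths: String*): DataFrame

This means that to read multiple files, you pass each path as its own argument, separated by commas, rather than as one comma-joined string:

val df = spark.read.text("first.txt", "second.txt", "someother.txt")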
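
When the file names are only known at runtime, the same varargs call can be fed from a collection. The following is a minimal sketch, not the asker's actual setup: the object name ReadMultipleTextFiles, the helper readAll, and the local[*] master are assumptions made for illustration, and the three file names are taken from the question and must exist in the working directory.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{input_file_name, monotonically_increasing_id}

object ReadMultipleTextFiles {

  // Hypothetical helper: reads every path in `paths` into one DataFrame and
  // tags each row with its source file and a unique (not consecutive) id.
  def readAll(spark: SparkSession, paths: Seq[String]): DataFrame =
    spark.read
      .text(paths: _*) // expand the Seq into the varargs parameter
      .withColumn("fileName", input_file_name())
      .withColumn("unique_id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-multiple-text-files")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    // File names from the question; each is a separate element, not one joined string.
    val paths = Seq("first.txt", "second.txt", "someother.txt")
    val df = readAll(spark, paths)
    df.show(truncate = false)

    spark.stop()
  }
}
```

The original attempt failed because "first.txt,second.txt,someother.txt" was handed over as a single argument, so Spark looked for one file with that literal name, as the "Path does not exist" message shows. If all the files live in one directory, a glob pattern such as spark.read.text("*.txt") should also work, since the reader resolves glob patterns in paths.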




