Converting a text file of a specific format into a DataFrame in Spark using Scala


Tags: scala, apache-spark

I am trying to convert a conversation into a DataFrame in Spark using Scala. Each person and their message are separated by a tab, and each message of the conversation is on its own line.

The text file looks like this:

alpha   hello,beta! how are you?
beta    I am fine alpha.How about you?
alpha   I am also doing fine...
alpha   Actually, beta, I am bit busy nowadays and sorry I hadn't call U
I need the DataFrame to look like this:

------------------------------------
|Person  |  Message
------------------------------------
|1       |  hello,beta! how are you?
|2       |  I am fine alpha.How about you?
|1       |  I am also doing fine...
|1       |  Actually, beta, I am bit busy nowadays and sorry I hadn't call U
------------------------------------

You can read the text file and parse it yourself.

For example:

   // needed for the Dataset encoders used by flatMap
   import sparkSession.implicits._

   val result: Dataset[(String, String)] = sparkSession.read.textFile("filePath").flatMap {
     line =>
       val str = line.split("\t")
       if (str.length == 2) {
         Some((str(0), str(1)))
       } else {
         // in case you want to ignore malformed lines
         None
       }
   }
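The desired output also replaces the speaker names with numbers (alpha becomes 1, beta becomes 2). The numbering logic itself does not need Spark; a minimal sketch in plain Scala collections, where the `ConversationParser` object, the `parse` helper, and the `conversation` sample data are names introduced here for illustration:

```scala
object ConversationParser {
  // Sample lines in the same tab-separated format as the question (assumed data)
  val conversation: Seq[String] = Seq(
    "alpha\thello,beta! how are you?",
    "beta\tI am fine alpha.How about you?",
    "alpha\tI am also doing fine..."
  )

  // Parse each line into (person, message), skipping malformed lines,
  // then replace each name with a numeric id in order of first appearance.
  def parse(lines: Seq[String]): Seq[(Int, String)] = {
    val pairs = lines.flatMap { line =>
      line.split("\t") match {
        case Array(person, message) => Some((person, message))
        case _                      => None // ignore malformed lines
      }
    }
    val personIds: Map[String, Int] =
      pairs.map(_._1).distinct.zipWithIndex.map { case (p, i) => (p, i + 1) }.toMap
    pairs.map { case (person, message) => (personIds(person), message) }
  }

  def main(args: Array[String]): Unit =
    parse(conversation).foreach(println)
}
```

The same mapping could then be applied inside the Spark `flatMap` above, or as a second transformation on `result`.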

First, I created a text file with the data you provided and placed it at the HDFS location temp/data.txt.

data.txt:

alpha   hello,beta! how are you?
beta    I am fine alpha.How about you?
alpha   I am also doing fine...
alpha   Actually, beta, I am bit busy nowadays and sorry I hadn't call U
Then I created a case class, read in the file, and processed it into a DataFrame:

case class PersonMessage(Person: String, Message: String)
val df = sc.textFile("temp/data.txt").map(x => {
  val splits = x.split("\t")
  PersonMessage(splits(0), splits(1))
}).toDF("Person", "Message")
df.show
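One caveat with `x.split("\t")` above: if a message itself ever contains a tab, the row would split into more than two fields and `splits(1)` would silently drop the rest. Passing a limit of 2 to `split` keeps everything after the first tab intact. A small sketch of that split logic in plain Scala (the `TabSplit` object and `toPersonMessage` helper are hypothetical names, not part of the original answer):

```scala
object TabSplit {
  // Split only on the first tab, so tabs inside the message are preserved.
  // A line with no tab yields an empty message rather than an exception.
  def toPersonMessage(line: String): (String, String) = {
    val parts = line.split("\t", 2) // limit = 2: at most two fields
    (parts(0), if (parts.length > 1) parts(1) else "")
  }

  def main(args: Array[String]): Unit =
    println(toPersonMessage("alpha\thello\tworld"))
}
```

The same `split("\t", 2)` call can be dropped into the `map` that builds `PersonMessage` above.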

Can you share your code? I am actually a beginner in Scala and have only made some progress on this. I am currently learning complex map functions, as in this question: val text = sc.textFile("hdfs://localhost:9000/Conversation").map(x => x.split("\n")); val text2 = text.foreach(x => x.map(y => y.split(" ")))