Scala: how to convert RDD[List[String]] to String and split it


I have the following scenario, where I need to take the lines from the list and split them:

scala> var nonErroniousBidsMap = rawBids.filter(line => !(line(2).contains("ERROR_") || line(5) == null || line(5) == ""))
nonErroniousBidsMap: org.apache.spark.rdd.RDD[List[String]] = MapPartitionsRDD[108] at filter at <console>:33

scala> nonErroniousBidsMap.take(2).foreach(println)
List(0000002, 15-04-08-2016, 0.89, 0.92, 1.32, 2.07, , 1.35)
List(0000002, 11-05-08-2016, 0.92, 1.68, 0.81, 0.68, 1.59, , 1.63, 1.77, 2.06, 0.66, 1.53, , 0.32, 0.88, 0.83, 1.01)

scala> val transposeMap = nonErroniousBidsMap.map( rec => ( rec.split(",")(0) + "," + rec.split(",")(1) + ",US" + "," + rec.split(",")(5) ) )
<console>:35: error: value split is not a member of List[String]
     val transposeMap = nonErroniousBidsMap.map( rec => ( rec.split(",")(0) + "," + rec.split(",")(1) + ",US" + "," + rec.split(",")(5) ) )
                                                              ^
I get the error shown above. Can you help me fix it?

Thanks.

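rec is of type List[String], which has no split(String) method (as the compiler correctly reports). It looks like you are assuming your records are comma-separated strings, but they are not: when you call println on each one, the elements print with commas between them because that is how List.toString works.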
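To see why the printed output is misleading, here is a small plain-Scala sketch (no Spark needed) built from the first record printed in the question. It assumes the blank between "2.07" and "1.35" is an empty-string element, which is what the ", ," in the printed output suggests.

```scala
// The first record printed in the question, rebuilt as a local List[String].
// Assumption: the blank between "2.07" and "1.35" is an empty string element.
val rec: List[String] = List("0000002", "15-04-08-2016", "0.89", "0.92", "1.32", "2.07", "", "1.35")

// println(rec) calls List.toString, which wraps the elements in "List(...)"
// and separates them with ", ". The record itself is still a List, not a String.
val printed: String = rec.toString

// rec.split(",") does not compile: split is defined on String, not on List.
// Indexing the List gives the fields directly, with no splitting needed.
val id: String = rec(0)
val date: String = rec(1)
val field5: String = rec(5)

// If a single comma-separated String is really needed, mkString builds one.
val joined: String = rec.mkString(",")

println(printed)
println(s"$id $date $field5")
println(joined)
```

In other words, the "convert to String and split it" round trip is unnecessary here: the List already holds the fields.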

You only need to remove all of the split(",") calls and index the List directly to get what you want:

nonErroniousBidsMap.map(rec => rec.head + "," + rec(1) + ",US" + "," + rec(5))
Or, more elegantly, with Scala's string interpolation:

nonErroniousBidsMap.map(rec => s"${rec.head},${rec(1)},US,${rec(5)}")
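A minimal local check of both versions, assuming a plain List[List[String]] in place of the RDD (RDD.filter and RDD.map take the same kind of function, so the lambdas carry over unchanged). The first two rows are the records printed in the question; the third is a hypothetical error row added only to exercise the filter.

```scala
// Local stand-in for the RDD. Rows 1 and 2 are the records printed in the question;
// row 3 is a hypothetical error record used only to exercise the filter.
val rawBids: List[List[String]] = List(
  List("0000002", "15-04-08-2016", "0.89", "0.92", "1.32", "2.07", "", "1.35"),
  List("0000002", "11-05-08-2016", "0.92", "1.68", "0.81", "0.68", "1.59", "", "1.63",
    "1.77", "2.06", "0.66", "1.53", "", "0.32", "0.88", "0.83", "1.01"),
  List("0000003", "12-05-08-2016", "ERROR_EXAMPLE")
)

// Same predicate as in the question. Because || short-circuits,
// line(5) is never evaluated for the three-element error row.
val nonErroniousBids = rawBids.filter { line =>
  !(line(2).contains("ERROR_") || line(5) == null || line(5) == "")
}

// Concatenation version: index the List instead of calling split.
val viaConcat = nonErroniousBids.map(rec => rec.head + "," + rec(1) + ",US" + "," + rec(5))

// String interpolation version: same output, easier to read.
val viaInterpolation = nonErroniousBids.map(rec => s"${rec.head},${rec(1)},US,${rec(5)}")

viaInterpolation.foreach(println)
```

On the real data the same lambda goes into nonErroniousBidsMap.map(...), and because it returns a String the result is an RDD[String].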

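Please put code in code formatting so it is easier to read. If you don't know how, highlight the code and press Cmd+K or Ctrl+K.

Thank you very much, you saved my day! I map the rec items above to a case class called Bid, and transUS, transMX and transCA each hold bid records. My method returns RDD[Bid], but when I return val transAll = transUS.union(transMX).union(transCA), it gives the error "type mismatch; found: Unit, required: RDD[Bid]". How do I return an RDD[Bid] from the union above?

Sir, I have a similar problem. Could you help me?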
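One common cause of that type-mismatch message: in Scala, a block whose last statement is a val definition evaluates to Unit, so a method declared to return RDD[Bid] that ends with val transAll = ... fails to compile. The sketch below assumes the method ends that way (its body is not shown), uses a hypothetical Bid case class with placeholder fields and values, and swaps RDD[Bid] and union for a local Seq[Bid] and ++ so it runs without Spark.

```scala
// Hypothetical Bid case class: the real fields are not shown, these are placeholders.
final case class Bid(id: String, date: String, country: String, price: String)

// Does not compile: the block ends with a val definition, so its type is Unit
// and the compiler reports a type mismatch (found Unit, required Seq[Bid]).
// def combineBroken(us: Seq[Bid], mx: Seq[Bid], ca: Seq[Bid]): Seq[Bid] = {
//   val transAll = us ++ mx ++ ca
// }

// Compiles: the last expression of a block is its value, so end with transAll.
// With RDDs this would be transUS.union(transMX).union(transCA) instead of ++.
def combine(us: Seq[Bid], mx: Seq[Bid], ca: Seq[Bid]): Seq[Bid] = {
  val transAll = us ++ mx ++ ca
  transAll
}

// Placeholder records, one per country.
val us = Seq(Bid("1", "date-1", "US", "1.00"))
val mx = Seq(Bid("2", "date-2", "MX", "2.00"))
val ca = Seq(Bid("3", "date-3", "CA", "3.00"))

combine(us, mx, ca).foreach(println)
```

Writing the union as the final expression (without the val) works just as well; the point is only that the method's last expression must be the value being returned.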