Scala: mkString and sortByKey not working with arrays in Spark


I have a log file (accounts) that contains the following data:

1,2008-10-23 16:05:05.0,\N,Donald,Becton,2275 Washburn Street,Oakland,CA,94660,5100032418,2014-03-18 13:29:47.0,2014-03-18 13:29:47.0
2,2008-11-12 03:00:01.0,\N,Donna,Jones,3885 Elliott Street,San Francisco,CA,94171,4150835799,2014-03-18 13:29:47.0,2014-03-18 13:29:47.0
1 - I load the log file using:

val accountsdata = sc.textFile("C:/Users/Sam/Downloads/account1.txt")
2 - I want to key the accounts by zip/postal code, so I did the following:

val accountsByPCode = accountsdata.keyBy(line => line.split(",")(8)).mapValues(line => line.split(","))
--> This works fine.
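To see what the split in step 2 produces, here is a minimal plain-Scala sketch (no Spark needed) that applies the same split(",") to the first sample record. The object name ParseAccountLine is hypothetical and only there to make the snippet self-contained.

```scala
object ParseAccountLine {
  // Sample record copied from the data above; triple quotes keep the literal \N intact.
  val line: String =
    """1,2008-10-23 16:05:05.0,\N,Donald,Becton,2275 Washburn Street,Oakland,CA,94660,5100032418,2014-03-18 13:29:47.0,2014-03-18 13:29:47.0"""

  // Same split that keyBy and mapValues apply to every line.
  val fields: Array[String] = line.split(",")

  // Index 8 is the postal code, which keyBy uses as the key.
  val postalCode: String = fields(8)

  def main(args: Array[String]): Unit = {
    println(s"number of fields = ${fields.length}")
    println(s"key (postal code) = $postalCode")
    println(s"fields(3) = ${fields(3)}, fields(4) = ${fields(4)}")
  }
}
```

Each RDD element after step 2 is therefore a pair of the postal code and the full Array[String] of fields for that line.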

3 - Then I want to map accountsByPCode so that the values are the two name fields (fields 3 and 4), which I did using:
val namesByPCode=accountsByPCode.mapValues(fields=>(fields(3),fields(4))).collect()
--> This also works fine, but printing it with the following fails:

println(s"======= namesByPCode, style1 =======")
 for (pair <- namesByPCode.take(5)) {
  printf("%s, [%s] \n",pair._1,pair._2.mkString(","))
 }
Also, sortByKey fails when I try to use it like this:

println(s"======= namesByPCode, style2 =======")
 for (pair <- namesByPCode.sortByKey().take(5)) {
  println("---" + pair._1)
  pair._2.take(3)foreach(println) 
}
The mkString call fails because you are creating a Tuple2[String, String] rather than an Array[String]. Try:

val namesByPCode = accountsByPCode.mapValues(fields => Array(fields(3), fields(4))).collect()
Or change the place where you print it to:

printf("%s, [%s] \n",pair._1,Array(pair._2._1, pair._2._2).mkString(","))
Do one of these (not both!).
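As a further illustration, and not part of the original answer, the tuple can also be printed without converting it to an Array. The sketch below is plain Scala with a hypothetical object name (TupleVsArray) and one hand-written element shaped like the collected result. It shows two options: destructuring the tuple with a pattern, and using productIterator, which every tuple provides and which returns an Iterator that does have mkString.

```scala
object TupleVsArray {
  // One element shaped like the collected result: (postal code, (field 3, field 4)).
  val pair: (String, (String, String)) = ("94660", ("Donald", "Becton"))

  // Option A: destructure the nested tuple and format the parts directly.
  val (pcode, (first, second)) = pair
  val viaPattern: String = s"$pcode, [$first,$second]"

  // Option B: productIterator walks the tuple's elements; Iterator has mkString.
  val viaProduct: String = pair._2.productIterator.mkString(",")

  def main(args: Array[String]): Unit = {
    println(viaPattern)
    println(viaProduct)
  }
}
```

Either option leaves the RDD values as tuples, which keeps the element types fixed at two strings instead of an Array of arbitrary length.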

As for sortByKey, you get this error:

error: value sortByKey is not a member of Array[(String, (String,String))]
  for (pair <- namesByPCode.sortByKey().take(5)) {
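This happens because you called collect(), so you no longer have an RDD. You are working with an Array. You just need to sort the data by key before collecting: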

val namesByPCode = accountsByPCode.mapValues(fields => (fields(3),  fields(4))).sortByKey().collect()

Now you have a sorted array. If you don't need the whole array, you should replace collect() with take(5).
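If the data has already been collected, another option (an alternative to the approach above, not a replacement for it) is to sort the local Array with the standard collection method sortBy, since sortByKey exists only on pair RDDs. The sketch below is plain Scala; the object name SortCollected is hypothetical and the two elements are hand-written from the sample data to stand in for the result of collect().

```scala
object SortCollected {
  // Stand-in for the Array returned by collect(); values taken from the sample data.
  val namesByPCode: Array[(String, (String, String))] = Array(
    ("94660", ("Donald", "Becton")),
    ("94171", ("Donna", "Jones"))
  )

  // A local Array has no sortByKey, but sortBy on the first tuple element
  // orders the pairs by postal code (string ordering).
  val sorted: Array[(String, (String, String))] = namesByPCode.sortBy(_._1)

  def main(args: Array[String]): Unit = {
    for ((pcode, (a, b)) <- sorted.take(5)) {
      println("---" + pcode)
      println(s"$a,$b")
    }
  }
}
```

Sorting locally only makes sense when the collected data fits comfortably in driver memory. For large data sets, sorting on the RDD before collect() or take(5) is the safer choice.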

Now mkString is working, but sortByKey gives me this error: :39: error: value sortByKey is not a member of Array[(String, Array[String])]