How can I return multiple key-value pairs in Scala using Spark's map transformation?

scala apache-spark


I am new to Scala and Spark. I am trying to return multiple key-value pairs during a map transformation. My input data is a simple CSV file:

1, 2, 3
4, 5, 6
7, 8, 9

My Scala script looks like this:

class Key(_i:Integer, _j:Integer) {
 def i = _i
 def j = _j
}
class Val(_x:Double, _y:Double) {
 def x = _x
 def y = _y
}
val arr = "1,2,3".split(",")
for(i <- 0 until arr.length) {
 val x = arr(i).toDouble
 for(j <- 0 until arr.length) {
  val y = arr(j).toDouble
  val k = new Key(i, j)
  val v = new Val(x, y)
  //note that i want to return the tuples, (k, v)
 }
}

You forgot the braces after the arrow. You can omit them only when the lambda body is a single, simple expression; a body that spans several lines has to be wrapped in braces, as in line => { ... }.

Full answer after edits:
case class Index(i:Integer, j:Integer)
case class Val(x:Double, y:Double)

val data = sc.parallelize(List("1,2,3", "4,5,6", "7,8,9"))
data.flatMap(line => {
  val arr = line.split(",")
  val doubleSeq = for (i <- 0 until arr.length) yield {
    val x = arr(i).toDouble
    for (j <- (i+1) until arr.length) yield {
      val y = arr(j).toDouble
      val k = Index(i,j)
      val v = Val(x,y)
      (k,v)
    }
  }
  doubleSeq.flatten
})
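To see what the lambda in this answer returns for a single input line, here is a minimal sketch that runs on plain Scala collections, without Spark. The helper name pairsFor is hypothetical, and the case classes use Int instead of Integer for brevity; both are assumptions made for illustration, not part of the original answer.

```scala
case class Index(i: Int, j: Int)
case class Val(x: Double, y: Double)

// Hypothetical helper: the body of the flatMap lambda, extracted so it runs without Spark.
def pairsFor(line: String): Seq[(Index, Val)] = {
  val arr = line.split(",")
  // The outer yield produces one inner sequence per i, i.e. a sequence of sequences.
  val nested = for (i <- 0 until arr.length) yield {
    val x = arr(i).toDouble
    for (j <- (i + 1) until arr.length) yield (Index(i, j), Val(x, arr(j).toDouble))
  }
  // flatten collapses the nesting into a single sequence of (key, value) tuples.
  nested.flatten
}

val pairs = pairsFor("1,2,3")
pairs.foreach(println)
// (Index(0,1),Val(1.0,2.0))
// (Index(0,2),Val(1.0,3.0))
// (Index(1,2),Val(2.0,3.0))
```

Because the function returns a whole sequence per line, flatMap (rather than map) is what turns those per-line sequences into one flat collection of pairs.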

Use RDD.flatMap and yield a list from the for loop:
val file = sc.textFile("/path/to/test.csv")
file.flatMap { line =>
  val arr = line.split(",")
  for {
    i <- 0 until arr.length
    j <- (i + 1) until arr.length
  } yield {
    val x = arr(i).toDouble
    val y = arr(j).toDouble
    val k = new Index(i, j)
    val v = new Val(x, y)
    (k, v)
  }
}.collect
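The asker's next step is to reduce all values that share a key. The sketch below imitates that step on a local List so it can run without Spark. It assumes the reduction is a component-wise sum, which the thread does not specify, and it uses Int fields instead of Integer. It also shows why case classes are a better fit for keys than the plain Key class from the question: grouping relies on equality, and a plain class compares by reference.

```scala
case class Index(i: Int, j: Int)
case class Val(x: Double, y: Double)

// A plain class like the question's Key has no structural equality.
class Key(val i: Int, val j: Int)

val lines = List("1,2,3", "4,5,6", "7,8,9")

// Same shape as the flatMap above, but on a local List instead of an RDD.
val pairs: List[(Index, Val)] = lines.flatMap { line =>
  val arr = line.split(",")
  for {
    i <- 0 until arr.length
    j <- (i + 1) until arr.length
  } yield (Index(i, j), Val(arr(i).toDouble, arr(j).toDouble))
}

// Local stand-in for reducing by key: group on the key, then combine the values.
// The component-wise sum is an assumed reduction, chosen only for illustration.
val reduced: Map[Index, Val] = pairs.groupBy(_._1).map { case (k, kvs) =>
  k -> kvs.map(_._2).reduce((a, b) => Val(a.x + b.x, a.y + b.y))
}

println(reduced(Index(0, 1)))              // Val(12.0,15.0)
println(Index(0, 1) == Index(0, 1))        // true: case classes compare by value
println(new Key(0, 1) == new Key(0, 1))    // false: plain classes compare by reference
```

On an RDD of (Index, Val) pairs the corresponding call would be reduceByKey with the same combining function, which is available once the elements are tuples rather than Unit.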

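Your suggestion helped. Now the error is gone. But when I add a return statement, return (k, v), I get the following: error: return outside method definition. I don't see…

Don't use return in Scala; the last expression is the return value. That will fix it.

Do you know how to check whether the lambda function is correct? When I execute file.map(line => {…}).collect, all I see is Array[Unit] = Array((), (), …). What I want to do next is reduce all the values that share the same key. However, autocomplete (hitting tab) shows that reduceByKey is not a member of org.apache.spark.rdd.RDD[Unit].

I am still stuck in the MapReduce state of mind. I posted the code that now works with your help. Note that in the example above I use collect to try to inspect what is actually in the RDD. Meanwhile, I am reading this article, which seems to suggest that the map function in Scala/Spark has 1 input and 1 output, so for what I want to do I probably have to use the flatMap function.

Yes, flatMap seems right. It is not exactly the same as your code, but this question also uses flatMap to generate multiple output rows from each input row. It might point you in the right direction?

Scala for loops are quite magical. I have never found documentation for them, and so far I have not dared to ask.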
file.map(line => {
    //multiple lines of code here
})
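The Array[Unit] result and the "magic" of for loops discussed in the comments both come down to how Scala treats for with and without yield. The following is a small sketch on plain collections (no Spark needed); the variable names are chosen for illustration only.

```scala
val arr = "1,2,3".split(",")

// Without yield, the for loop runs only for its side effects and its value is Unit.
// A lambda whose last expression is such a loop therefore returns Unit.
val noYield: Unit = for (i <- 0 until arr.length) { arr(i).toDouble }

// With yield, the loop is an expression that builds a collection.
val withYield = for (i <- 0 until arr.length) yield arr(i).toDouble

// Two generators plus yield correspond to flatMap on the outer range
// and map on the inner range.
val viaFor = for {
  i <- 0 until arr.length
  j <- (i + 1) until arr.length
} yield (i, j)

val viaFlatMap =
  (0 until arr.length).flatMap(i => ((i + 1) until arr.length).map(j => (i, j)))

println(withYield)            // Vector(1.0, 2.0, 3.0)
println(viaFor == viaFlatMap) // true
```

This is also why no return statement is needed: the yielded collection is the last expression of the lambda body, so it becomes the lambda's result.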
