Scala: performing an operation between one element and the rest of an RDD

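I'm new to Spark and I'm really enjoying the possibilities this technology offers. My question is how to perform, for each element, an operation against the rest of the RDD without using a for loop. Here is my attempt with a for loop: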

 //RDD[Key:Int,Vector:(Double,Double)]
 val rdd = data.map(x => (x.split(',')(0).toInt,Vectors.dense(x.split(',')(1).toDouble,x.split(',')(2).toDouble)))

 for( ind <- 0 to rdd.count().toInt -1 ) {
   val element1 = rdd.filter(x => x._1 == ind)
   val vector1 = element1.first()._2
   val rdd2 = rdd.map( x => {
        var dist1 = Vectors.sqdist(x._2,vector1)    
        (x._1 , Math.sqrt(dist1))
        })
 }

Thanks for any suggestions.

If you are looking for the distances between all pairs of vectors, use rdd.cartesian:

import org.apache.spark.mllib.linalg.Vectors

val rdd = sc.parallelize(Array("0,1.0,1.0","1,2.0,2.0","2,3.0,3.0"))
val r = rdd.map(x => x.split(","))
           .map(y =>(y(0).toInt, Vectors.dense(y(1).toDouble, y(2).toDouble)))

val res =  r.cartesian(r).map{ case (first, second) => 
   ((first._1, second._1), 
    Math.sqrt(Vectors.sqdist(first._2, second._2))) 
}
However, this computes the distance between the same two vectors twice (first (A, B), then (B, A)).
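
One possible way to avoid the duplicated work is to filter the cartesian product so that only pairs whose first key is smaller than the second survive. The following is a minimal sketch, not part of the original answer. It assumes the keys are unique integers, and it obtains a local SparkContext through SparkContext.getOrCreate (in spark-shell this should simply return the existing one). The names conf, pairs and unique, the master setting and the app name are illustrative choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors

// Assumption: a local context is acceptable for this illustration.
val conf = new SparkConf().setMaster("local[*]").setAppName("pairwise-distances")
val sc = SparkContext.getOrCreate(conf)

val rdd = sc.parallelize(Seq("0,1.0,1.0", "1,2.0,2.0", "2,3.0,3.0"))
val r = rdd.map(_.split(","))
           .map(y => (y(0).toInt, Vectors.dense(y(1).toDouble, y(2).toDouble)))

// All ordered pairs, including (A, A) and both (A, B) and (B, A).
val pairs = r.cartesian(r)

// Keep a pair only when the first key is strictly smaller than the second:
// this drops self-pairs and keeps exactly one of (A, B) / (B, A).
val unique = pairs
  .filter { case (first, second) => first._1 < second._1 }
  .map { case (first, second) =>
    ((first._1, second._1), Math.sqrt(Vectors.sqdist(first._2, second._2)))
  }

unique.collect().foreach(println)
```

Note that cartesian still generates every ordered pair. The filter only ensures that the distance itself is computed once per unordered pair, so for the three sample vectors the result contains three entries instead of nine.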