Algorithm: Breadth-first search using Apache Spark GraphX

I am trying to implement the BFS (breadth-first search) algorithm using Apache Spark GraphX.

This is my current implementation:

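import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import scala.collection.mutable.Queue

object BFSAlgorithm {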

  def run(graph: Graph[VertexId, Int], sourceVertex: VertexId): Graph[Int, Int] = {

    val bfsGraph: Graph[Int, Int] = graph.mapVertices((vertex, _) =>
      if (vertex == sourceVertex) {
        0
      } else {
        Int.MaxValue
      }
    )

    var queue: Queue[VertexId] = Queue[VertexId](sourceVertex)
    while(queue.nonEmpty){
      val currentVertexId = queue.dequeue()
      val neighbours: RDD[EdgeTriplet[Int, Int]] = bfsGraph.triplets.filter(_.srcId == currentVertexId)
      for(triplet <- neighbours){
        if(triplet.dstAttr == Int.MaxValue){
          queue += triplet.dstId
        }
        val distance = triplet.srcAttr + 1
        if(distance < triplet.dstAttr){
          // Update vertex attribute
          bfsGraph.mapVertices((vertex, _) => if(vertex == triplet.dstId) distance else triplet.dstAttr)
        }
      }
    }
    bfsGraph
  }

}
I am confused because, inside the for loop, bfsGraph.vertices is null.


Can someone explain why? And what is the best way to update vertex attributes in a graph?

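This implementation cannot work because you are trying to access an RDD from inside an operation on another RDD. When you run the for loop over neighbours, Spark serializes a closure for the loop body containing every variable that body needs. Here the closure refers to another RDD-backed object (bfsGraph), which cannot be used on the executors, and that results in a NullPointerException at the following line: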

bfsGraph.mapVertices((vertex, _) => if(vertex == triplet.dstId) distance else triplet.dstAttr)
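
Two further points apply even without the exception. GraphX graphs are immutable, so mapVertices returns a new graph and its result is discarded in the loop above. Also, the body of the for loop runs on the executors, so queue += would only change a serialized copy of the queue, never the one on the driver. One way to update vertex attributes iteratively is the Pregel API that ships with GraphX, which propagates distances along edges in supersteps. The following is a minimal sketch, assuming the same input types as in the question; the object name PregelBFS is only illustrative, and unreachable vertices keep Int.MaxValue.

```scala
import org.apache.spark.graphx._

// Illustrative name; not part of the original question or of GraphX.
object PregelBFS {

  // Returns a graph whose vertex attribute is the number of hops from
  // sourceVertex, or Int.MaxValue if the vertex cannot be reached.
  def run(graph: Graph[VertexId, Int], sourceVertex: VertexId): Graph[Int, Int] = {
    // mapVertices builds a NEW graph, so the result must be kept.
    val initial: Graph[Int, Int] = graph.mapVertices { (id, _) =>
      if (id == sourceVertex) 0 else Int.MaxValue
    }

    // The initial message Int.MaxValue leaves every vertex unchanged.
    initial.pregel(Int.MaxValue)(
      // Vertex program: keep the smaller of the stored and received distance.
      (_, dist, newDist) => math.min(dist, newDist),
      // Send a message only from vertices that were already reached, and
      // only if it improves the destination. The first check also avoids
      // overflowing Int.MaxValue + 1.
      triplet =>
        if (triplet.srcAttr != Int.MaxValue && triplet.srcAttr + 1 < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + 1))
        else
          Iterator.empty,
      // Merge: several neighbours may reach the same vertex in one superstep.
      (a, b) => math.min(a, b)
    )
  }
}
```

Calling PregelBFS.run(graph, sourceVertex).vertices.collect() on the driver then yields the (vertexId, distance) pairs. No RDD is referenced inside another RDD's closure, because the send and merge functions only touch the triplet they are given.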