Breadth-first search algorithm using Apache Spark GraphX
I am trying to implement the BFS (breadth-first search) algorithm using Apache Spark GraphX. This is my current implementation:
import org.apache.spark.graphx.{EdgeTriplet, Graph, VertexId}
import org.apache.spark.rdd.RDD
import scala.collection.mutable.Queue

object BFSAlgorithm {
  def run(graph: Graph[VertexId, Int], sourceVertex: VertexId): Graph[Int, Int] = {
    val bfsGraph: Graph[Int, Int] = graph.mapVertices((vertex, _) =>
      if (vertex == sourceVertex) {
        0
      } else {
        Int.MaxValue
      }
    )
    var queue: Queue[VertexId] = Queue[VertexId](sourceVertex)
    while (queue.nonEmpty) {
      val currentVertexId = queue.dequeue()
      val neighbours: RDD[EdgeTriplet[Int, Int]] = bfsGraph.triplets.filter(_.srcId == currentVertexId)
      for (triplet <- neighbours) {
        if (triplet.dstAttr == Int.MaxValue) {
          queue += triplet.dstId
        }
        val distance = triplet.srcAttr + 1
        if (distance < triplet.dstAttr) {
          // Update vertex attribute
          bfsGraph.mapVertices((vertex, _) => if (vertex == triplet.dstId) distance else triplet.dstAttr)
        }
      }
    }
    bfsGraph
  }
}
I am confused because, inside the for loop, bfsGraph.vertices is null.
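Can anyone explain why? What is the best way to update vertex attributes in the graph?

This implementation cannot work because you are trying to access one RDD from inside an operation on another RDD. When you call the for loop on neighbours, Spark tries to build a closure for the loop body that captures every variable the body needs. In this case the closure refers to another RDD-backed object (bfsGraph), which leads to a NullPointerException on this line:

bfsGraph.mapVertices((vertex, _) => if(vertex == triplet.dstId) distance else triplet.dstAttr)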
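
One way to avoid nesting RDD operations is to let GraphX drive the iteration itself through its Pregel operator, which updates vertex attributes by passing messages along edges. The following is only a minimal sketch of that idea, not code from the original answer. The object name PregelBfs is hypothetical, and the sketch assumes edges are followed from source to destination, as the filter on srcId in the question does.

```scala
import org.apache.spark.graphx.{EdgeTriplet, Graph, VertexId}

// Hypothetical name, for illustration only.
object PregelBfs {
  // Replaces each vertex attribute with its hop distance from sourceVertex.
  // Vertices that cannot be reached keep Int.MaxValue.
  def run(graph: Graph[VertexId, Int], sourceVertex: VertexId): Graph[Int, Int] = {
    val initial: Graph[Int, Int] = graph.mapVertices { (id, _) =>
      if (id == sourceVertex) 0 else Int.MaxValue
    }

    initial.pregel(Int.MaxValue)(
      // Vertex program: keep the smaller of the stored and the received distance.
      (_, dist, incoming) => math.min(dist, incoming),
      // Send a message only from vertices already reached, and only when it
      // would shorten the destination's distance (the guard also avoids
      // overflowing Int.MaxValue + 1).
      (triplet: EdgeTriplet[Int, Int]) =>
        if (triplet.srcAttr != Int.MaxValue && triplet.srcAttr + 1 < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + 1))
        else
          Iterator.empty,
      // Merge messages that arrive at the same vertex.
      (a, b) => math.min(a, b)
    )
  }
}
```

Calling PregelBfs.run(graph, sourceVertex).vertices then yields (vertexId, distance) pairs. Because the graph is immutable, every step returns a new graph instead of modifying bfsGraph in place, which is also why the discarded mapVertices result in the question would not have changed anything even without the exception.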