Scala: Modifying Spark GraphX PageRank to perform a random walk with restart


I am trying to implement a random walk with restart by modifying the Spark GraphX implementation of the PageRank algorithm.

def randomWalkWithRestart(graph: Graph[VertexProperty, EdgeProperty], patientID: String, numIter: Int = 10, alpha: Double = 0.15, tol: Double = 0.01): Unit = {

  var rankGraph: Graph[Double, Double] = graph
    // Associate the degree with each vertex
    .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
    // Set the weight on the edges based on the degree
    .mapTriplets(e => 1.0 / e.srcAttr, TripletFields.Src)
    // Set the vertex attributes to the initial pagerank values
    .mapVertices((id, attr) => alpha)

  var iteration = 0
  var prevRankGraph: Graph[Double, Double] = null
  while (iteration < numIter) {
    rankGraph.cache()

    // Compute the outgoing rank contributions of each vertex, perform local preaggregation, and
    // do the final aggregation at the receiving vertices. Requires a shuffle for aggregation.
    val rankUpdates = rankGraph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

    // Apply the final rank updates to get the new ranks, using join to preserve ranks of vertices
    // that didn't receive a message. Requires a shuffle for broadcasting updated ranks to the
    // edge partitions.
    prevRankGraph = rankGraph
    rankGraph = rankGraph.joinVertices(rankUpdates) {
      (id, oldRank, msgSum) => alpha + (1.0 - alpha) * msgSum
    }.cache()

    rankGraph.edges.foreachPartition(x => {}) // also materializes rankGraph.vertices
    //logInfo(s"PageRank finished iteration $iteration.")
    prevRankGraph.vertices.unpersist(false)
    prevRankGraph.edges.unpersist(false)

    iteration += 1
  }
}

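I think the line
(id, oldRank, msgSum) => alpha + (1.0 - alpha) * msgSum
should be changed, but I'm not sure how. I need to add the steady-state probability to this line.

In addition, the steady-state probability should be initialized somewhere before the while loop, and it has to be updated inside the while loop.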


Any suggestions would be appreciated.

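Are you trying to do a personalized PageRank? I'm not sure modifying PageRank is the best way to implement this, because it propagates to every connected neighbor rather than "walking" the graph. You could propagate a label along a path the way ConnectedComponents does. You would just need each vertex to pick one of its neighbors at random and send it a positive value, while the other neighbors either receive no message or are passed zero. One issue with Spark is that it always operates over the whole graph, so walking from a single vertex is a challenge.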
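
If the goal is the probability-propagation form of random walk with restart (the personalized-PageRank style update p_next = alpha * e + (1 - alpha) * W^T p, where e is 1 at the restart vertex and 0 elsewhere), one possible adaptation of the loop from the question is sketched below. This is only a sketch under assumptions, not a tested answer to the original problem: the parameter seedId (the VertexId of the restart vertex, presumably derived from patientID) is hypothetical, the function is made generic over the vertex and edge types, and tol is dropped because the loop runs for a fixed number of iterations. It places all initial probability on the seed and adds the restart term only at the seed.

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexId}
import scala.reflect.ClassTag

// Sketch: random walk with restart from one seed vertex.
// seedId is a hypothetical parameter, not part of the original question.
def randomWalkWithRestart[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED],
    seedId: VertexId,
    numIter: Int = 10,
    alpha: Double = 0.15): Graph[Double, Double] = {

  var rankGraph: Graph[Double, Double] = graph
    // Associate the out-degree with each vertex
    .outerJoinVertices(graph.outDegrees) { (_, _, deg) => deg.getOrElse(0) }
    // Edge weight = 1 / out-degree of the source (uniform transition probability)
    .mapTriplets(e => 1.0 / e.srcAttr, TripletFields.Src)
    // Initial distribution: all probability mass sits on the seed vertex
    .mapVertices((id, _) => if (id == seedId) 1.0 else 0.0)

  var iteration = 0
  while (iteration < numIter) {
    rankGraph.cache()

    // Each vertex sends its probability, split over its out-edges, to its neighbors
    val rankUpdates = rankGraph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)

    val prevRankGraph = rankGraph
    // outerJoinVertices (rather than joinVertices) so that vertices that received
    // no message are reset to their restart term instead of keeping a stale value
    rankGraph = rankGraph.outerJoinVertices(rankUpdates) { (id, _, msgSum) =>
      val restart = if (id == seedId) alpha else 0.0
      restart + (1.0 - alpha) * msgSum.getOrElse(0.0)
    }.cache()

    rankGraph.edges.foreachPartition(x => {}) // also materializes rankGraph.vertices
    prevRankGraph.vertices.unpersist(false)
    prevRankGraph.edges.unpersist(false)

    iteration += 1
  }
  rankGraph
}
```

Note that probability arriving at a vertex with no outgoing edges is not redistributed in this sketch, so the values may sum to less than 1 on graphs with such vertices. GraphX also provides staticPersonalizedPageRank(src, numIter, resetProb) on GraphOps, which may already cover this use case without a custom loop.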