Apache Spark: how to compute the average neighbour degree with GraphX

I want to compute the average neighbour degree for each node in a graph. Say we have a graph like this:

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Create an RDD for the vertices
val users: RDD[(VertexId, String)] = sc.parallelize(Array(
  (3L, "rxin"),
  (7L, "jgonzal"),
  (5L, "franklin"),
  (2L, "istoica")))
// Create an RDD for edges
val relationships: RDD[Edge[Int]] = sc.parallelize(Array(
  Edge(3L, 7L, 12),
  Edge(5L, 3L, 1),
  Edge(2L, 5L, 3),
  Edge(5L, 7L, 5)))
// Build the initial Graph
val graph = Graph(users, relationships)
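EDIT: To make the expected result clear, take node 5 and its neighbours:

  • node 3, which has degree = 2
  • node 7, which has degree = 2
  • node 2, which has degree = 1
The output for this measure is simply the mean degree of node 5's neighbours: (2 + 2 + 1) / 3 ≈ 1.667

Ideally you would want to exclude the links to node 5 itself from this computation, but that does not matter to me for now.

END EDIT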
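To check the expected numbers independently of Spark, the same measure can be sketched with plain Scala collections. This is only an illustration on the example edge list, treating edges as undirected; the names `AvgNeighbourDegree`, `edges`, `neighbours`, `degree` and `avgNeighbourDegree` are hypothetical and not part of GraphX.

```scala
// Plain-Scala sketch (no Spark): average neighbour degree on the example graph.
object AvgNeighbourDegree {
  // The question's edges as (src, dst) pairs; edge weights are ignored here.
  val edges: Seq[(Long, Long)] = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))

  // Adjacency lists: every edge contributes a neighbour in both directions.
  val neighbours: Map[Long, Seq[Long]] =
    edges
      .flatMap { case (s, d) => Seq(s -> d, d -> s) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2) }

  // Degree of a vertex = number of incident edges.
  val degree: Map[Long, Int] = neighbours.map { case (v, ns) => v -> ns.size }

  // Mean of the neighbours' degrees for each vertex that has neighbours.
  val avgNeighbourDegree: Map[Long, Double] =
    neighbours.map { case (v, ns) => v -> ns.map(degree).sum.toDouble / ns.size }

  def main(args: Array[String]): Unit =
    avgNeighbourDegree.toSeq.sortBy(_._1).foreach(println)
}
```

For node 5 this gives (2 + 2 + 1) / 3, for node 2 it gives 3.0, and for nodes 3 and 7 it gives 2.5. Vertices without any edge never appear in `neighbours`, so they get no value in this sketch.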

I am trying to apply aggregateMessages, but I don't know how to retrieve the degree of each node from inside the aggregateMessages call:

val neideg = g.aggregateMessages[(Long, Double)](
    triplet => {
      val comparedAttrs = compareAttrs(triplet.dstAttr, triplet.srcAttr) // BUT HERE I SHOULD GIVE ALSO THE DEGREE
      triplet.sendToDst(1L, comparedAttrs)
      triplet.sendToSrc(1L, comparedAttrs)
    },
    { case ((cnt1, v1), (cnt2, v2)) => (cnt1 + cnt2, v1 + v2) })

val aveneideg = neideg.mapValues(kv => kv._2 / kv._1.toDouble).toDF("id", "aveneideg")
Then I have a function that does the sum:

def compareAttrs(xs: (Int, String), ys: (Int, String)): Double = {
    xs._1.toDouble + ys._1.toDouble
}
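How can I pass the degree values of these nodes to compareAttrs?

Of course, I would be more than happy to see a simpler and smarter solution for this task than the one I am attempting.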

I am not sure this is exactly what you are after, but here is something you could go with:

val degrees = graph.degrees
// now we have a graph where attribute is a degree of a vertex
val graphWithDegrees = graph.outerJoinVertices(degrees) { (_, _, optDegree) =>
    optDegree.getOrElse(0) // vertices missing from graph.degrees have no edges
}

// now each vertex sends its degree to its neighbours
// we aggregate them in a list where each vertex gets all values
// of its neighbours
val neighboursDegreeAndCount = graphWithDegrees.aggregateMessages[List[Long]](
    sendMsg = triplet => {
        val srcDegree = triplet.srcAttr
        val dstDegree = triplet.dstAttr
        triplet.sendToDst(List(srcDegree))
        triplet.sendToSrc(List(dstDegree))
    },
    mergeMsg = (x, y) => x ++ y
).mapValues(degrees => degrees.sum / degrees.size.toDouble)

// now if you want it in the original graph you can do
// outerJoinVertices again, and now the attr of vertex 
// in the graph is avg of its neighbours
graph.outerJoinVertices(neighboursDegreeAndCount) { (_, _, optAvgDegree) =>
    optAvgDegree.getOrElse(0.0) // isolated vertices receive no messages; 0.0 is an arbitrary default
}

So, for your example, the output is:
Array((5,1.6666666666666667),(2,3.0),(3,2.5),(7,2.5))
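As a possible variation on the approach above, each message can carry a running (sum, count) pair instead of a list, which is closer to what the question's own aggregateMessages attempt was doing and avoids building a list per vertex. This is a minimal sketch, assuming a Spark environment with GraphX on the classpath; the function name `averageNeighbourDegree` and the value `withDegrees` are hypothetical.

```scala
import org.apache.spark.graphx.{Graph, VertexRDD}
import scala.reflect.ClassTag

// Sketch: average neighbour degree using (sum, count) messages instead of lists.
def averageNeighbourDegree[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED]): VertexRDD[Double] = {
  // Replace each vertex attribute with its degree.
  // Vertices missing from graph.degrees have no edges, hence the default of 0.
  val withDegrees: Graph[Int, ED] =
    graph.outerJoinVertices(graph.degrees) { (_, _, deg) => deg.getOrElse(0) }

  withDegrees
    .aggregateMessages[(Long, Long)](
      // Each endpoint sends (its own degree, 1) to the other endpoint.
      ctx => {
        ctx.sendToDst((ctx.srcAttr.toLong, 1L))
        ctx.sendToSrc((ctx.dstAttr.toLong, 1L))
      },
      // Merge by adding the degree sums and the neighbour counts.
      { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
    )
    // Divide the summed degrees by the number of messages received.
    .mapValues(sc => sc._1.toDouble / sc._2)
}
```

Calling `averageNeighbourDegree(graph).collect()` on the example graph should give the same per-vertex averages as the list-based version. As there, only vertices with at least one edge appear in the result; joining back with outerJoinVertices and a default is still needed if isolated vertices must be reported.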
Comments:

I would do a DFS on each connected component and keep track of the neighbours along the way. Then divide this number by the number of nodes.

Thanks for your time, but this kind of reply is not really helpful.

I don't understand the result for node 5: why is the average degree 1.5 and not 1.666? Thanks.

I have edited the question to better explain what I want to accomplish.

@user299791 Sorry, that was a silly mistake in my code. I used Set where I meant List. Check the code again.

I think there is another problem in the code: for isolated nodes it does not report any value at all.

@user299791 You have to use getOrElse instead of get. I will update the code.