Apache Spark: removing unconnected components from a GraphX subgraph


I have the following graph:

// Vertices
val usersTest: RDD[(VertexId, (String))] = sc.parallelize(Array((1L, ("AAA")), (2L, ("BBB")), (3L, ("CCC"))))
// Edges
val relationshipsTest: RDD[Edge[Int]] = sc.parallelize(Array(Edge(1L, 3L, 1),Edge(1L, 3L, 1),Edge(1L, 2L, 3), Edge(2L, 1L, 1), Edge(2L, 1L, 2), Edge(2L, 3L, 1),   Edge(3L, 2L, 2)))
val defaultUserTest =  "Missing"
//Creating the Graph
val graphTest = Graph(usersTest, relationshipsTest, defaultUserTest)
which produces the following output:

(graphTest.numEdges, graphTest.numVertices)
res: (Long, Long) = (7,3)
Now, when I try to use subgraph:

val validGraphTest = graphTest.subgraph(epred = e => e.attr > 2) 
I get:

(validGraphTest.numEdges, validGraphTest.numVertices)
res: (Long, Long) = (1,3)
What I want is to remove the unconnected vertices. In this example only one edge remains, so the desired output would be:
res: (Long, Long) = (1,2)

I tried:

val validCCGraphTest = validGraphTest.connectedComponents()
but

(validCCGraphTest.numEdges, validCCGraphTest.numVertices)
still produces:

res: (Long, Long) = (1,3)

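An isolated vertex of degree zero is itself a connected component of size 1, which is why your approach doesn't work: connectedComponents only labels each vertex with a component ID and removes nothing. You can try the following: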

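validGraphTest
  // Attach each vertex's degree; vertices with no edges are absent from degrees, so they get 0
  .outerJoinVertices(validGraphTest.degrees) {
    case (_, vd, Some(x)) => (vd, x)
    case (_, vd, _) => (vd, 0)
  }
  // Keep only vertices with at least one incident edge
  .subgraph(vpred = { case (_, (_, x)) => x > 0 })
  // Drop the degree and restore the original vertex attribute
  .mapVertices { case (_, (x, _)) => x }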
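or, a little more concisely (although it looks less efficient, and the resulting vertices carry their degree as the attribute instead of the original value):

Graph(validGraphTest.degrees, validGraphTest.edges).mask(graphTest)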
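
Both snippets rely on the same idea: compute degrees from the edges that survived the filter, then keep only the vertices whose degree is positive. As a rough illustration, here is a minimal sketch of that logic on plain Scala collections, without Spark. The names PruneIsolated, Edge, degrees, and prune are hypothetical helpers made up for this example and are not part of GraphX.

```scala
// Plain-Scala model of the degree filter; no Spark required.
// All names here are illustrative and not part of the GraphX API.
object PruneIsolated {
  final case class Edge(src: Long, dst: Long, attr: Int)

  // Degree of every vertex that touches at least one edge,
  // counting both endpoints of each edge.
  def degrees(edges: Seq[Edge]): Map[Long, Int] =
    edges
      .flatMap(e => Seq(e.src, e.dst))
      .groupBy(identity)
      .map { case (id, hits) => id -> hits.size }

  // Keep only vertices with a positive degree.
  def prune[A](vertices: Map[Long, A], edges: Seq[Edge]): Map[Long, A] = {
    val deg = degrees(edges)
    vertices.filter { case (id, _) => deg.getOrElse(id, 0) > 0 }
  }

  def main(args: Array[String]): Unit = {
    val vertices = Map(1L -> "AAA", 2L -> "BBB", 3L -> "CCC")
    val edges = Seq(
      Edge(1L, 3L, 1), Edge(1L, 3L, 1), Edge(1L, 2L, 3), Edge(2L, 1L, 1),
      Edge(2L, 1L, 2), Edge(2L, 3L, 1), Edge(3L, 2L, 2)
    )
    val kept = edges.filter(_.attr > 2)   // same predicate as the subgraph call
    val pruned = prune(vertices, kept)    // vertex 3 has no remaining edge
    println((kept.size, pruned.size))     // prints (1,2)
  }
}
```

With the data from the question, only the edge from 1 to 2 passes the filter, so vertex 3 is dropped and the counts become (1,2), which is the result the question asks for.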
