How to use Spark GraphX's mask function?

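I want to check whether a new graph (call it A) is a subgraph of another graph (call it B). I wrote a small demo to test this, but it failed. I am running the demo in spark-shell only, on Spark 1.6.1: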

import org.apache.spark.graphx.{Edge, Graph}

// Build graph B
val usersB = sc.parallelize(Array(
  (3L, ("rxin", "student")),
  (7L, ("jgonzal","postdoc")),
  (5L, ("franklin", "prof")),
  (2L, ("istoica", "prof"))
))

val relationshipsB = sc.parallelize(Array(
  Edge(3L, 7L, "collab"),
  Edge(5L, 3L, "advisor"),
  Edge(2L, 5L, "colleague"),
  Edge(5L, 7L, "pi")
))

val defaultUser = ("John Doe", "Missing")

val graphB = Graph(usersB, relationshipsB, defaultUser)

// Build the initial Graph A
val usersA = sc.parallelize(Array(
  (3L, ("rxin", "student")),
  (7L, ("jgonzal", "postdoc")),
  (5L, ("franklin", "prof"))
))

val relationshipsA = sc.parallelize(Array(
  Edge(3L, 7L, "collab"),
  Edge(5L, 3L, "advisor")
))

val testGraphA = Graph(usersA, relationshipsA, defaultUser)

// Apply the mask: keep only the vertices and edges of A that also exist in B
val maskResult = testGraphA.mask(graphB)
maskResult.edges.count
maskResult.vertices.count

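As I understand it, mask should return all the vertices and edges that the two graphs have in common. However, only the vertex count is correct (maskResult.vertices.count = 3). The edge count should be 2, but maskResult.edges.count = 0.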

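If you look at the source, you will see that mask uses EdgeRDD.innerJoin. If you look at the documentation for innerJoin, you will see this caveat:

Inner joins this EdgeRDD with another EdgeRDD, assuming both are partitioned using the same PartitionStrategy.

You need to create and use a PartitionStrategy, and apply the same one to both graphs. A strategy that places every edge in a single partition (called MyPartStrat here) will get you the result you want, but it probably won't scale well.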
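The definition of MyPartStrat is not shown here. A minimal sketch of a strategy that does what the answer describes, sending every edge to one partition, might look like the following. The body (always returning partition 0) is an assumption, not the answer's original code.

```scala
import org.apache.spark.graphx.{PartitionID, PartitionStrategy, VertexId}

// Hypothetical reconstruction: assigns every edge to the same partition,
// so two graphs partitioned with it always have matching edge layouts.
object MyPartStrat extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID =
    0 // ignore the endpoints and the partition count: everything goes to partition 0
}
```

Because the result does not depend on the edge, this guarantees co-located edges at the cost of all parallelism, which is why it is only suitable for small test graphs.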

Then, if you do:

val maskResult = testGraphA.partitionBy(MyPartStrat).mask(graphB.partitionBy(MyPartStrat))

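you will get the result you want. But as I said, you will probably need to find a better partition strategy than stuffing everything into one partition.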

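Good answer. I would add that he can pick one of the prepackaged partition strategies that ship with GraphX. So he may not need to write one at all; he could use something like

testGraphA.partitionBy(PartitionStrategy.CanonicalRandomVertexCut)

Nice, I will add that to my answer later.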
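To make the comment's suggestion concrete, here is a rough sketch that repartitions both graphs with the same built-in strategy before masking. The helper name maskWithSamePartitioning and its numParts parameter are hypothetical. Passing an explicit partition count to both partitionBy calls is an assumption made so that the two edge RDDs end up with the same number of partitions.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.{Graph, PartitionStrategy}

// Hypothetical helper: partitions both graphs identically, then masks a by b.
def maskWithSamePartitioning[VD: ClassTag, ED: ClassTag](
    a: Graph[VD, ED],
    b: Graph[VD, ED],
    numParts: Int): Graph[VD, ED] = {
  val strategy = PartitionStrategy.CanonicalRandomVertexCut
  // Same strategy and same partition count on both sides, so an edge that
  // appears in both graphs is assigned the same partition index in each.
  a.partitionBy(strategy, numParts).mask(b.partitionBy(strategy, numParts))
}
```

With the graphs from the question, this could be called as maskWithSamePartitioning(testGraphA, graphB, 4), where 4 is an arbitrary partition count chosen for illustration.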
