Apache Spark: using PySpark GraphFrames to find the connected components of a large graph

Tags: apache-spark, pyspark, connected-components, graphframes

I am trying to use connectedComponents() from graphframes in pyspark to compute the connected components of a fairly large graph, with roughly 1800K vertices and 500K edges.

edgeDF.printSchema()
root
 |-- src: string (nullable = true)
 |-- dst: string (nullable = true)


vertDF.printSchema()
root
 |-- id: string (nullable = true)

vertDF.count()
1879806

edgeDF.count()
452196

custGraph = gf.GraphFrame(vertDF,edgeDF)

comp = custGraph.connectedComponents()
Even after 6 hours, the job had not finished. I am running pyspark on a single machine with Windows.
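As a rough sanity check on the sizes reported above, the two counts already say something about the expected result. Each edge can merge at most two components, and each edge touches at most two vertices, so simple arithmetic gives lower bounds on the number of components and on the number of vertices with no edge at all. The sketch below assumes the ids in vertDF are distinct; the function name `component_bounds` is hypothetical and is not part of GraphFrames.

```python
def component_bounds(num_vertices, num_edges):
    """Return (min_components, min_isolated) implied by the raw counts.

    Assumption: vertex ids are distinct, so num_vertices counts real vertices.
    """
    # Each edge reduces the number of components by at most one.
    min_components = max(num_vertices - num_edges, min(num_vertices, 1))
    # Each edge has two endpoints, so at most 2 * num_edges vertices have an edge.
    max_with_edge = min(num_vertices, 2 * num_edges)
    min_isolated = num_vertices - max_with_edge
    return min_components, min_isolated


# Counts taken from vertDF.count() and edgeDF.count() in the question.
min_components, min_isolated = component_bounds(1879806, 452196)
print("components >=", min_components)
print("vertices without any edge >=", min_isolated)
```

Under that assumption, a large share of the vertices cannot be attached to any edge, and each of those is a component by itself.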

a. Is a computation like this feasible in the given setup?

b. I received warning messages like the following:

[rdd_73_2, rdd_90_2]
[Stage 21:=========>        (2 + 2) / 4][Stage 22:>                 (0 + 2) / 4]16/10/13 01:28:42 WARN Executor: 2 block locks were not released by TID = 632:

[rdd_73_0, rdd_90_0]
[Stage 21:=============>    (3 + 1) / 4][Stage 22:>                 (0 + 3) / 4]16/10/13 01:28:43 WARN Executor: 2 block locks were not released by TID = 633:

[rdd_73_1, rdd_90_1]
[Stage 37:>                 (0 + 4) / 4][Stage 38:>                 (0 + 0) / 4]16/10/13 01:28:47 WARN Executor: 3 block locks were not released by TID = 844:

[rdd_90_0, rdd_104_0, rdd_107_0]
What does this mean?


c. How do I specify in a GraphFrame that the graph is undirected? Do we need to add edges in both directions?

Doesn't connected components automatically treat the graph as undirected? I don't think you need to worry about (c). Regarding (b), you may want to follow this issue on the GraphFrames tracker:
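To illustrate the point about (c), here is a minimal pure-Python sketch of connected components using union-find. It is not the GraphFrames implementation and says nothing about its performance; it only shows that connectivity ignores edge direction, so adding the reverse of every edge leaves the component labels unchanged. The function name and the tiny example graph are made up for illustration, and the sketch assumes every edge endpoint appears in the vertex list.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Assumption: every src/dst in `edges` also appears in `vertices`.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            # Keep the smaller id as the root so labels are deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {v: find(v) for v in vertices}


# Hypothetical toy graph: two directed edges pointing into "b".
vertices = ["a", "b", "c", "d", "e"]
directed = [("a", "b"), ("c", "b")]
both_directions = directed + [(dst, src) for src, dst in directed]

labels_one = connected_components(vertices, directed)
labels_both = connected_components(vertices, both_directions)
print(labels_one == labels_both)
```

In this toy graph, "a", "b" and "c" end up in one component even though no directed path leads from "a" to "c", which is the behaviour the comment describes.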