Scala: Apache Spark GraphFrames is very slow on BFS

Tags: scala, apache-spark, graph, breadth-first-search, graphframes

I am using Apache Spark GraphFrames with Scala in the code below. I apply BFS to the graph and try to find the distance between vertex 0 and vertex 100.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.graphframes.GraphFrame

object SimpApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SimpApp")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Load the vertex and edge lists from CSV files with a header row.
    val nodesList = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true").option("inferSchema", "true")
      .load("CSV File Path")
    val edgesList = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true").option("inferSchema", "true")
      .load("CSV File Path")

    // GraphFrames expects an "id" column for vertices and "src"/"dst" for edges.
    val v = nodesList.toDF("id")
    val e = edgesList.toDF("src", "dst", "dist")
    val g = GraphFrame(v, e)

    // BFS from vertex 0 to vertex 100, allowing paths of up to 101 edges.
    val paths: DataFrame = g.bfs.fromExpr("id = 0").toExpr("id = 100").maxPathLength(101).run()
    paths.show()
    sc.stop()
  }
}
Source node: 0, target node: 100

The vertex list looks like this:

id
0
1
2
3
.
.
.
up to
1000
This is the edge list:

src, dst, dist
0,   1,   2
1,   2,   1
2,   3,   5 
3,   4,   1
4,   5,   3
5,   6,   3
6,   7,   6
.    .   .
.    .   .
.    .   .
up to
999, 998, 4
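The problem with the code above is that it takes a very long time even for the search from vertex 0 to vertex 100: it ran for 4 hours without producing any output. I ran the code on a machine with 12 GB of RAM.

Could you please guide me on how to speed up and optimize the code?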

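To check my understanding: I think you are trying to find the shortest distance over the unweighted edges of the graph, which is why you are using BFS. In that case, you may want to remove maxPathLength(101) from the query, so that it reads: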

g.bfs.fromExpr("id = 0").toExpr("id = 100").run() 
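As the GraphFrames documentation notes, maxPathLength is the limit on the length of paths, with a default value of 10. If no valid path of length <= maxPathLength is found, the search terminates.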
tripGraph.bfs.fromExpr("id = 'SFO'").toExpr("id = 'BUF'").maxPathLength(1).run()
tripGraph.bfs.fromExpr("id = 'SFO'").toExpr("id = 'BUF'").maxPathLength(2).run()
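
To make the effect of the path-length limit concrete, here is a minimal plain-Scala sketch of a hop-count BFS that gives up after a fixed number of hops. It is not the GraphFrames implementation: the names BoundedBfs, shortestPath and pathDistance are made up for illustration, and the sketch assumes the edge list is small enough (about a thousand rows, as in the question) to be held in memory as (src, dst, dist) tuples.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of GraphFrames: an in-memory BFS with a hop limit.
object BoundedBfs {
  /** Edge as (src, dst, dist), mirroring the columns of the edge list. */
  type Edge = (Int, Int, Int)

  /** Hop-count BFS from `src` to `dst` that stops after `maxPathLength` hops.
    * Returns the vertices of one shortest path, or None if `dst` is not
    * reached within the limit. Edges are followed in the src -> dst direction. */
  def shortestPath(edges: Seq[Edge], src: Int, dst: Int, maxPathLength: Int): Option[List[Int]] = {
    // Build the adjacency list once.
    val adjacency: Map[Int, Seq[Int]] =
      edges.groupBy(_._1).map { case (s, es) => s -> es.map(_._2) }

    val parent = mutable.Map[Int, Int]() // vertex -> predecessor in the BFS tree
    val visited = mutable.Set[Int](src)
    var frontier = List(src)
    var hops = 0

    while (frontier.nonEmpty && !visited.contains(dst) && hops < maxPathLength) {
      val next = mutable.ListBuffer[Int]()
      for (u <- frontier) {
        for (v <- adjacency.getOrElse(u, Seq.empty[Int])) {
          if (!visited.contains(v)) {
            visited += v
            parent(v) = u
            next += v
          }
        }
      }
      frontier = next.toList
      hops += 1
    }

    if (!visited.contains(dst)) None
    else {
      // Walk predecessors back from dst to src.
      var path = List(dst)
      while (path.head != src) path = parent(path.head) :: path
      Some(path)
    }
  }

  /** Sum of the `dist` values along a path returned by shortestPath. */
  def pathDistance(edges: Seq[Edge], path: List[Int]): Int = {
    val weight = edges.map { case (s, d, w) => (s, d) -> w }.toMap
    path.zip(path.tail).map(weight).sum
  }
}
```

On a chain 0 -> 1 -> 2 -> ... like the one in the question, vertex 100 is 100 hops from vertex 0, so shortestPath(chain, 0, 100, 10) returns None while a limit of 100 finds the path. Note that BFS minimizes the number of hops, not the sum of the dist column; pathDistance only adds up dist along the path that BFS happened to find.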