Apache Spark GraphFrames: py4j.protocol.Py4JJavaError: An error occurred while calling o100.createGraph

Tags: apache-spark, pyspark, amazon-emr, graphframes

I am running a simple EMR cluster with Spark 2.4.4, and I want to run the following code with GraphFrames v0.7:

from pyspark import *
from pyspark.sql import *
from graphframes import *


sc = SparkContext().getOrCreate()
sc.setLogLevel("ERROR")
spark = SparkSession.builder.appName('graphFrames').getOrCreate()
spark.sparkContext.addPyFile("/home/hadoop/jars/graphframes.zip")

vertices = spark.createDataFrame([('1', 'Carter', 'Derrick', 50),
                                  ('2', 'May', 'Derrick', 26),
                                  ('3', 'Mills', 'Jeff', 80),
                                  ('4', 'Hood', 'Robert', 65),
                                  ('5', 'Banks', 'Mike', 93),
                                  ('98', 'Berg', 'Tim', 28),
                                  ('99', 'Page', 'Allan', 16)],
                                 ['id', 'name', 'firstname', 'age'])
edges = spark.createDataFrame([('1', '2', 'friend'),
                               ('2', '1', 'friend'),
                               ('3', '1', 'friend'),
                               ('1', '3', 'friend'),
                               ('2', '3', 'follows'),
                               ('3', '4', 'friend'),
                               ('4', '3', 'friend'),
                               ('5', '3', 'friend'),
                               ('3', '5', 'friend'),
                               ('4', '5', 'follows'),
                               ('98', '99', 'friend'),
                               ('99', '98', 'friend')],
                              ['src', 'dst', 'type'])
g = GraphFrame(vertices, edges)
## Take a look at the DataFrames
g.vertices.show()
g.edges.show()
## Check the number of edges of each vertex
g.degrees.show()
The package is being found and imported as follows:

[root@ip-172-31-13-149 scripts]# $SPARK_HOME/bin/spark-submit --packages
graphframes:graphframes:0.7.0-spark2.4-s_2.11 tst.py
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-835b0432-a6e7-4b5c-afd6-44e7f6ab2c26;1.0
        confs: [default]
        found graphframes#graphframes;0.7.0-spark2.4-s_2.11 in spark-packages
        found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 116ms :: artifacts dl 3ms
        :: modules in use:
        graphframes#graphframes;0.7.0-spark2.4-s_2.11 from spark-packages in [default]
        org.slf4j#slf4j-api;1.7.16 from central in [default]
When I run this simple GraphFrames example, I get the following error:

Traceback (most recent call last):
  File "/home/hadoop/scripts/tst.py", line 32, in <module>
    g = GraphFrame(vertices, edges)
  File "/root/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar/graphframes/graphframe.py", line 89, in __init__
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o100.createGraph.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
        at org.graphframes.GraphFrame$.apply(GraphFrame.scala:676)
        at org.graphframes.GraphFramePythonAPI.createGraph(GraphFramePythonAPI.scala:10)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
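
For context, a NoSuchMethodError on scala.Predef$.refArrayOps is the usual signature of mixed Scala binary versions, e.g. a JAR built for Scala 2.11 (the s_2.11 suffix in the graphframes coordinate) running against a Spark build compiled for a different Scala version, or the reverse. A minimal diagnostic sketch, assuming a plain PySpark session on the same cluster (the scalaVersionCheck app name is just a placeholder), is to print the Scala version of the driver JVM and compare it with that suffix; spark-submit --version prints the same information on the command line:

from pyspark.sql import SparkSession

# Print the Scala version the running Spark JVM was built with.
# Goes through the py4j gateway (_jvm is an internal handle).
# If this does not match the graphframes artifact suffix (s_2.11),
# the NoSuchMethodError above is the expected symptom.
spark = SparkSession.builder.appName('scalaVersionCheck').getOrCreate()
print(spark.sparkContext._jvm.scala.util.Properties.versionString())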
I also tried the steps suggested by hughcristensen, as follows:

spark.jars.packages              graphframes:graphframes:0.7.0-spark2.4-s_2.11

I would really appreciate any help, as I have no idea what else to try.
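
That property line looks like a spark-defaults.conf entry; the same coordinate can also be passed programmatically before any SparkContext exists. This is only a sketch of an alternative way to supply the package, not the fix from the original post; whichever way it is supplied, the Scala suffix of the coordinate has to match the cluster's Spark build:

from pyspark.sql import SparkSession

# Equivalent of --packages / the spark.jars.packages property.
# Only takes effect if set before the SparkContext is created.
spark = (SparkSession.builder
         .appName('graphFrames')
         .config("spark.jars.packages",
                 "graphframes:graphframes:0.7.0-spark2.4-s_2.11")
         .getOrCreate())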