
How to resolve java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer

I am using TinkerPop + JanusGraph + Spark.

My Gradle dependency:

compile group: 'org.apache.tinkerpop', name: 'spark-gremlin', version: '3.1.0-incubating'
Below are some of our key configurations:

spark.serializer: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
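
For context, this setting lives in a Hadoop-Gremlin properties file along the lines of the sketch below; only the spark.serializer line is our actual value, the rest are illustrative defaults along the lines of the TinkerPop docs:

# sketch of a hadoop-graph configuration; everything except spark.serializer is assumed
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
gremlin.hadoop.outputLocation=output
spark.master=spark://gdp-identity-stage.target.com:7077
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer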
In the logs, the corresponding log entry shows the jar containing the above class being added:

{"@timestamp":"2020-02-18T07:24:21.720+00:00","@version":1,"message":"Added JAR /opt/data/janusgraph/applib2/spark-gremlin-827a65ae26.jar at spark://gdp-identity-stage.target.com:38876/jars/spark-gremlin-827a65ae26.jar with timestamp 1582010661720","logger_name":"o.a.s.SparkContext","thread_name":"SparkGraphComputer-boss","level":"INFO","level_value":20000}
But the Spark job submitted by my SparkGraphComputer fails, and when we look at the executor logs we see:

Caused by: java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
Why does this exception occur even though the corresponding jar has been loaded?

Can anyone please advise on this?

As I mentioned, this happens in the Spark executor; when I opened one of the worker logs I saw the complete exception below:

Spark Executor Command: "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/bin/java" "-cp" "/opt/spark/spark-2.4.0/conf/:/opt/spark/spark-2.4.0/jars/*:/opt/hadoop/hadoop-3_1_1/etc/hadoop/" "-Xmx56320M" "-Dspark.driver.port=43137" "-XX:+UseG1GC" "-XX:+PrintGCDetails" "-XX:+PrintGCTimeStamps" "-Xloggc:/opt/spark/gc.log" "-Dtinkerpop.gremlin.io.kryoShimService=org.apache.tinkerpop.gremlin.hadoop.structure.io.HadoopPoolShimService" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@gdp-identity-stage.target.com:43137" "--executor-id" "43392" "--hostname" "192.168.192.10" "--cores" "6" "--app-id" "app-20200220094335-0001" "--worker-url" "spark://Worker@192.168.192.10:36845"
========================================

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:259)
    at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283)
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:200)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:221)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    ... 4 more
When I set the spark.jars property, I also pass this jar's location.
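
Concretely, something like the following, using the jar path from the "Added JAR" log entry above (the exact syntax of our setting may differ):

# illustrative: the path is the one SparkContext reported adding
spark.jars=/opt/data/janusgraph/applib2/spark-gremlin-827a65ae26.jar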


The jar we build from our application is a fat jar, i.e. it contains our actual code plus all required dependencies.

If you look at the logs, you will see

"java" "-cp" "/opt/spark/spark-2.4.0/conf/:/opt/spark/spark-2.4.0/jars/*:/opt/hadoop/hadoop-3_1_1/etc/hadoop/"

Unless the gremlin jars are in the /opt/spark/spark-2.4.0/jars/* folder on every Spark worker, the class you are using does not exist there.
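
One direct way to satisfy that, sketched here with the jar path from the question's own logs, is Spark's spark.executor.extraClassPath property; this is an illustration rather than the recommendation below, and it only helps if the file actually exists at that path on every worker machine:

# illustration only: prepends this jar to every executor's classpath,
# provided the file is present at this path on each worker
spark.executor.extraClassPath=/opt/data/janusgraph/applib2/spark-gremlin-827a65ae26.jar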


The recommended way to include it for your particular application, though, is the Gradle Shadow plugin, not --packages or spark.jars.
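
A minimal sketch of that setup in build.gradle; the plugin version here is an assumption, so pick whichever release matches your Gradle version:

plugins {
    id 'java'
    // Shadow merges all runtime dependencies, spark-gremlin included, into one jar;
    // 5.2.0 is an assumed version, not one taken from the question
    id 'com.github.johnrengelman.shadow' version '5.2.0'
}

dependencies {
    compile group: 'org.apache.tinkerpop', name: 'spark-gremlin', version: '3.1.0-incubating'
}

Running gradle shadowJar then produces a single build/libs/&lt;project&gt;-&lt;version&gt;-all.jar, and submitting that one jar carries GryoSerializer to the executors without any spark.jars bookkeeping.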

What jar? Based on what was loaded? Are you using Maven or Gradle?

@Boris, we are using Gradle.

@cricket_007, the class org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer exists in the jar spark-gremlin-827a65ae26.jar, which according to the logs was loaded into the Spark cluster.

Logs of which process? Spark may be distributed across multiple machines.

Thanks for your response. Could you provide more details about the Gradle Shadow plugin and how to use it in my case? The jar we create from the code is a fat jar only; it contains the code plus all dependency jars. Any idea why the spark.jars we set is not working?

1) I do not know your full spark-submit command. 2) You should not need spark.jars with a fat/uber jar (as created by the Shadow plugin). 3) I do not see a reference to your jar in the java command shown, so it is not at all clear to me how your code would load.
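
For reference on point 1 of the last comment: a hypothetical spark-submit invocation for such a fat jar, where the main class, master port, and jar name are all made up for illustration:

spark-submit \
  --class com.example.MyGraphJob \
  --master spark://gdp-identity-stage.target.com:7077 \
  /opt/data/janusgraph/applib2/my-app-all.jar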