Apache Spark: why do I need spark.executor.extraClassPath?


For spark.executor.extraClassPath, the documentation states:

Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.

However, when I run the Spark shell, this setting appears to be required. I launch the Spark shell with --jars so that the classes are available from the shell:


spark-shell \
--jars "/root/assembly.jar,/root/pipeline-deps.jar" \
--master=XXX:7077 \
--driver-memory=5G \
--properties-file shell-config.conf
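For reference, the workaround the question describes amounts to entries like the following in shell-config.conf (a sketch; it assumes the jars already exist at these paths on every worker machine, since spark.executor.extraClassPath takes executor-local paths joined with the platform path separator):

```
spark.serializer               org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator         org.MyCustomKryoRegistrator
spark.executor.extraClassPath  /root/assembly.jar:/root/pipeline-deps.jar
```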

However, when I run my job, I get an exception: Kryo cannot find some of the classes from the jars I am trying to use, even though they are available in the shell. I would prefer that the Spark shell ship the jars I specify to the workers, without my having to set
spark.executor.extraClassPath
. Any idea why this setting seems to be necessary? I get the Kryo exception whether or not I set a custom registrator:

spark.serializer   org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator   org.MyCustomKryoRegistrator
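For context, a registrator like the org.MyCustomKryoRegistrator referenced above is typically a small class implementing Spark's KryoRegistrator trait (a sketch; MyRecord is a hypothetical application class standing in for whatever the real registrator registers):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical application class used only to illustrate registration.
case class MyRecord(id: Long, text: String)

class MyCustomKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Register application classes so Kryo can serialize them by id
    // instead of writing out full class names on every record.
    kryo.register(classOf[MyRecord])
  }
}
```

Note that the registrator class itself must also be on the executor classpath, which is part of why classpath problems surface as Kryo errors.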
Here is the exception I get when I run application-specific code in the Spark shell:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 10 times, most recent failure: Lost task 0.9 in stage 0.0 (TID 9, michaels-pipeline-worker-0.dev.ai2): com.esotericsoftware.kryo.KryoException: Unable to find class: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
    at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
    at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at com.twitter.chill.SerDeState.readClassAndObject(SerDeState.java:61)
    at com.twitter.chill.KryoPool.fromBytes(KryoPool.java:94)
    at com.twitter.chill.Externalizer.fromBytes(Externalizer.scala:145)
    at com.twitter.chill.Externalizer.maybeReadJavaKryo(Externalizer.scala:158)
    at com.twitter.chill.Externalizer.readExternal(Externalizer.scala:148)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1842)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1799)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
    ... 39 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
  at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1298)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
  ... 52 elided
Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
  at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
  at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
  at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
  at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
  at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
  at com.twitter.chill.SerDeState.readClassAndObject(SerDeState.java:61)
  at com.twitter.chill.KryoPool.fromBytes(KryoPool.java:94)
  at com.twitter.chill.Externalizer.fromBytes(Externalizer.scala:145)
  at com.twitter.chill.Externalizer.maybeReadJavaKryo(Externalizer.scala:158)
  at com.twitter.chill.Externalizer.readExternal(Externalizer.scala:148)
  at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1842)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1799)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.allenai.s2.spark.package$$anonfun$readRddJsonFunc$1
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
  ... 39 more

Try removing the double quotes. Instead of
--jars "/root/assembly.jar,/root/pipeline-deps.jar"
use
--jars /root/assembly.jar,/root/pipeline-deps.jar
@Sumit, that makes no difference. Bash interprets the arguments before they are passed to
spark-shell
.
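The comment's point about bash can be checked directly: the shell strips the double quotes during word splitting, so the program receives byte-for-byte identical arguments either way (a sketch, with printf standing in for spark-shell):

```shell
# Bash removes the quotes before the program ever sees its arguments,
# so both invocations print the same two lines.
printf '%s\n' --jars "/root/assembly.jar,/root/pipeline-deps.jar"
printf '%s\n' --jars /root/assembly.jar,/root/pipeline-deps.jar
```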