Apache Spark job fails with java.lang.ArrayIndexOutOfBoundsException: 1


I have a Spark job that I execute with spark-submit. Every time I run the jar, it fails with the error java.lang.ArrayIndexOutOfBoundsException: 1.

Here is the full stack trace:

[hadoop@batch-cluster-master data]$ /usr/lib/spark/bin/spark-submit --master yarn --queue refault --driver-memory 12G --executor-memory 12G --executor-cores 3 --driver-cores 2 --class com.orgid.dp.batch.sql.BatchDriver /tmp/dp-batch-sql.jar /home/hadoop/PT_Data/batch-sql-ps-pathFinder-working.json
16/05/18 00:22:56 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:56 INFO spark.SparkContext: Running Spark version 1.6.0
16/05/18 00:22:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/18 00:22:56 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:56 INFO spark.SecurityManager: Changing view acls to: hadoop
16/05/18 00:22:56 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/05/18 00:22:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 37913.
16/05/18 00:22:57 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/05/18 00:22:57 INFO Remoting: Starting remoting
16/05/18 00:22:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.41.66.63:59598]
16/05/18 00:22:57 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 59598.
16/05/18 00:22:57 INFO spark.SparkEnv: Registering MapOutputTracker
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:57 INFO spark.SparkEnv: Registering BlockManagerMaster
16/05/18 00:22:57 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-56307d3d-6591-48bb-8bf8-f4989d71cd58
16/05/18 00:22:57 INFO storage.MemoryStore: MemoryStore started with capacity 8.4 GB
16/05/18 00:22:58 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/05/18 00:22:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/18 00:22:58 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/18 00:22:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/05/18 00:22:58 INFO ui.SparkUI: Started SparkUI at http://10.41.66.63:4040
16/05/18 00:22:58 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-301d676b-38f6-4895-8a04-af37c5b7fa99/httpd-0f747205-f207-476e-8317-6083d8fe0b37
16/05/18 00:22:58 INFO spark.HttpServer: Starting HTTP Server
16/05/18 00:22:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/18 00:22:58 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44525
16/05/18 00:22:58 INFO util.Utils: Successfully started service 'HTTP file server' on port 44525.
16/05/18 00:22:58 INFO spark.SparkContext: Added JAR file:/tmp/dp-batch-sql.jar at http://10.41.66.63:44525/jars/dp-batch-sql.jar with timestamp 1463511178539
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 WARN spark.SparkConf: The configuration key 'spark.akka.retry.wait' has been deprecated as of Spark 1.4 and and may be removed in the future. Please use the new key 'spark.rpc.retry.wait' instead.
16/05/18 00:22:58 INFO client.RMProxy: Connecting to ResourceManager at batch-cluster-master/10.41.66.63:8032
16/05/18 00:22:58 INFO yarn.Client: Requesting a new application from cluster with 5 NodeManagers
16/05/18 00:22:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container)
16/05/18 00:22:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/18 00:22:58 INFO yarn.Client: Setting up container launch context for our AM
16/05/18 00:22:58 INFO yarn.Client: Setting up the launch environment for our AM container
16/05/18 00:22:59 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:264)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:262)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:262)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:635)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:633)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:721)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
        at com.orgid.dp.batch.sql.BatchDriver$.main(BatchDriver.scala:56)
        at com.orgid.dp.batch.sql.BatchDriver.main(BatchDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/05/18 00:22:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/05/18 00:22:59 INFO ui.SparkUI: Stopped Spark web UI at http://10.41.66.63:4040
16/05/18 00:22:59 INFO cluster.YarnClientSchedulerBackend: Stopped
16/05/18 00:22:59 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/18 00:22:59 INFO storage.MemoryStore: MemoryStore cleared
16/05/18 00:22:59 INFO storage.BlockManager: BlockManager stopped
16/05/18 00:22:59 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/05/18 00:22:59 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
16/05/18 00:22:59 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/18 00:22:59 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:264)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:262)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:262)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:635)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:633)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:721)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
        at com.orgid.dp.batch.sql.BatchDriver$.main(BatchDriver.scala:56)
        at com.orgid.dp.batch.sql.BatchDriver.main(BatchDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/18 00:22:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/18 00:22:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/05/18 00:22:59 INFO util.ShutdownHookManager: Shutdown hook called
16/05/18 00:22:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-301d676b-38f6-4895-8a04-af37c5b7fa99
16/05/18 00:22:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-301d676b-38f6-4895-8a04-af37c5b7fa99/httpd-0f747205-f207-476e-8317-6083d8fe0b37
16/05/18 00:22:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
I cannot figure out where the problem is. Please help.


Thanks in advance.

You seem to be affected by this bug:


Either upgrade to a newer version (2.8+), or find the environment variable that has no value.
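
For context, the frames at YarnSparkHadoopUtil.scala:262-264 in the stack trace split each comma-separated environment entry on '=' and then index into the result. Below is a minimal Scala sketch of that failing pattern (illustrative only, not the actual Spark source; the input string is hypothetical):

// Sketch of the parsing pattern behind setEnvFromInputString (Spark 1.6).
// An entry with no value -- a bare "BROKEN_VAR", or "BROKEN_VAR=" with a
// trailing '=' (String.split drops trailing empty strings) -- produces a
// one-element array, so parts(1) throws ArrayIndexOutOfBoundsException: 1.
object EnvSplitDemo {
  def main(args: Array[String]): Unit = {
    val inputString = "SPARK_HOME=/usr/lib/spark,BROKEN_VAR" // hypothetical env string
    for (cEnv <- inputString.split(",")) {
      val parts = cEnv.split("=")
      println(s"${parts(0)} -> ${parts(1)}") // crashes on "BROKEN_VAR"
    }
  }
}

This is why the error surfaces during SparkContext initialization, before any of the job's own code runs: the YARN client is still assembling the AM container's launch environment.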

You should probably add your code, or at least your JSON data, since that is where the array index is going out of bounds.

Hi, I have added the JSON to the question.

Are you sure you are passing all the parameters to the jar correctly? You are passing --queue refault rather than --queue default. Maybe that is the cause of the problem?

Hi Pawel, I chose that queue deliberately; the queue name is correct.

Hi Daniel, thanks for the input. I copied the environment variables over from another instance that works fine, and the problem appears to be resolved.

The JSON configuration referenced in the comments:
{
  "driver.config": {
          "dp.batch.event.dir": "hdfs://hadoop.admin.com:9000/user/hadoop/parquet_data/output/",
          "dp.batch.udf.scan.packages" : "com.orgid.dp.batch.udfs",
          "dp.batch.enable.pathfinder": "true",
          "dp.batch.input.timezone": "IST",
          "dp.admin.local": "true",
          "dp.admin.host": "",
          "dp.admin.port": "",
          "dp.batch.output.dir": "hdfs://hadoop.admin.com:9000/user/hadoop/aman/",
          "dp.batch.output.timezone": "IST",
          "dp.batch.output.date.dir.format": "yyyy/MM/dd/HH/mm",
          "dp.batch.output.partition.count" : "4",

          "email.enable" : "false",
          "email.sender" : "feedsystemreports@abc.com, FeedSystem Reports",
          "email.recipient" : "abcd@abc.com"
        },

  "dp.batch.read.data" :{
    "last.hour" : "",
    "last.day" : "",
    "specific.date.startTime" :"01:05:16:00:00:00",
    "specific.date.endTime" : "15:05:16:23:59:59"
  },

  "pathFinder.config" : {
          "dp.storage.db.connection.url" : "jdbc:mysql://db.org.com:3306/dis",
          "dp.storage.db.user.name"      : "hadoop",
          "dp.storage.db.password"       : "hadoop"
        },
   "kafkaProducer.config" : {
     "topic" : "dp_batch_api",
     "bootstrap.servers" : "kafka.org.com:9920",
     "replayJobEventTopic" : "dp_batch_replay"
   },

  "expressions": [
    {
      "id":30,
      "expression":"SELECT count(*) from appHeartBeat",
      "dependencies":["appHeartBeat"],
      "alias":"",
      "doExport":true
    }

   ],

"externalDependencies":[
    ],

        "spark.config" : {
          "spark.sql.caseSensitive" : "true",
          "spark.driver.memory" : "16G",
          "spark.executor.memory" : "17G",
          "spark.executor.cores" : "5",
          "spark.executor.instances" : "25",
          "spark.yarn.executor.memoryOverhead" : "2048",
          "spark.app.name" : "dplite-batch-sql",
          "spark.core.connection.ack.wait.timeout" : "600",
          "spark.rdd.compress" : "false",
          "spark.akka.timeout" : "600000",
          "spark.storage.blockManagerHeartBeatMs"  : "200000",
          "spark.storage.blockManagerSlaveTimeoutMs" : "200000",
          "spark.akka.retry.wait" : "120000",
          "conf spark.akka.frameSize" : "1500",
          "spark.driver.maxResultSize" : "1500",
          "spark.worker.timeout" : "360000",
          "spark.driver.extraJavaOptions" : "-XX:MaxPermSize=2048m -XX:PermSize=512m"
        }

}
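
Since the eventual fix was to replace environment variables copied from a working instance, a quick way to spot a value-less entry in any comma-separated env setting is a small check like the one below. Judging by the scala.Option.foreach frame at Client.scala:633, the string being parsed is likely the SPARK_YARN_USER_ENV variable read during setupLaunchEnv, though that is an inference worth verifying against your environment. This helper is illustrative and not part of the poster's project:

object EnvValidator {
  // Returns every entry the Spark 1.6 parser would crash on: bare names
  // with no '=' as well as names with an empty value ("FOO="), since
  // String.split drops trailing empty strings.
  def valuelessEntries(envString: String): List[String] =
    envString.split(",").toList.filter(_.split("=").length < 2)

  def main(args: Array[String]): Unit = {
    println(valuelessEntries("A=1,B,C=3,D=")) // prints: List(B, D=)
  }
}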