
Apache Spark: YARN support on Ooyala Spark JobServer


I have just started experimenting with JobServer and would like to use it in our production environment.

We usually run our Spark jobs individually in yarn-client mode and would like to move to the paradigm offered by the Ooyala Spark JobServer.

I was able to run the WordCount example shown on the official page. I then tried submitting a custom Spark job to the Spark JobServer, but got the following error:

{
  "status": "ERROR",
  "result": {
    "message": "null",
    "errorClass": "scala.MatchError",
    "stack": ["spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:220)",
      "scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)",
      "scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)",
      "akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)",
      "akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)",
      "scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
      "scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)",
      "scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
      "scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"]
  }
}
I made the necessary code changes, such as extending SparkJob and implementing the runJob() method.
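
For reference, a minimal sketch of such a job against the legacy spark.jobserver.SparkJob API (the object name and the tiny word-count logic are illustrative placeholders, not taken from the original question):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

// Hypothetical job: anything submitted to the job server must extend
// SparkJob and implement both validate() and runJob().
object MyCustomJob extends SparkJob {

  // Called before runJob(); reject the request here if required
  // configuration is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  // The job body; the return value is serialized into the "result"
  // field of the HTTP response.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(Seq("a", "b", "a")).countByValue()
}

Once packaged as a jar, it would be uploaded and triggered through the server's REST API (port 8090 per the config below; the appName "myapp" is arbitrary):

curl --data-binary @mySparkJob.jar localhost:8090/jars/myapp
curl -d "" 'localhost:8090/jobs?appName=myapp&classPath=MyCustomJob'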

This is the dev.conf file I am using:

# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  master = "yarn-client"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jar-store-rootdir = /tmp/jobserver/jars
    jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }

    context-creation-timeout = "60 s"
  }

  contexts {
    my-low-latency-context {
      num-cpu-cores = 1
      memory-per-node = 512m
    }
  }

  context-settings {
    num-cpu-cores = 2
    memory-per-node = 512m
  }

  home = "/data/softwares/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041"
}

spray.can.server {
  parsing.max-content-length = 200m
}

spark.driver.allowMultipleContexts = true
YARN_CONF_DIR=/home/spark/conf/
Also, how do I supply runtime parameters to the Spark job, such as --files and --jars? For example, I usually run a custom Spark job as follows:

./spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/bin/spark-submit --class com.demo.SparkDriver --master yarn-cluster --num-executors 3 --jars /tmp/api/myUtil.jar --files /tmp/myConfFile.conf,/tmp/mySchema.txt /tmp/mySparkJob.jar 

The number of executors and extra jars are passed differently, through the configuration file (see the dependent-jar-uris config setting).
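
For instance, the extra jar from the spark-submit command above could be attached in the context-settings block of the .conf file (a sketch; the URI shown is just the jar path from the question):

context-settings {
  num-cpu-cores = 2
  memory-per-node = 512m

  # Replaces spark-submit's --jars: these jars are added to the
  # context's classpath when the context is created.
  dependent-jar-uris = ["file:///tmp/api/myUtil.jar"]
}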

YARN_CONF_DIR should be set in the environment, not in the .conf file.
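
For example (a sketch; the path is taken from the .conf file above, and the start script name may differ per deployment):

# Set in the shell (or in the job server's start script) before launching:
export YARN_CONF_DIR=/home/spark/conf/
./server_start.sh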


As for the other questions, the Google group is the appropriate place to ask them. You may also want to search it for YARN-related issues, as several others have already figured out how to get this working.

Thanks a lot for your answer. Is there any Spark JobServer-specific configuration setting for passing files (such as config files or schema files) at runtime? I understand that the dependent-jar-uris property is used to pass extra jar files.

@JamesIsaac Not right now, but that's an interesting suggestion. Would you like to file an issue?