Scala Spark Standalone Mode: Submitting Jobs Programmatically


I am new to Spark and am trying to submit the "Quick Start" job from my application. To simulate standalone mode, I started a master and a slave on localhost:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {

  def main(args: Array[String]): Unit = {

    val logFile = "/opt/spark-2.0.0-bin-hadoop2.7/README.md"
    val conf = new SparkConf().setAppName("SimpleApp")
    conf.setMaster("spark://10.49.30.77:7077")
    val sc = new SparkContext(conf)

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s , lines with b: %s".format(numAs, numBs))

    sc.stop()
  }
}
I run the Spark application from my IDE (IntelliJ).

Looking at the logs on the worker node, it appears Spark cannot find the job's classes:

16/09/15 17:50:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1912.0 B, free 366.3 MB)
16/09/15 17:50:58 INFO TorrentBroadcast: Reading broadcast variable 1 took 137 ms
16/09/15 17:50:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 366.3 MB)
16/09/15 17:50:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
1. Does this mean the job's resources (classes) are not shipped to the slave nodes?

2. For standalone mode, do I have to submit jobs with the spark-submit CLI? If so, how can a Spark job be submitted from an application (e.g., a web application)? (See the SparkLauncher sketch after the answer below.)

3. An unrelated question: I see in the logs that the driver program starts a server on port 4040. What is it for? The driver program acts as a client, so why does it start this service?

16/09/15 17:50:52 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/15 17:50:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/15 17:50:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.49.30.77:4040

You should set the path to your job's jar(s) with the setJars method on SparkConf, or supply them with the --jars option of the spark-submit command when launching from the CLI.
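A minimal sketch of the setJars route, assuming the application has been packaged into a jar first (the path below is hypothetical; point it at whatever artifact your build produces, e.g. via sbt package):

import org.apache.spark.SparkConf

// The jar path is a placeholder -- substitute the jar your build produces.
val conf = new SparkConf()
  .setAppName("SimpleApp")
  .setMaster("spark://10.49.30.77:7077")
  .setJars(Seq("/path/to/simpleapp.jar")) // shipped to every executor when the job starts

With setJars in place, the executors fetch the listed jars and add them to their classpath, which is exactly what the ClassNotFoundException above is complaining about. (With spark-submit, the application jar passed as the main argument is distributed automatically; --jars is for extra dependency jars.)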
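Regarding question 2, the CLI is not the only option: Spark also ships org.apache.spark.launcher.SparkLauncher, which lets a host application (e.g., a web app) start a job programmatically as a child spark-submit process. A sketch, with hypothetical paths:

import org.apache.spark.launcher.SparkLauncher

// sparkHome and the application jar path are placeholders for this sketch.
val process = new SparkLauncher()
  .setSparkHome("/opt/spark-2.0.0-bin-hadoop2.7")
  .setAppResource("/path/to/simpleapp.jar") // jar containing SimpleApp
  .setMainClass("SimpleApp")
  .setMaster("spark://10.49.30.77:7077")
  .launch()                                 // spawns spark-submit in a child process
process.waitFor()                           // block until the job finishes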