How to submit a Spark Scala job on Hadoop
I'm new to Spark, and I'm trying to run a Scala job on a pseudo-distributed Hadoop system: Hadoop 2.6 + YARN + Spark 1.6.1 + Scala 2.10.6 + JVM 8, everything installed from scratch. My Scala application is a simple WordCount example, and I can't figure out what the error is.
/usr/local/sparkapps/WordCount/src/main/scala/com/mydomain/spark/wordcount/WordCount.scala
package com.mydomain.spark.wordcount

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object ScalaWordCount {
  def main(args: Array[String]) {
    val logFile = "/home/hduser/inputfile.txt"
    val sparkConf = new SparkConf().setAppName("Spark Word Count")
    val sc = new SparkContext(sparkConf)
    val file = sc.textFile(logFile)
    val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("/home/hduser/output")
  }
}
The sbt file:
/usr/local/sparkapps/WordCount/WordCount.sbt
name := "ScalaWordCount"
version := "1.0"
scalaVersion := "2.10.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
Compile:
$ cd /usr/local/sparkapps/WordCount/
$ sbt package
Submit:
spark-submit --class com.mydomain.spark.wordcount.ScalaWordCount --master yarn-cluster /usr/local/sparkapps/WordCount/target/scala-2.10/scalawordcount_2.10-1.0.jar
Output:
Exception in thread "main" org.apache.spark.SparkException: Application application_1460107053907_0003 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark log file:

From the logs:
16/04/08 12:24:41 ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/home/hduser/inputfile.txt
If you want to read a local file, use:
val logFile = "file:///home/hduser/inputfile.txt"
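Alternatively, if the file is meant to live on HDFS (which is what `sc.textFile` resolves a bare path against when `fs.defaultFS` is `hdfs://localhost:9000`), upload it first. This is just a sketch using the paths from the question; adjust them for your setup:

```shell
# Create the target directory on HDFS and copy the local input file into it.
# Paths below match the question's configuration and are assumptions about your layout.
hdfs dfs -mkdir -p /home/hduser
hdfs dfs -put /home/hduser/inputfile.txt /home/hduser/inputfile.txt

# Verify the file is now visible to Spark at hdfs://localhost:9000/home/hduser/inputfile.txt
hdfs dfs -ls /home/hduser
```

Note that in `yarn-cluster` mode the driver runs on a cluster node, so a `file://` path must exist on every node; with a single-machine pseudo-distributed setup either fix works.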