Spark Java: java.lang.NoClassDefFoundError

Tags: java, json, maven, apache-spark, java-8

I am running Spark standalone locally and use Maven as the build automation tool, so I have set up all the required dependencies for Spark and for json-simple. Simple applications such as word count run fine, but as soon as I import JSONParser from the json-simple API I get a java.lang.NoClassDefFoundError. I have tried adding the jar file via both the SparkConf and the SparkContext (sketched below), but it did not help.
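
This is roughly how I tried to register the jar (a minimal sketch; the app name and master URL are placeholders, the jar path is the one from my config):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AddJarSketch {
    public static void main(String[] args) {
        String jsonSimpleJar =
                "/Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar";

        // Attempt 1: register the jar on the SparkConf.
        SparkConf conf = new SparkConf()
                .setAppName("sparketl")
                .setMaster("spark://localhost:7077")
                .setJars(new String[] { jsonSimpleJar });

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Attempt 2: also add the jar on the live context.
        sc.addJar(jsonSimpleJar);
    }
}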

Below is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org</groupId>
<artifactId>sparketl</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>sparketl</name>
<url>http://maven.apache.org</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>


</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>
My Spark config has:

spark.executor.memory   512m
spark.driver.cores      1
spark.driver.memory     512m
spark.driver.extraClassPath   /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar
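
The relevant part of the job looks roughly like this (a sketch reconstructed from the stack trace below: textFile at SparkEtl.java:35, a pair lambda using JSONParser at line 44, sortByKey at line 58; the JSON field names are illustrative):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import scala.Tuple2;

public class SparkEtl {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext();      // picks up the config above

        JavaRDD<String> lines = sc.textFile("input.json"); // SparkEtl.java:35

        JavaPairRDD<String, String> pairs = lines.mapToPair(line -> {
            // org.json.simple.parser.JSONParser is the class the executors
            // fail to load (SparkEtl.java:44).
            JSONParser parser = new JSONParser();
            JSONObject json = (JSONObject) parser.parse(line);
            return new Tuple2<>((String) json.get("key"), (String) json.get("value"));
        });

        pairs.sortByKey().collect();                       // SparkEtl.java:58
    }
}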

Has anyone come across this problem? If so, what is the solution for it?

According to spark.driver.extraClassPath (and the log), the library supplied to Spark is a sources jar (json-simple-1.1.1-sources.jar). That artifact contains only .java files (source files, not compiled Java classes).

Changing it to json-simple-1.1.1.jar (with the full path, of course) should help.
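
Running jar tf on the two artifacts makes the difference visible: the sources jar lists only .java entries, while json-simple-1.1.1.jar lists the compiled classes, including org/json/simple/parser/JSONParser.class. The corrected config line keeps the same path and only swaps the artifact:

spark.driver.extraClassPath   /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar

Since the tasks fail on the executors, the same swap applies wherever the jar is registered, e.g. in the SparkContext.addJar call from the question.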


The driver log from the failing run confirms this: the jar that gets shipped to the executors is the sources jar, and the tasks then fail to load JSONParser:

15/07/08 16:09:17 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/07/08 16:09:17 INFO SparkContext: Added JAR /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar at http://172.16.8.157:52255/jars/json-simple-1.1.1-sources.jar with timestamp 1436396957111
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(110248) called with curMem=0, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 265.0 MB)
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(10090) called with curMem=110248, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.9 KB, free 265.0 MB)
15/07/08 16:09:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.8.157:52257 (size: 9.9 KB, free: 265.1 MB)
15/07/08 16:09:17 INFO SparkContext: Created broadcast 0 from textFile at SparkEtl.java:35
15/07/08 16:09:17 INFO FileInputFormat: Total input paths to process : 1
15/07/08 16:09:17 INFO SparkContext: Starting job: sortByKey at SparkEtl.java:58
15/07/08 16:09:17 INFO DAGScheduler: Got job 0 (sortByKey at SparkEtl.java:58) with 2 output partitions (allowLocal=false)
15/07/08 16:09:17 INFO DAGScheduler: Final stage: ResultStage 0(sortByKey at SparkEtl.java:58)
15/07/08 16:09:17 INFO DAGScheduler: Parents of final stage: List()
15/07/08 16:09:17 INFO DAGScheduler: Missing parents: List()
15/07/08 16:09:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at sortByKey at SparkEtl.java:58), which has no missing parents
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(5248) called with curMem=120338, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.1 KB, free 265.0 MB)
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(2888) called with curMem=125586, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 265.0 MB)
15/07/08 16:09:17 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.8.157:52257 (size: 2.8 KB, free: 265.1 MB)
15/07/08 16:09:17 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/07/08 16:09:17 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at sortByKey at SparkEtl.java:58)
15/07/08 16:09:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/07/08 16:09:18 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.16.8.157:52260/user/Executor#2100827222]) with ID 0
15/07/08 16:09:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.16.8.157, PROCESS_LOCAL, 1560 bytes)
15/07/08 16:09:18 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.16.8.157, PROCESS_LOCAL, 1560 bytes)
15/07/08 16:09:18 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.8.157:52263 with 265.1 MB RAM, BlockManagerId(0, 172.16.8.157, 52263)
15/07/08 16:09:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.8.157:52263 (size: 2.8 KB, free: 265.1 MB)
15/07/08 16:09:18 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.8.157:52263 (size: 9.9 KB, free: 265.1 MB)
15/07/08 16:09:19 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.16.8.157): java.lang.NoClassDefFoundError: org/json/simple/parser/JSONParser
    at org.sparketl.etljobs.SparkEtl.lambda$main$b9f570ea$1(SparkEtl.java:44)
    at org.sparketl.etljobs.SparkEtl$$Lambda$11/1498038525.call(Unknown Source)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1030)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1030)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    at org.apache.spark.RangePartitioner$$anonfun$8.apply(Partitioner.scala:259)
    at org.apache.spark.RangePartitioner$$anonfun$8.apply(Partitioner.scala:257)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

15/07/08 16:09:19 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor 172.16.8.157: java.lang.NoClassDefFoundError (org/json/simple/parser/JSONParser) [duplicate 1]
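
Alternatively, this class of problem can be avoided altogether by packaging json-simple into the application jar with the maven-shade-plugin and submitting the resulting fat jar, so nothing has to be put on extraClassPath. A minimal sketch of the plugin section (the version is an assumption; it goes next to the compiler plugin in the pom above):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <!-- assumed version; use whatever is current -->
    <version>2.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>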