Scala/Spark CoreNLP: java.lang.NoClassDefFoundError
I want to run spark-corenlp, but I get a java.lang.NoClassDefFoundError when running spark-submit. Below is the Scala code from the GitHub example, which I put into an object, defining the SparkContext and SQLContext myself:
package main.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SQLContext
import com.databricks.spark.corenlp.functions._

object SQLContextSingleton {
  @transient private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

object Sentiment {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sentiment")
    val sc = new SparkContext(conf)
    val sqlContext = SQLContextSingleton.getInstance(sc)
    import sqlContext.implicits._

    val input = Seq((1, "<xml>Stanford University is located in California. It is a great university.</xml>")).toDF("id", "text")
    val output = input
      .select(cleanxml('text).as('doc))
      .select(explode(ssplit('doc)).as('sen))
      .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))
    output.show(truncate = false)
  }
}
I run sbt package without any problems, then submit the job to Spark:

spark-submit --class "main.scala.Sentiment" --master local[4] target/scala-2.10/sentimentanalizer_2.10-1.0.jar

The program fails with the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/simple/Sentence
at main.scala.com.databricks.spark.corenlp.functions$$anonfun$cleanxml$1.apply(functions.scala:55)
at main.scala.com.databricks.spark.corenlp.functions$$anonfun$cleanxml$1.apply(functions.scala:54)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:75)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:74)
Things I have tried:

I use Eclipse for Scala, and I made sure to add all the JARs from Stanford CoreNLP to the project's build path, as suggested.

I suspect I need to add something to the command line when submitting the job to Spark. Any ideas?

That suspicion was right: the command line was missing something. The NoClassDefFoundError means the Stanford classes were visible at compile time (through Eclipse's build path) but absent from the runtime classpath of the submitted job, so spark-submit needs all the Stanford CoreNLP JARs passed explicitly:
spark-submit \
  --jars $(echo stanford-corenlp/*.jar | tr ' ' ',') \
  --class "main.scala.Sentiment" \
  --master local[4] \
  target/scala-2.10/sentimentanalizer_2.10-1.0.jar

where the stanford-corenlp folder contains:
./stanford-corenlp/ejml-0.23.jar
./stanford-corenlp/javax.json-api-1.0-sources.jar
./stanford-corenlp/javax.json.jar
./stanford-corenlp/joda-time-2.9-sources.jar
./stanford-corenlp/joda-time.jar
./stanford-corenlp/jollyday-0.4.7-sources.jar
./stanford-corenlp/jollyday.jar
./stanford-corenlp/protobuf.jar
./stanford-corenlp/slf4j-api.jar
./stanford-corenlp/slf4j-simple.jar
./stanford-corenlp/stanford-corenlp-3.6.0-javadoc.jar
./stanford-corenlp/stanford-corenlp-3.6.0-models.jar
./stanford-corenlp/stanford-corenlp-3.6.0-sources.jar
./stanford-corenlp/stanford-corenlp-3.6.0.jar
./stanford-corenlp/xom-1.2.10-src.jar
./stanford-corenlp/xom.jar
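The reason for the tr trick: --jars expects a comma-separated list, while a shell glob expands to space-separated paths. A minimal sketch with dummy jars (the /tmp/corenlp-demo directory and jar names are made up for illustration):

```shell
# --jars wants commas; a glob gives spaces, so join them with tr.
mkdir -p /tmp/corenlp-demo
touch /tmp/corenlp-demo/a.jar /tmp/corenlp-demo/b.jar
JARS=$(echo /tmp/corenlp-demo/*.jar | tr ' ' ',')
echo "$JARS"   # /tmp/corenlp-demo/a.jar,/tmp/corenlp-demo/b.jar
```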
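An alternative to listing every jar on the command line is to declare the dependencies in sbt and build a fat jar with the sbt-assembly plugin. A sketch of a build.sbt under assumptions not taken from the question (Spark 1.6.1 and CoreNLP 3.6.0 to match the scala-2.10 artifact above; the spark-corenlp package itself would still need to be resolved separately, e.g. via spark-submit's --packages flag):

```scala
// build.sbt sketch (assumed versions; sbt-assembly must be added
// to project/plugins.sbt separately).
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark is supplied by spark-submit at runtime, so mark it "provided"
  // to keep it out of the fat jar.
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided",
  // CoreNLP plus its models jar (the models ship as a separate classifier).
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models"
)
```

With a fat jar built by sbt assembly, spark-submit no longer needs the long --jars list at all.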