Using the Spark Cassandra Connector Java API and getting an error when trying to submit a Spark job

So I'm trying to get a simple example program working that uses Java and the Spark Cassandra Connector. Running sbt assembly works fine and I get a fat jar to submit to Spark. The problem comes when I submit the job to Spark and get the following error:

vagrant@cassandra-spark:~$ source submit-job.sh
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/CassandraJavaUtil
    at JavaTest.main(JavaTest.java:13)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.japi.CassandraJavaUtil
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 10 more
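
The stack trace shows the failure happening in the driver itself (JavaTest.main), which means com.datastax.spark.connector.japi.CassandraJavaUtil is not visible on the driver's runtime classpath. A minimal diagnostic sketch, independent of Spark, that probes for the class by name (the class name is copied from the stack trace above):

public class ClasspathProbe {
        public static void main(String[] args) {
                // The class the stack trace reports as missing.
                String name = "com.datastax.spark.connector.japi.CassandraJavaUtil";
                try {
                        Class.forName(name);
                        System.out.println(name + " is on the classpath");
                } catch (ClassNotFoundException e) {
                        // Same failure mode as in the stack trace above.
                        System.out.println(name + " is NOT on the classpath");
                }
        }
}

Running it with java -cp including first the fat jar and then the connector assembly shows which jar, if either, actually contains the class.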
Here is the submit-job.sh script:

#!/usr/bin/env bash
~/spark/bin/spark-submit --driver-class-path ~/JavaTest/lib/spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar ~/JavaTest/target/scala-2.10/CassSparkTest-assembly-1.0.jar
And here is my build.sbt file:

lazy val root = (project in file(".")).
        settings(
                name := "CassSparkTest",
                version := "1.0"
        )
libraryDependencies ++= Seq(
        "com.datastax.cassandra" % "cassandra-driver-core" % "2.1.5" % "provided",
        "org.apache.cassandra" % "cassandra-thrift" % "2.1.5" % "provided",
        "org.apache.cassandra" % "cassandra-clientutil" % "2.1.5" % "provided",
        //"com.datastax.spark" %% "spark-cassandra-connector" % "1.3.0-M1"  % "provided",
        "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.3.0-M1" % "provided",
        "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
        "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided",
        "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided",
        "org.apache.commons" % "commons-lang3" % "3.4" % "provided"
)
Here is the code being compiled:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;
import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.commons.lang3.StringUtils;

public class JavaTest {
        public static void main(String[] args) {
                SparkConf conf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1");
                JavaSparkContext sc = new JavaSparkContext("spark://192.168.10.11:7077", "test", conf);
                JavaRDD<String> cassandraRowsRDD = javaFunctions(sc).cassandraTable("ks", "test")
                        .map(new Function<CassandraRow, String>() {
                                @Override
                                public String call(CassandraRow cassandraRow) throws Exception {
                                        return cassandraRow.toString();
                                }
                        });
                System.out.println("Data as CassandraRows: \n" + StringUtils.join(cassandraRowsRDD.toArray(), "\n"));

        }
}
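
One small aside, unrelated to the error: JavaRDD.toArray() has been deprecated since Spark 1.0 in favor of collect(), so the last println could equally be written as:

// collect() returns a java.util.List<String>, which StringUtils.join accepts.
System.out.println("Data as CassandraRows: \n"
        + StringUtils.join(cassandraRowsRDD.collect(), "\n"));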

sbt assembly works fine, but as soon as the job is actually submitted, the class definition can't be found.

When I've run into this before, I had to add the jar containing the current class to the SparkConf. For example:

SparkConf conf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1");

conf.setJars(SparkContext.jarOfObject(this))

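A Java version of the same idea uses JavaSparkContext.jarOfClass. Note also that build.sbt marks spark-cassandra-connector-java as provided, which keeps it out of the sbt-assembly fat jar, so the connector assembly has to be shipped along with the application jar. A minimal sketch; the absolute connector path is an assumption based on the submit script above:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class JavaTest {
        public static void main(String[] args) {
                // jarOfClass looks up the jar that JavaTest was loaded from,
                // i.e. the fat jar built by sbt assembly.
                String appJar = JavaSparkContext.jarOfClass(JavaTest.class)[0];

                SparkConf conf = new SparkConf()
                        .set("spark.cassandra.connection.host", "127.0.0.1")
                        // Register both jars so Spark distributes them with the
                        // job. The connector path is an assumption based on the
                        // submit script above.
                        .setJars(new String[] {
                                appJar,
                                "/home/vagrant/JavaTest/lib/spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar"
                        });

                JavaSparkContext sc = new JavaSparkContext("spark://192.168.10.11:7077", "test", conf);
                // ... rest of the program unchanged ...
        }
}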

I added conf.setJars(JavaSparkContext.jarOfClass(JavaTest.class)); and I still get the same error.

Check your Spark Cassandra Connector version: build.sbt pulls in spark-cassandra-connector-java 1.3.0-M1, but the submit script ships the spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT jar.