Java Spark 2.0.0: Reading from Cassandra in cluster mode


I'm having trouble running a Spark application that reads data from Cassandra on Spark 2.0.0.

My working code is the following:

// Reader for the Cassandra data source
DataFrameReader readerCassandra = SparkContextUtil.getInstance().read()
        .format("org.apache.spark.sql.cassandra")
        .option("spark.cassandra.connection.host", [DATABASE_IP])
        .option("spark.cassandra.connection.port", [DATABASE_PORT]);

// Table and keyspace to read from
final Map<String, String> map = new HashMap<>();
map.put("table", "MyTable");
map.put("keyspace", "MyKeyspace");

// Schema of the Cassandra table
final StructType schema = DataTypes.createStructType(
        new StructField[] {
            DataTypes.createStructField("id", DataTypes.StringType, true),
            DataTypes.createStructField("timestamp", DataTypes.TimestampType, true),
            DataTypes.createStructField("value", DataTypes.DoubleType, true)
        });

final Dataset<Row> dataset = readerCassandra.schema(schema).options(map).load();
dataset.show(false);
When I run it this way, everything works fine. But now I want to run my application in cluster mode.

So I modified my submit script a little, setting sparkMaster to the master's IP and deployMode to "cluster".
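My submit script now looks roughly like this (the class name, application jar, and paths below are placeholders, not the actual values from my script):

```shell
# Sketch of the modified submit script; all names and paths are placeholders.
spark-submit \
  --class com.example.MyApp \
  --master spark://[MASTER_IP]:7077 \
  --deploy-mode cluster \
  --jars /path/to/jars/spark-cassandra-connector_2.11-2.0.0.jar \
  /path/to/my-application.jar
```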

When I submit the application, the following error shows up almost immediately in my driver logs:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
        ...

Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        ...
Notes:

  • I still get the error even with a cluster made of a single worker on the same machine as my master.
  • At first, I was using Spark 2.3.1 and had no problem running my code in cluster mode (using spark-cassandra-connector_2.11-2.3.1.jar in --jars).
  • I tried several jars in --jars, such as spark-cassandra-connector_2.11-2.0.jar, spark-cassandra-connector_2.11-2.0.2.jar, and spark-cassandra-connector_2.11-2.3.1.jar, but none of them worked.
  • Some other jars are set in the --jars parameter and are taken into account.

You may need to specify the path as
file:///path/to/jars/spark-cassandra-connector_2.11-2.0.0.jar
instead; in that case, it will be distributed to the executors via the driver's HTTP server. Otherwise, Spark expects that you have already copied the file to all machines yourself, so that it doesn't have to do the copying. See the Spark documentation on submitting applications (advanced dependency management).
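In the submit script, that change would look something like this (paths and names are again placeholders):

```shell
# file:// makes the path an absolute URL, so Spark distributes the jar
# to the driver and executors instead of expecting it on every node.
spark-submit \
  --class com.example.MyApp \
  --master spark://[MASTER_IP]:7077 \
  --deploy-mode cluster \
  --jars file:///path/to/jars/spark-cassandra-connector_2.11-2.0.0.jar \
  /path/to/my-application.jar
```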


I would recommend just creating an uberjar with all the dependencies (except Spark itself) and submitting that; it will be much less painful.
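With Maven, a minimal sketch of such an uberjar setup uses the maven-shade-plugin in pom.xml, assuming a standard project layout and that the Spark dependencies are declared with `provided` scope so they are not bundled:

```xml
<!-- Sketch only: bundles compile-scope dependencies (e.g. the
     spark-cassandra-connector) into the application jar at package time. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

The resulting jar can then be passed directly to spark-submit without any --jars flags for the connector.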

The uberjar solved my problem and is the better solution. Thanks!