Java Spark 2.0.0: reading from Cassandra in cluster mode
I'm having some trouble running a Spark application that reads data from Cassandra with Spark 2.0.0. My code works as follows:
DataFrameReader readerCassandra = SparkContextUtil.getInstance().read()
        .format("org.apache.spark.sql.cassandra")
        .option("spark.cassandra.connection.host", [DATABASE_IP])
        .option("spark.cassandra.connection.port", [DATABASE_PORT]);

final Map<String, String> map = new HashMap<String, String>();
map.put("table", "MyTable");
map.put("keyspace", "MyKeyspace");

public final StructType schema = DataTypes.createStructType(
        new StructField[] {
            DataTypes.createStructField("id", DataTypes.StringType, true),
            DataTypes.createStructField("timestamp", DataTypes.TimestampType, true),
            DataTypes.createStructField("value", DataTypes.DoubleType, true)
        });

final Dataset<Row> dataset = readerCassandra.schema(schema).options(map).load();
dataset.show(false);
When I run it like this, everything works fine. But now I want to run my application in cluster mode, so I modified my submit script a bit, setting the Spark master to my master node's IP and deployMode to "cluster".
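For reference, a cluster-mode submission along these lines might look as follows. This is a sketch, not the asker's actual script: the class name, jar paths, and master address are illustrative placeholders.

```shell
# Hypothetical spark-submit invocation for Spark standalone cluster mode.
# com.example.MyApp, my-app.jar, and <MASTER_IP> are placeholders.
spark-submit \
  --class com.example.MyApp \
  --master spark://<MASTER_IP>:7077 \
  --deploy-mode cluster \
  --jars /path/to/jars/spark-cassandra-connector_2.11-2.0.0.jar \
  /path/to/my-app.jar
```

In cluster mode the driver runs on a worker node, so any path passed to --jars must be resolvable from that node, which is where this setup can break.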
When I submit the application, I almost immediately get the following error in my driver logs:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
...
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
Notes:
- I still get the error with a cluster of a single worker running on the same machine as my master.
- Initially, I was using Spark 2.3.1 and had no problem running my code in cluster mode (using spark-cassandra-connector_2.11-2.3.1.jar in --jars).
- I tried multiple jars in --jars, such as spark-cassandra-connector_2.11-2.0.jar, spark-cassandra-connector_2.11-2.0.2.jar, and spark-cassandra-connector_2.11-2.3.1.jar, but none of them worked.
- Some other jars set in the --jars parameter are taken into account.
Use file:///path/to/jars/spark-cassandra-connector_2.11-2.0.0.jar instead. In that case, the jar will be distributed to the executors via the driver's HTTP server. Otherwise, Spark expects that you have already copied the file to all machines yourself, to avoid the copy being performed by the process itself (see the Spark documentation on advanced dependency management).
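Concretely, the only change suggested above is the URI scheme on the --jars path. A sketch, with the same placeholder names as before:

```shell
# With a file:/// URI, Spark serves the jar to the driver and executors
# itself instead of expecting it to already exist on every node.
spark-submit \
  --class com.example.MyApp \
  --master spark://<MASTER_IP>:7077 \
  --deploy-mode cluster \
  --jars file:///path/to/jars/spark-cassandra-connector_2.11-2.0.0.jar \
  file:///path/to/my-app.jar
```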
I'd suggest just creating an uberjar with all dependencies (except Spark itself) and submitting that; it's much less painful. The uberjar solved my problem and is the better solution. Thanks!
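A sketch of the uberjar route, assuming a Maven build with a shade/assembly plugin already configured to bundle the Cassandra connector (the artifact names here are placeholders):

```shell
# Build a fat jar that bundles the Cassandra connector. Spark's own
# artifacts should be declared with 'provided' scope so they are not
# included, since the cluster supplies them at runtime.
mvn clean package

# A single self-contained jar means no --jars distribution to get wrong.
spark-submit \
  --class com.example.MyApp \
  --master spark://<MASTER_IP>:7077 \
  --deploy-mode cluster \
  /path/to/my-app-with-dependencies.jar
```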