Cassandra: No host available exception - Spark Cassandra Connector
I am using spark-cassandra-connector_2.11, version 2.3.0, with the latest Spark (2.3.0), and I am trying to read data from Cassandra (3.0.11.1485) on DSE (5.0.5). Here is an example read that works without problems:
JavaRDD<Customer> result = javaFunctions(sc).cassandraTable(MyKeyspaceName, "customers", mapRowTo(Customer.class));
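Check whether SSL is enabled on your Cassandra cluster. I have seen the same error in that case when the correct certificates were not configured.
It looks like this solved it: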
.set("spark.cassandra.connection.keep_alive_ms", "1200000")
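As a rough sketch of how the two suggestions above (SSL and the keep-alive setting) could be collected in one place, the following plain-Java helper builds the connector options as a map. The SSL property names are the ones I believe the 2.x connector documents; the trust store path and password are placeholders, and the class and method names are hypothetical. The SSL entries only make sense if the cluster actually has client encryption enabled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark or the connector.
final class ConnectorSettings {

    // Collects the connector options discussed above.
    // trustStorePath and trustStorePassword are placeholders supplied by the caller.
    static Map<String, String> build(String trustStorePath, String trustStorePassword) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Keep-alive value taken from the answer above.
        settings.put("spark.cassandra.connection.keep_alive_ms", "1200000");
        // SSL options (assumed names from the connector 2.x reference).
        settings.put("spark.cassandra.connection.ssl.enabled", "true");
        settings.put("spark.cassandra.connection.ssl.trustStore.path", trustStorePath);
        settings.put("spark.cassandra.connection.ssl.trustStore.password", trustStorePassword);
        return settings;
    }

    // Renders the settings as spark-submit "--conf key=value" arguments.
    static String toSubmitArgs(Map<String, String> settings) {
        StringJoiner joiner = new StringJoiner(" ");
        settings.forEach((key, value) -> joiner.add("--conf").add(key + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> settings = build("/path/to/truststore.jks", "changeit");
        System.out.println(toSubmitArgs(settings));
    }
}
```

In a Spark job, each entry could instead be applied with sparkConf.set(key, value) before the context is created.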
// Read that opens a session per partition via withSessionDo inside mapPartitions.
CassandraConnector cassandraConnector = CassandraConnector.apply(sc.getConf());
SomeSparkRDD.mapPartitions((FlatMapFunction<Iterator<Customer>, CustomerEx>) customerIterator ->
    cassandraConnector.withSessionDo(new AbstractFunction1<Session, Iterator<CustomerEx>>() {
        @Override
        public Iterator<CustomerEx> apply(Session session) {
            // Look up each customer through the shared session and drop misses.
            return asStream(customerIterator, false)
                .map(customer -> fetchDataViaSession(customer, session))
                .filter(x -> x != null)
                .iterator();
        }
    }));

// Wraps an Iterator in a (lazy) Stream.
public static <T> Stream<T> asStream(Iterator<T> sourceIterator, boolean parallel) {
    Iterable<T> iterable = () -> sourceIterator;
    return StreamSupport.stream(iterable.spliterator(), parallel);
}
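One thing worth checking about the asStream pipeline is that it is lazy: nothing is fetched until the returned iterator is consumed. The sketch below shows this in isolation with plain Java. The lookup method is a hypothetical stand-in for fetchDataViaSession, and strings stand in for Customer objects. If the same thing happens inside withSessionDo, the lookups would run after withSessionDo has returned, which might explain why a longer keep_alive_ms helps. That explanation is an assumption on my part, not something the original post confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LazyStreamDemo {

    // Same helper as in the question.
    static <T> Stream<T> asStream(Iterator<T> sourceIterator, boolean parallel) {
        Iterable<T> iterable = () -> sourceIterator;
        return StreamSupport.stream(iterable.spliterator(), parallel);
    }

    // Hypothetical stand-in for fetchDataViaSession: records each call
    // and returns null for keys starting with "x" (a "missing" row).
    static String lookup(String key, List<String> callLog) {
        callLog.add(key);
        return key.startsWith("x") ? null : key.toUpperCase();
    }

    // Builds the same map/filter/iterator pipeline without consuming it.
    static Iterator<String> buildPipeline(Iterator<String> source, List<String> callLog) {
        return asStream(source, false)
                .map(key -> lookup(key, callLog))
                .filter(Objects::nonNull)
                .iterator();
    }

    // Consumes the iterator into a list.
    static List<String> drain(Iterator<String> iterator) {
        List<String> results = new ArrayList<>();
        while (iterator.hasNext()) {
            results.add(iterator.next());
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> callLog = new ArrayList<>();
        Iterator<String> pipeline = buildPipeline(Arrays.asList("a", "x1", "b").iterator(), callLog);
        // No lookups have run yet at this point.
        System.out.println("lookups before consuming: " + callLog.size());
        List<String> results = drain(pipeline);
        System.out.println("results: " + results + ", lookups after consuming: " + callLog.size());
    }
}
```

If this is the cause, one possible alternative to raising keep_alive_ms would be to collect the results into a list inside apply(...) and return that list's iterator, at the cost of holding a partition's results in memory.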
spark.cassandra.connection.connections_per_executor_max
spark.cassandra.connection.keep_alive_ms
spark.cassandra.input.fetch.size_in_rows
spark.cassandra.input.split.size_in_mb
Besides tuning the settings listed above, I also tried reducing the number of partitions of the RDD on which I run mapPartitions + withSessionDo.