Why does Cassandra execute with "ALLOW FILTERING" when it is not mentioned anywhere in my code?

I am using spark-sql-2.4.1 and spark-cassandra-connector_2.11-2.4.1 with Java 8. I am running a simple query like the one below to get the row count of a C* table:
JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
long recCount = javaFunctions(sc).cassandraTable(keyspace, columnFamilyName).cassandraCount();
But it is timing out with the error shown below.

1) Why is "ALLOW FILTERING" appended to the query before it is actually executed?

2) Even though I set "cassandra.output.consistency.level=ANY", why is the read executed at consistency "LOCAL_ONE"?

How do I fix these issues?
An alternative is to compute the count on the Spark side, using .count() instead of .cassandraCount(). Based on my experience, I would recommend avoiding any aggregation on the Cassandra side in production, especially when you are using Spark, a framework designed exactly for such tasks. "Do you mean I need to use spark.read().format("org.apache.spark.sql.cassandra").option("table", columnFamilyName).option("keyspace", keyspace).load().count()?" Actually, yes.
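The Spark-side count suggested above can be sketched as a small standalone job. This is only a sketch: the connection host is an assumption, while the keyspace (radata) and table (model_vals) names are taken from the stack trace in the question; it needs a running Spark + Cassandra setup to execute.

```java
import org.apache.spark.sql.SparkSession;

public class CassandraCountExample {
    public static void main(String[] args) {
        // Assumed session setup; spark.cassandra.connection.host is a placeholder.
        SparkSession spark = SparkSession.builder()
                .appName("cassandra-count")
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .getOrCreate();

        // Load the table as a DataFrame and count on the Spark side,
        // instead of pushing the aggregation down to Cassandra.
        long recCount = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "radata")     // keyspace from the stack trace
                .option("table", "model_vals")    // table from the stack trace
                .load()
                .count();

        System.out.println("count = " + recCount);
        spark.stop();
    }
}
```

With this approach each executor scans its own token ranges and the aggregation happens in Spark, so no single Cassandra coordinator has to answer a full-table count.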
java.io.IOException: Exception during execution of SELECT count(*) FROM "radata"."model_vals" WHERE token("model_id", "type", "value", "code") > ? AND token("model_id", "type", "value", "code") <= ? ALLOW FILTERING: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:350)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
cassandra.output.consistency.level=ANY
cassandra.concurrent.writes=1500
cassandra.output.batch.size.bytes=2056
cassandra.output.batch.grouping.key=partition
cassandra.output.batch.grouping.buffer.size=3000
cassandra.output.throughput_mb_per_sec=128
cassandra.connection.keep_alive_ms=30000
cassandra.read.timeout_ms=600000
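Note that the consistency setting above is an output (write-path) option, which would explain question 2: the connector performs reads with its input consistency level, which defaults to LOCAL_ONE, so the write-side setting has no effect on the count query. A sketch of the read-side options, assuming the standard spark.cassandra. property prefix used by spark-cassandra-connector 2.x (values are examples, not recommendations):

```properties
# Read-path options for spark-cassandra-connector (sketch)
spark.cassandra.input.consistency.level=LOCAL_ONE   # consistency used for reads (default LOCAL_ONE)
spark.cassandra.input.split.sizeInMB=64             # approximate size of each token-range split / Spark partition
spark.cassandra.read.timeout_ms=600000              # per-request read timeout
```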