Apache Spark Cassandra InvalidQueryException - Batch too large
I am inserting data into Cassandra using batches. I get the following exception while running the job:
caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:184)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:798)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
I have read many blog posts about this problem, but none of them helped. I tried setting "spark.cassandra.output.batch.size.bytes" in the Spark configuration at initialization, but that did not solve the problem either; I still get the same error. My batch contains about 1000 insert statements.
Please find my code below:
CassandraConnector connector = CassandraConnector.apply(javaSparkContext.getConf());
pairRDD.mapToPair(earnCalculatorKeyIterableTuple2 -> {
    if (condition) {
        // do something......
    } else {
        Session session = connector.openSession();
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        batch.setConsistencyLevel(ConsistencyLevel.valueOf(LOCAL_QUORUM));
        PreparedStatement statement = session.prepare("my insert query");
        for (condition) {
            if (!condition) {
                break;
            }
            Tuple2._2.forEach(s -> {
                if (!condition) {
                    LOG.info(message);
                } else {
                    BoundStatement boundStatement = statement.bind("bind variables");
                    batch.add(boundStatement);
                }
            });
            session.execute(batch);
            batch.clear();
        }
        session.close();
    }
    return Tuple2;
});
return s;
}
Any help is appreciated.

You are creating the batch manually, and the batch is too large. Add fewer rows to each batch. There are many ways to do this by hand, but the simplest is to keep a counter and submit the batch every time X statements have been added.
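A minimal sketch of that chunking idea (the class and method names here are illustrative, not part of the driver API): split the statements into fixed-size chunks and execute one small batch per chunk. It is written generically so it can be shown without a live Cassandra session; in the real job the element type would be BoundStatement.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: split a list of statements into fixed-size chunks,
// so each chunk can be executed as its own (small) batch.
public class BatchChunker {
    // Generic so it can be demonstrated without the Cassandra driver;
    // in the actual job T would be BoundStatement.
    static <T> List<List<T>> chunk(List<T> statements, int maxPerBatch) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < statements.size(); i += maxPerBatch) {
            // Copy each sublist so the chunks are independent of the source list.
            batches.add(new ArrayList<>(
                statements.subList(i, Math.min(i + maxPerBatch, statements.size()))));
        }
        return batches;
    }
}
```

In the loop above you would still call batch.add(...), but execute and clear the batch whenever its size reaches the chosen limit (say, 100 statements) rather than once per partition, keeping every batch under Cassandra's batch_size_fail_threshold.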
The parameter you are changing only applies to the automatic batching performed by
saveToCassandra
.

Are you actually using Spark? I ask because your stack trace does not appear to contain any Spark Cassandra Connector frames. Changing batch.size.bytes changes the number of statements per insert. - Yes, I am using the Spark Cassandra connector. I tried setting batch.size.bytes=auto; it still did not fix the problem. - Can you provide a code sample? - I have added the code, please take a look.
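For comparison, this is roughly what letting the connector batch for you would look like. This is a sketch only: the keyspace, table, RDD, and bean names (my_keyspace, my_table, myRdd, MyRow) are hypothetical, and it assumes the spark-cassandra-connector Java API is on the classpath. The output.batch.size properties shown are the ones the connector's automatic batching honors.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import com.datastax.spark.connector.japi.CassandraJavaUtil;

// Configuration fragment: these settings only affect saveToCassandra,
// not batches you build yourself with BatchStatement.
SparkConf conf = new SparkConf()
    .set("spark.cassandra.output.batch.size.rows", "100"); // or ...batch.size.bytes

JavaSparkContext sc = new JavaSparkContext(conf);

// Hypothetical RDD of MyRow beans; the connector groups rows into
// appropriately sized batches on its own.
CassandraJavaUtil.javaFunctions(myRdd)
    .writerBuilder("my_keyspace", "my_table", CassandraJavaUtil.mapToRow(MyRow.class))
    .saveToCassandra();
```

If you stay with manual batching instead, these properties have no effect, which matches the behavior you observed.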