Cassandra corruption after dropping a UDT column
Our production cluster runs the following Cassandra version: [cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]. After we restarted a Cassandra node, Cassandra did not start and printed the following error:
INFO [main] 2018-08-22 15:30:04,082 CommitLogReader.java:105 - Skipping playback of empty log: CommitLog-6-1534951460541.log
DEBUG [main] 2018-08-22 15:30:04,082 CommitLogReader.java:273 - Reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281330.log (CL version 6, messaging version 11, compression null)
INFO [Service Thread] 2018-08-22 15:30:06,501 GCInspector.java:284 - ParNew GC in 216ms. CMS Old Gen: 10906456 -> 31114600; Par Eden Space: 859045888 -> 0; Par Survivor Space: 29166056 -> 43187600
DEBUG [main] 2018-08-22 15:30:06,673 CommitLogReader.java:264 - Finished reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281330.log
DEBUG [main] 2018-08-22 15:30:06,674 CommitLogReader.java:273 - Reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281331.log (CL version 6, messaging version 11, compression null)
DEBUG [main] 2018-08-22 15:30:08,009 CommitLogReader.java:264 - Finished reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281331.log
DEBUG [main] 2018-08-22 15:30:08,009 CommitLogReader.java:273 - Reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281332.log (CL version 6, messaging version 11, compression null)
ERROR [main] 2018-08-22 15:30:08,610 JVMStabilityInspector.java:102 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation; saved to /tmp/mutation1296995018372874453dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: java.io.IOError: java.io.EOFException: EOF after 45 bytes out of 33554712
at org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:177) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:324) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) [apache-cassandra-3.11.3.jar:3.11.3]
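Since replay aborts on one particular segment, it can help to know which commit log file Cassandra was reading when it gave up. The following is a minimal Python sketch, not an official tool: it assumes DEBUG logging is enabled and that the log lines look like the CommitLogReader lines in the excerpt above. The function name unfinished_segments is my own.

```python
import re

# Patterns modeled on the CommitLogReader DEBUG lines shown in the excerpt above.
READING_RE = re.compile(r"CommitLogReader\.java:\d+ - Reading (\S+)")
FINISHED_RE = re.compile(r"CommitLogReader\.java:\d+ - Finished reading (\S+)")


def unfinished_segments(log_text):
    """Return commit log segments that were opened for replay but never finished, in log order."""
    pending = []
    for line in log_text.splitlines():
        started = READING_RE.search(line)
        if started:
            pending.append(started.group(1))
            continue
        finished = FINISHED_RE.search(line)
        if finished and finished.group(1) in pending:
            pending.remove(finished.group(1))
    return pending


if __name__ == "__main__":
    sample = "\n".join([
        "DEBUG [main] 2018-08-22 15:30:06,674 CommitLogReader.java:273 - Reading "
        "/var/lib/cassandra/commitlog/CommitLog-6-1527416281331.log (CL version 6)",
        "DEBUG [main] 2018-08-22 15:30:08,009 CommitLogReader.java:264 - Finished reading "
        "/var/lib/cassandra/commitlog/CommitLog-6-1527416281331.log",
        "DEBUG [main] 2018-08-22 15:30:08,009 CommitLogReader.java:273 - Reading "
        "/var/lib/cassandra/commitlog/CommitLog-6-1527416281332.log (CL version 6)",
    ])
    for segment in unfinished_segments(sample):
        print(segment)
```

With the log above, the only segment left unfinished is CommitLog-6-1527416281332.log, the one being read when the ERROR was raised.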
After we moved the commit logs out of the way (which causes data loss), Cassandra did start, but queries against some tables failed:
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
and in system.log:
WARN [ReadStage-2] 2018-08-26 11:04:34,091 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,10,main]: {}
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/policy/rule-83f10050a91f11e890846d2c86545d91/mc-52-big-Data.db
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2601) ~[apache-cassandra-3.11.3.jar:3.11.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_171]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.3.jar:3.11.3]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
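To see which tables are affected, the corrupted SSTable paths can be pulled out of system.log. This is a rough Python sketch under two assumptions: the exception line has the format shown above, and the files live in the standard data layout (data/<keyspace>/<table>-<table id>/<sstable file>). The function name corrupted_sstables is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches the path reported by CorruptSSTableException, as in the excerpt above.
CORRUPT_RE = re.compile(r"CorruptSSTableException: Corrupted: (\S+)")


def corrupted_sstables(log_text):
    """Return sorted unique (keyspace, table, sstable_file) tuples found in the log text."""
    found = set()
    for match in CORRUPT_RE.finditer(log_text):
        path = PurePosixPath(match.group(1))
        table_dir = path.parent.name          # table name + "-" + table id
        keyspace = path.parent.parent.name    # directory above the table directory
        table = table_dir.rsplit("-", 1)[0]   # strip the trailing table id
        found.add((keyspace, table, path.name))
    return sorted(found)


if __name__ == "__main__":
    sample = (
        "java.lang.RuntimeException: org.apache.cassandra.io.sstable."
        "CorruptSSTableException: Corrupted: /var/lib/cassandra/data/policy/"
        "rule-83f10050a91f11e890846d2c86545d91/mc-52-big-Data.db"
    )
    for keyspace, table, sstable in corrupted_sstables(sample):
        print(f"{keyspace}.{table}: {sstable}")
```

For the exception above this reports table rule in keyspace policy. Paths under snapshots or backups directories do not follow this layout and would be misreported.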
After some investigation, I am confident that I managed to reproduce the bug. The reproduction steps trigger the CorruptSSTableException, but not the CommitLogReadHandler$CommitLogReadException.
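By the way, on Cassandra 3.11.1 the error did not reproduce with the steps above.
In Cassandra 4.0, dropping a non-frozen user-defined type (UDT) column is forbidden. The error thrown is:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot drop non-frozen column mt of user type my_type"
I tested this on trunk. Unfortunately, this check is not available in earlier versions (< 4.0).
Declaring the UDT column as frozen should avoid the problem (I tested this on 3.11.3), although it is not possible to change the type of an existing column:
CREATE TABLE my_ks.my_table (
    id uuid PRIMARY KEY,
    mt frozen<my_type>
);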
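Because versions before 4.0 do not block the drop, it may be worth listing which columns are exposed. Below is a small Python sketch that scans schema text (for example, saved output of DESCRIBE KEYSPACE in cqlsh) for columns whose type is a UDT not wrapped in frozen<>. It is a heuristic, not a CQL parser: it assumes one column per line, as in DESCRIBE output, and the function name and the field of my_type in the sample are placeholders of mine.

```python
import re

TYPE_RE = re.compile(
    r"CREATE\s+TYPE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:\w+\.)?(\w+)", re.IGNORECASE
)
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)", re.IGNORECASE
)
# Column name, optional keyspace qualifier, then the first word of the type.
COLUMN_RE = re.compile(r"^\s*(\w+)\s+(?:\w+\.)?(\w+)\b")


def non_frozen_udt_columns(schema_text):
    """Return (table, column) pairs whose declared type is a bare (non-frozen) UDT."""
    udts = {m.group(1).lower() for m in TYPE_RE.finditer(schema_text)}
    flagged = []
    table = None
    for line in schema_text.splitlines():
        start = TABLE_RE.search(line)
        if start:
            table = start.group(1)
            continue
        if table is None:
            continue
        if line.strip().startswith(")"):
            table = None  # end of the column list
            continue
        col = COLUMN_RE.match(line)
        # For "mt frozen<my_type>" the first type word is "frozen", so it is not flagged.
        if col and col.group(2).lower() in udts:
            flagged.append((table, col.group(1)))
    return flagged


if __name__ == "__main__":
    sample = """
CREATE TYPE my_ks.my_type (
    a int
);
CREATE TABLE my_ks.my_table (
    id uuid PRIMARY KEY,
    mt my_type
);
"""
    for table_name, column in non_frozen_udt_columns(sample):
        print(f"{table_name}.{column} is a non-frozen UDT column")
```

Columns reported this way are the ones that 3.11.x still allows to be dropped; new tables can declare them as frozen<my_type> from the start.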
There is also an open issue for this.
This looks similar to that issue. However, I tried to reproduce it with the steps you provided: I could reproduce it on 3.9, but not on 3.10, 3.11.1 or 3.11.3. That behavior seems consistent with what the JIRA describes. Could you provide more information so that it can be reproduced? Also, a possible workaround is to move the problematic commit logs away, re-add the column, and then put the commit logs back.
@Horia Did you restart Cassandra after those 5 steps and then try to select from the table? If not, sorry for being unclear. I will edit the question.
Yes, I restarted it (and I did not run nodetool flush either). As I said, I reproduced it successfully on 3.9.
OK, I had some trouble reproducing it again, but after repeating the inserts I think I managed it. It is a bit ugly, but it seems to work.
Is there a way to detect this and other kinds of Cassandra corruption in advance, that is, before restarting the node?
@YossiShasha Unfortunately, I don't think that is possible. For example, a write query is appended to the commit log as soon as the node receives it. The commit log is read only when the node restarts: Cassandra replays it to re-apply changes that had not yet reached the SSTables.
I see. If I understand correctly, that covers the commit log corruption. What about the SSTable corruption? The SSTable is corrupted too: even after moving the corrupted commit logs away, we cannot query this table. Those queries succeeded before the restart, probably because they were served from the memtables. Please correct me if I am wrong.