Cassandra coordinator gets a response from one node significantly later than from the other nodes
Please help me understand what I'm missing. I'm seeing strange behavior from one cluster node on a SELECT with LIMIT and ORDER BY DESC clauses.

Trace (partial):
activity | timestamp | source | source_elapsed
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.117000 | 10.0.23.15 | 7862
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.136000 | 10.0.25.57 | 6283
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:38.568000 | 10.0.24.51 | 457931
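10.0.25.56 is the coordinator node.
10.0.23.15, 10.0.24.51 and 10.0.25.57 are the nodes that hold the data.

The coordinator gets the response from 10.0.24.51 13 seconds later than from the other nodes. Why? How can I fix it?

The partition key (uid=0x50236b6de695baa1140004bf) has about 300 rows.

If we use ORDER BY ASC (our clustering order), or set LIMIT to a value smaller than the number of rows for that partition key, everything works fine.

The Cassandra (v2.2.5) cluster consists of 25 nodes. Each node holds about 400 GB of data.

The cluster is hosted in AWS, inside a VPC, with the nodes evenly distributed across 3 subnets. The instance type is c3.4xlarge (16 CPU cores, 30 GB RAM). We use EBS-backed storage (1 TB GP SSD).

The keyspace RF is 3.

Column family: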
CREATE TABLE test_cf (
    uid blob,
    tuuid timeuuid,
    cid text,
    cuid blob,
    PRIMARY KEY (uid, tuuid)
) WITH CLUSTERING ORDER BY (tuuid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 86400
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
nodetool gcstats (10.0.25.57):
Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
1208504        368                  4559                   73                     553798792712       58           305691840

nodetool gcstats (10.0.23.15):
Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
1445602        369                  3120                   57                     381929718000       38           277907601

nodetool gcstats (10.0.24.51):
Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
1174966        397                  4137                   69                     1900387479552      45           304448986
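
The size of the delay can be checked directly from the three timestamps in the partial trace. The following is a minimal Python sketch, not part of the original post; the names TRACE and response_lags are made up for illustration, and the values are copied from the trace lines quoted in the question.

```python
from datetime import datetime

# (timestamp, source node, source_elapsed in microseconds), copied from the
# partial trace in the question. Names here are illustrative only.
TRACE = [
    ("2016-02-29 22:17:25.117000", "10.0.23.15", 7862),
    ("2016-02-29 22:17:25.136000", "10.0.25.57", 6283),
    ("2016-02-29 22:17:38.568000", "10.0.24.51", 457931),
]

FMT = "%Y-%m-%d %H:%M:%S.%f"


def response_lags(trace):
    """Return {node: seconds between the earliest response send and this node's send}."""
    sent = {node: datetime.strptime(ts, FMT) for ts, node, _ in trace}
    first = min(sent.values())
    return {node: (t - first).total_seconds() for node, t in sent.items()}


lags = response_lags(TRACE)
for ts, node, elapsed_us in TRACE:
    print(f"{node}: +{lags[node]:.3f} s after first response, "
          f"source_elapsed = {elapsed_us / 1e6:.3f} s")
```

Under these numbers 10.0.24.51 sends its response about 13.45 s after the first replica, while its own source_elapsed is under half a second. Assuming source_elapsed counts time spent on the replica itself, that would suggest most of the delay happens before the replica starts working on the request, or that the node clocks are not in sync, rather than in the read itself.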
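This could be caused by a number of factors, both Cassandra-related and unrelated.

Non-Cassandra-specific
- How does the hardware (CPU, RAM, disk type (SSD vs. spinning)) on this node compare with the other nodes?
- How is the network configured? Is traffic to this node slower than to the other nodes? Are there routing problems between the nodes?
- How does the load on this server compare with the other nodes?

Cassandra-specific
- Is the JVM configured correctly? Is GC running significantly more often than on the other nodes? Check nodetool gcstats on this node and on the others to compare.
- Has compaction run on this node recently? Check nodetool compactionhistory.
- Are there any problems with corrupted files on disk?
- Have you checked system.log to see whether it contains any relevant information?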
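
The GC comparison suggested above can be done on the nodetool gcstats figures posted in the question. This is a rough Python sketch, not a definitive diagnostic: the FIELDS names and helper functions are hypothetical, and it assumes the three data rows in the question correspond to 10.0.25.57, 10.0.23.15 and 10.0.24.51 in that order.

```python
# Rows copied from the nodetool gcstats output in the question.
# Assumption: the rows are listed in the same order as the node labels.
GCSTATS = {
    "10.0.25.57": "1208504 368 4559 73 553798792712 58 305691840",
    "10.0.23.15": "1445602 369 3120 57 381929718000 38 277907601",
    "10.0.24.51": "1174966 397 4137 69 1900387479552 45 304448986",
}

# Hypothetical field names, one per gcstats column, in column order.
FIELDS = (
    "interval_ms", "max_gc_ms", "total_gc_ms", "stdev_gc_ms",
    "reclaimed", "collections", "direct_memory_bytes",
)


def parse_row(row):
    """Turn one whitespace-separated gcstats data row into a dict of ints."""
    return dict(zip(FIELDS, (int(v) for v in row.split())))


def gc_overhead_percent(stats):
    """Share of the sampling interval spent in GC, as a percentage."""
    return 100.0 * stats["total_gc_ms"] / stats["interval_ms"]


parsed = {node: parse_row(row) for node, row in GCSTATS.items()}
overhead = {node: gc_overhead_percent(s) for node, s in parsed.items()}

for node, stats in parsed.items():
    print(f"{node}: GC overhead {overhead[node]:.2f}%, "
          f"longest pause {stats['max_gc_ms']} ms")
```

With the posted figures every node spends well under 1% of the interval in GC and the longest pause is below 400 ms on all three, so under these assumptions GC alone does not look like the cause of a 13-second delay.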
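What is the RF? What is the consistency level? What else is running on that node (repair? compaction?)?

RF is 3. Consistency level is ALL. Nothing else is running.

I added some information about the "non-specific" and "specific" questions. Please explain what you mean by "corrupted files". Is that question about SSTables or about the file system in general? How can I check the SSTables?

Based on the additional information above, and given that there are no error messages in cassandra/system.log, I would speculate that this problem is related either to routing between the subnets or to the latency of the EBS storage. EBS-backed storage is network-attached, so it is expected to have higher latency. For more information, see here: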