Slow reads in a Cassandra cluster
I recently started a Cassandra cluster with 3 machines. I had it working well, but after I had to reset one of the nodes (explained at the bottom of this post), I have had problems reading from one of the largest tables (see the traces below). I believe my partition and clustering keys are set up in a fairly obvious way, and I did not have this problem before the crash, so I don't think the schema is the cause:
CREATE TABLE datachannel_6min (
channel_id int,
time_start timestamp,
power_avg float,
power_min float,
power_max float,
energy float,
temperature_in float,
PRIMARY KEY (channel_id, time_start)
);
The query is a single-row SELECT using the composite key:
select * from datachannel_6min where channel_id = 1028 order by time_start desc limit 1;
Here are 4 example traces... as you can see, they are not always exactly the same:
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------+--------------+----------+----------------
execute_cql3_query | 09:00:11,930 | 10.1.1.5 | 0
Parsing select * from datachannel_6min where channel_id = 1042 order by time_start desc limit 1; | 09:00:11,930 | 10.1.1.5 | 102
Preparing statement | 09:00:11,930 | 10.1.1.5 | 233
Executing single-partition query on datachannel_6min | 09:00:11,931 | 10.1.1.5 | 1135
Acquiring sstable references | 09:00:11,931 | 10.1.1.5 | 1163
Merging memtable tombstones | 09:00:11,931 | 10.1.1.5 | 1185
Key cache hit for sstable 14912 | 09:00:11,931 | 10.1.1.5 | 1223
Seeking to partition indexed section in data file | 09:00:11,931 | 10.1.1.5 | 1230
Key cache hit for sstable 14823 | 09:00:11,984 | 10.1.1.5 | 53805
Seeking to partition indexed section in data file | 09:00:11,984 | 10.1.1.5 | 53851
Key cache hit for sstable 14786 | 09:00:12,059 | 10.1.1.5 | 129027
Seeking to partition indexed section in data file | 09:00:12,059 | 10.1.1.5 | 129060
Key cache hit for sstable 14749 | 09:00:12,241 | 10.1.1.5 | 311521
Seeking to partition indexed section in data file | 09:00:12,241 | 10.1.1.5 | 311558
Key cache hit for sstable 14714 | 09:00:12,242 | 10.1.1.5 | 311843
Seeking to partition indexed section in data file | 09:00:12,242 | 10.1.1.5 | 311849
Partition index with 0 entries found for sstable 14913 | 09:00:12,242 | 10.1.1.5 | 312153
Seeking to partition indexed section in data file | 09:00:12,242 | 10.1.1.5 | 312159
Partition index with 0 entries found for sstable 14914 | 09:00:12,354 | 10.1.1.5 | 423820
Seeking to partition indexed section in data file | 09:00:12,354 | 10.1.1.5 | 423849
Partition index with 0 entries found for sstable 14916 | 09:00:12,354 | 10.1.1.5 | 424455
Seeking to partition indexed section in data file | 09:00:12,354 | 10.1.1.5 | 424463
Partition index with 0 entries found for sstable 14915 | 09:00:12,420 | 10.1.1.5 | 490468
Seeking to partition indexed section in data file | 09:00:12,420 | 10.1.1.5 | 490501
Partition index with 0 entries found for sstable 14917 | 09:00:12,492 | 10.1.1.5 | 561711
Seeking to partition indexed section in data file | 09:00:12,492 | 10.1.1.5 | 561748
Partition index with 146 entries found for sstable 14918 | 09:00:12,696 | 10.1.1.5 | 766248
Seeking to partition indexed section in data file | 09:00:12,696 | 10.1.1.5 | 766306
Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones | 09:00:12,696 | 10.1.1.5 | 766323
Merging data from memtables and 11 sstables | 09:00:12,696 | 10.1.1.5 | 766329
Read 2 live and 0 tombstoned cells | 09:00:12,773 | 10.1.1.5 | 842632
Request complete | 09:00:12,773 | 10.1.1.5 | 843350
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------+--------------+----------+----------------
execute_cql3_query | 09:05:46,255 | 10.1.1.4 | 0
Message received from /10.1.1.4 | 09:05:46,250 | 10.1.1.5 | 21
Executing single-partition query on datachannel_6min | 09:05:46,250 | 10.1.1.5 | 520
Acquiring sstable references | 09:05:46,250 | 10.1.1.5 | 593
Merging memtable tombstones | 09:05:46,250 | 10.1.1.5 | 609
Bloom filter allows skipping sstable 14912 | 09:05:46,250 | 10.1.1.5 | 630
Bloom filter allows skipping sstable 14823 | 09:05:46,250 | 10.1.1.5 | 641
Bloom filter allows skipping sstable 14786 | 09:05:46,250 | 10.1.1.5 | 647
Bloom filter allows skipping sstable 14749 | 09:05:46,250 | 10.1.1.5 | 654
Bloom filter allows skipping sstable 14714 | 09:05:46,251 | 10.1.1.5 | 757
Bloom filter allows skipping sstable 14913 | 09:05:46,251 | 10.1.1.5 | 763
Bloom filter allows skipping sstable 14914 | 09:05:46,251 | 10.1.1.5 | 770
Bloom filter allows skipping sstable 14916 | 09:05:46,251 | 10.1.1.5 | 776
Bloom filter allows skipping sstable 14915 | 09:05:46,251 | 10.1.1.5 | 783
Bloom filter allows skipping sstable 14917 | 09:05:46,251 | 10.1.1.5 | 789
Parsing select * from datachannel_6min where channel_id = 1036 order by time_start desc limit 1; | 09:05:46,255 | 10.1.1.4 | 103
Preparing statement | 09:05:46,255 | 10.1.1.4 | 223
Sending message to /10.1.1.5 | 09:05:46,256 | 10.1.1.4 | 673
Partition index with 17 entries found for sstable 14918 | 09:05:46,534 | 10.1.1.5 | 283815
Seeking to partition indexed section in data file | 09:05:46,534 | 10.1.1.5 | 283851
Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones | 09:05:46,534 | 10.1.1.5 | 283867
Merging data from memtables and 1 sstables | 09:05:46,534 | 10.1.1.5 | 283873
Read 2 live and 0 tombstoned cells | 09:05:46,571 | 10.1.1.5 | 321319
Enqueuing response to /10.1.1.4 | 09:05:46,571 | 10.1.1.5 | 321439
Sending message to /10.1.1.4 | 09:05:46,571 | 10.1.1.5 | 321613
Message received from /10.1.1.5 | 09:05:46,579 | 10.1.1.4 | 323621
Processing response from /10.1.1.5 | 09:05:46,579 | 10.1.1.4 | 323730
Request complete | 09:05:46,579 | 10.1.1.4 | 324458
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------+--------------+----------+----------------
execute_cql3_query | 05:39:12,430 | 10.1.1.4 | 0
Parsing select * from datachannel_6min where channel_id = 1030 order by time_start desc limit 1; | 05:39:12,430 | 10.1.1.4 | 164
Preparing statement | 05:39:12,430 | 10.1.1.4 | 310
Sending message to /10.1.1.6 | 05:39:12,431 | 10.1.1.4 | 829
Message received from /10.1.1.4 | 05:39:12,432 | 10.1.1.6 | 19
Executing single-partition query on datachannel_6min | 05:39:12,433 | 10.1.1.6 | 719
Acquiring sstable references | 05:39:12,433 | 10.1.1.6 | 742
Merging memtable tombstones | 05:39:12,433 | 10.1.1.6 | 769
Bloom filter allows skipping sstable 1476 | 05:39:12,433 | 10.1.1.6 | 830
Partition index with 0 entries found for sstable 1475 | 05:39:12,433 | 10.1.1.6 | 904
Seeking to partition indexed section in data file | 05:39:12,433 | 10.1.1.6 | 919
Partition index with 2 entries found for sstable 1346 | 05:39:12,434 | 10.1.1.6 | 1403
Seeking to partition indexed section in data file | 05:39:12,434 | 10.1.1.6 | 1425
Partition index with 2 entries found for sstable 1472 | 05:39:12,434 | 10.1.1.6 | 1511
Seeking to partition indexed section in data file | 05:39:12,434 | 10.1.1.6 | 1522
Partition index with 0 entries found for sstable 586 | 05:39:12,434 | 10.1.1.6 | 1567
Seeking to partition indexed section in data file | 05:39:12,434 | 10.1.1.6 | 1578
Partition index with 146 entries found for sstable 5 | 05:39:12,434 | 10.1.1.6 | 2132
Seeking to partition indexed section in data file | 05:39:12,434 | 10.1.1.6 | 2152
Skipped 0/6 non-slice-intersecting sstables, included 0 due to tombstones | 05:39:12,434 | 10.1.1.6 | 2177
Merging data from memtables and 5 sstables | 05:39:12,434 | 10.1.1.6 | 2192
Read 2 live and 0 tombstoned cells | 05:39:13,106 | 10.1.1.6 | 673858
Enqueuing response to /10.1.1.4 | 05:39:13,106 | 10.1.1.6 | 674163
Sending message to /10.1.1.4 | 05:39:13,107 | 10.1.1.6 | 674329
Message received from /10.1.1.6 | 05:39:13,107 | 10.1.1.4 | 676882
Processing response from /10.1.1.6 | 05:39:13,107 | 10.1.1.4 | 677118
Request complete | 05:39:13,107 | 10.1.1.4 | 677344
activity | timestamp | source | source_elapsed
--------------------------------------------------------------------------------------------------+--------------+----------+----------------
execute_cql3_query | 05:40:41,322 | 10.1.1.4 | 0
Parsing select * from datachannel_6min where channel_id = 1028 order by time_start desc limit 1; | 05:40:41,322 | 10.1.1.4 | 104
Preparing statement | 05:40:41,322 | 10.1.1.4 | 257
Sending message to /10.1.1.5 | 05:40:41,322 | 10.1.1.4 | 569
Message received from /10.1.1.4 | 05:40:41,324 | 10.1.1.5 | 9
Executing single-partition query on datachannel_6min | 05:40:41,324 | 10.1.1.5 | 401
Acquiring sstable references | 05:40:41,324 | 10.1.1.5 | 410
Merging memtable tombstones | 05:40:41,324 | 10.1.1.5 | 427
Bloom filter allows skipping sstable 15658 | 05:40:41,324 | 10.1.1.5 | 451
Bloom filter allows skipping sstable 15666 | 05:40:41,324 | 10.1.1.5 | 476
Bloom filter allows skipping sstable 15892 | 05:40:41,324 | 10.1.1.5 | 489
Bloom filter allows skipping sstable 15749 | 05:40:41,324 | 10.1.1.5 | 503
Bloom filter allows skipping sstable 15874 | 05:40:41,324 | 10.1.1.5 | 514
Bloom filter allows skipping sstable 15682 | 05:40:41,324 | 10.1.1.5 | 523
Partition index with 14 entries found for sstable 14918 | 05:40:42,152 | 10.1.1.5 | 828365
Seeking to partition indexed section in data file | 05:40:42,152 | 10.1.1.5 | 828406
Skipped 0/7 non-slice-intersecting sstables, included 0 due to tombstones | 05:40:42,152 | 10.1.1.5 | 828422
Merging data from memtables and 1 sstables | 05:40:42,152 | 10.1.1.5 | 828427
Read 2 live and 0 tombstoned cells | 05:40:42,300 | 10.1.1.5 | 976825
Enqueuing response to /10.1.1.4 | 05:40:42,301 | 10.1.1.5 | 976984
Message received from /10.1.1.5 | 05:40:42,301 | 10.1.1.4 | 978829
Sending message to /10.1.1.4 | 05:40:42,301 | 10.1.1.5 | 977105
Processing response from /10.1.1.5 | 05:40:42,301 | 10.1.1.4 | 979018
Request complete | 05:40:42,301 | 10.1.1.4 | 979239
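When comparing traces like the ones above, the absolute source_elapsed values hide where the time actually goes; the useful number is the gap between consecutive activities on the same node. A minimal sketch (assuming a single-coordinator trace, where source_elapsed is monotonic; multi-node traces interleave rows from different sources and would need grouping by the source column first):

```python
def elapsed_deltas(trace_lines):
    """Parse cqlsh-style trace rows and return (activity, delta_us) pairs,
    where delta_us is the time spent before that activity was reached."""
    rows = []
    for line in trace_lines:
        parts = [p.strip() for p in line.split("|")]
        # Keep only data rows: 4 columns with a numeric source_elapsed.
        if len(parts) == 4 and parts[3].isdigit():
            rows.append((parts[0], int(parts[3])))
    return [(act, us - rows[i - 1][1] if i else us)
            for i, (act, us) in enumerate(rows)]

# A few rows from the first trace above:
trace = [
    "execute_cql3_query | 09:00:11,930 | 10.1.1.5 | 0",
    "Key cache hit for sstable 14912 | 09:00:11,931 | 10.1.1.5 | 1223",
    "Key cache hit for sstable 14823 | 09:00:11,984 | 10.1.1.5 | 53805",
    "Partition index with 146 entries found for sstable 14918 | 09:00:12,696 | 10.1.1.5 | 766248",
]
worst = max(elapsed_deltas(trace), key=lambda t: t[1])
print(worst)  # the single step that consumed the most time
```

Applied to the traces above, the dominant gaps sit on the SSTable seek/partition-index steps, i.e. the time is spent in disk reads rather than in parsing or merging.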
Here is the history of my cluster and my mistakes:
- Installed 3 nodes in a virtual network in the West Europe Azure datacenter. I started a service that logs an API into Cassandra (~10 inserts/s). I started a second service that uses the added data to compute new data (this is where the SELECT above is used).
- Moved old data (500 million rows from MSSQL) into Cassandra. This ran in parallel with my services for about 3 days.
- [Mistake] The hard drives filled up. I made the stupid mistake of forgetting to add a separate disk for the data. I had attached 4 disks on each machine and "merged" them into one (). I moved the commit log and data directories onto the new disk on all three nodes. Two of the nodes worked fine, but the third I had to wipe completely (delete data/logs). My replication factor is 2, so no data was lost. I ran nodetool repair on the "new" node.
- When I started querying the cluster again, I noticed my SELECTs were inconsistent. If I ran a query in DataStax DevCenter I would get no result, but after 3-5 attempts I would get the full answer. I changed my queries to use QUORUM instead of ONE, which seemed to fix the problem.
- I also ran nodetool cleanup on the two good nodes.
- Finally, I ran nodetool repair on one of the good nodes, and it is now also running on the last node (it takes about 1 day to run).
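The switch from ONE to QUORUM fixing the inconsistency is consistent with the replication factor: Cassandra's QUORUM is floor(RF/2) + 1 replicas. A small sketch of that arithmetic (illustration only, not part of any Cassandra API):

```python
def quorum(rf: int) -> int:
    # Cassandra's QUORUM consistency level: floor(rf / 2) + 1 replicas.
    return rf // 2 + 1

# With RF = 2, QUORUM requires both replicas, so every read touches the
# fully-repaired copy. A read at ONE may be served only by the node that
# was wiped and not yet completely repaired, returning no rows.
for rf in (1, 2, 3):
    print(f"RF={rf} -> QUORUM={quorum(rf)}")
```

This also means that with RF = 2, QUORUM gives no tolerance for a node being down; RF = 3 is the usual choice when quorum reads/writes are the norm.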
I have two suggestions:
- If you always query the most recent row (ORDER BY time_start DESC LIMIT 1), you should consider specifying a DESC sort direction in your CLUSTERING ORDER. A table clustered DESC will answer that query faster than a table clustered ASC being queried with DESC.
- Your data for a partition (channel_id = 1028) is spread over several SSTable files. Since your data appears to be a time series, you could try DateTieredCompactionStrategy, which groups data on disk by timestamp. In theory, that should limit your query to a small number of (maybe even a single) SSTable file, especially since you only need the most recent row.
Drop your table (you cannot change the CLUSTERING ORDER of an existing table), recreate it like this, and reload your data:
CREATE TABLE datachannel_6min (
channel_id int,
time_start timestamp,
power_avg float,
power_min float,
power_max float,
energy float,
temperature_in float,
PRIMARY KEY (channel_id, time_start)
) WITH CLUSTERING ORDER BY (time_start DESC)
AND COMPACTION = {'class': 'DateTieredCompactionStrategy'};
There are some options you can set with DateTieredCompactionStrategy, which are outlined in the article I linked above. Read through them to make sure the defaults work for you, or adjust them as needed.
Actually, later on I will also query in the other direction.. I will be fetching data over time spans.. so I don't think I would gain anything from changing the clustering order. But your article is really interesting. My script added data "backwards" in time, which may have cluttered the SSTables... but since SSTables have min/max metadata, I still don't see how that explains my 400-1200 ms read times :S.. I will read the article over and over now though. Thanks!

I guess altering the table's compaction class will not reorganize the old data, so I am now considering redoing everything (purging all data) and adding it in the correct order this time.

@Fischer Actually, the next time a compaction is triggered, it should apply the new strategy to the whole keyspace, and that includes the existing data.

I am not sure about that. I have read about DateTieredCompaction, and it looks like it has problems with data added out of order, or with importing a backlog where a data point's timestamp does not match the time the record was added to Cassandra. It sounds like a great fit for new data, but since I imported data going back 3 years...

@Fischer Actually, you are right. The data you imported will be timestamped with the import date, not
time_start
. But going forward, your new data will be written and compacted together.