What does "PER PARTITION LIMIT" mean in a Cassandra CQL query? [cassandra, cqlsh, scylla]


I have a Scylla table, as shown below:

cqlsh:sampleks> describe table test;

CREATE TABLE test (
    client_id int,
    when timestamp,
    process_ids list<int>,
    md text,
    PRIMARY KEY (client_id, when) ) WITH CLUSTERING ORDER BY (when DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172800
    AND max_index_interval = 1024
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

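The PER PARTITION LIMIT clause is useful in a "wide partition" scenario: it caps the number of rows returned from each partition. With a limit of 2, only the first two rows of each partition are returned.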

Take this query, for example:

aploetz@cqlsh:stackoverflow> SELECT client_id,when,md 
        FROM test PER PARTITION LIMIT 2 ;

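Given the PRIMARY KEY definition of (client_id, when), this query iterates over each client_id partition. Cassandra then returns only the first two rows of that partition (clustered by when), regardless of how many when values the partition contains. Because the table is defined WITH CLUSTERING ORDER BY (when DESC), those are the two most recent rows.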

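In this case, I inserted 7 rows into the test table using two different client_ids (2 partitions total). With a PER PARTITION LIMIT of 2, 4 rows are returned: (2 client_ids x PER PARTITION LIMIT 2) == 4 rows.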

 client_id | when                            | md
-----------+---------------------------------+-----
         1 | 2020-05-06 12:00:00.000000+0000 | md1
         1 | 2020-05-05 22:00:00.000000+0000 | md1
         2 | 2020-05-06 19:00:00.000000+0000 | md2
         2 | 2020-05-06 01:00:00.000000+0000 | md2

(4 rows)
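The behaviour described above can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not how Cassandra or Scylla implement it. The function name per_partition_limit and the helper ts are made up for illustration, and the three older rows are hypothetical stand-ins for the rows that did not appear in the output (the answer only shows four of the seven).

```python
from collections import defaultdict
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def per_partition_limit(rows, n):
    """Model of PER PARTITION LIMIT n for PRIMARY KEY (client_id, when), when DESC."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["client_id"]].append(row)  # group by partition key

    result = []
    # Simplification: partitions are visited in key order here.
    # A real cluster visits them in token order.
    for client_id in sorted(partitions):
        newest_first = sorted(
            partitions[client_id], key=lambda r: r["when"], reverse=True
        )
        result.extend(newest_first[:n])  # keep only the first n rows
    return result


rows = [
    # The four rows that appear in the output above.
    {"client_id": 1, "when": ts("2020-05-06 12:00"), "md": "md1"},
    {"client_id": 1, "when": ts("2020-05-05 22:00"), "md": "md1"},
    {"client_id": 2, "when": ts("2020-05-06 19:00"), "md": "md2"},
    {"client_id": 2, "when": ts("2020-05-06 01:00"), "md": "md2"},
    # Hypothetical older rows, invented to reach 7 rows in total.
    {"client_id": 1, "when": ts("2020-05-05 10:00"), "md": "md1"},
    {"client_id": 2, "when": ts("2020-05-05 09:00"), "md": "md2"},
    {"client_id": 2, "when": ts("2020-05-04 09:00"), "md": "md2"},
]

result = per_partition_limit(rows, 2)
for row in result:
    print(row["client_id"], row["when"], row["md"])
```

However many rows a partition holds, it contributes at most n rows to the result, which is why 7 stored rows across 2 partitions yield 4.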

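Yes, PER PARTITION LIMIT was just added in Scylla Open Source 3.1. You can even mix a regular LIMIT and PER PARTITION LIMIT in the same statement. For Scylla, this was issue #2202, which maps to CASSANDRA-7017.

@Aaron I also have another question about Cassandra schema design. Would you have any thoughts on it?

For those using Spark, .perPartitionLimit() is a viable command for RDDs to select one unique row per partition.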
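On mixing the two limits: in CQL the per-partition clause comes first, as in SELECT client_id, when, md FROM test PER PARTITION LIMIT 2 LIMIT 3;. The sketch below models how the two caps interact, in plain Python rather than against a live cluster. The function name select_rows is made up, the rows are assumed to arrive already in storage order (grouped by partition, newest first), and the rows marked hypothetical are not from the thread.

```python
from itertools import groupby, islice


def select_rows(rows, per_partition_limit=None, limit=None):
    """Apply a per-partition cap first, then an overall cap on the result."""

    def capped_per_partition():
        # groupby relies on rows of one partition being adjacent.
        for _, partition in groupby(rows, key=lambda r: r[0]):
            # islice with a stop of None yields the whole partition.
            yield from islice(partition, per_partition_limit)

    return list(islice(capped_per_partition(), limit))


# (client_id, when, md) in storage order: partition by partition, when DESC.
rows = [
    (1, "2020-05-06 12:00", "md1"),
    (1, "2020-05-05 22:00", "md1"),
    (1, "2020-05-05 10:00", "md1"),  # hypothetical row
    (2, "2020-05-06 19:00", "md2"),
    (2, "2020-05-06 01:00", "md2"),
    (2, "2020-05-05 09:00", "md2"),  # hypothetical row
    (2, "2020-05-04 09:00", "md2"),  # hypothetical row
]

both = select_rows(rows, per_partition_limit=2, limit=3)
only_limit = select_rows(rows, limit=3)

print(both)        # two rows from client 1, then one from client 2
print(only_limit)  # three rows, all from client 1
```

With only LIMIT 3, a wide first partition can fill the whole result. Adding PER PARTITION LIMIT 2 ensures that no single partition contributes more than two of those three rows.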
