Cassandra read performance with collections

I have the following column family defined in Cassandra:

CREATE TABLE metric (
period int,
rollup int,
tenant text,
path text,
time bigint,
data list<double>,
PRIMARY KEY ((tenant, period, rollup, path), time)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};

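Does the size of the data list affect read performance in Cassandra? If so, how can we measure it?

The problem is that, for a given path/period/rollup combination, querying Data-Set1 from Cassandra for 8640 rows (where each row's data list has 90 elements) takes longer than querying Data-Set2, which is also 8640 rows (where each row's data list has 10 elements).

Also, if I run a performance test with 10 users accessing Data-Set1 concurrently, I see Cassandra timeouts on the back end, and it spends a lot of time doing garbage collection. The same does not happen when I run the same test against Data-Set2.

So my conclusion is that the number of elements in the data list is affecting performance.

Have you seen similar performance issues in your Cassandra stack?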

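I wouldn't expect 90 items in a collection to matter that much, but in your case I suppose it does. The issue is that when you query a collection column, Cassandra cannot return just part of the collection. It has to return the entire column (the whole collection). That operation isn't free, although I don't think 90 doubles is a big deal.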
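To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It counts only the raw 8-byte double values and assumes nothing about Cassandra's per-cell or per-row storage overhead, so treat the results as a lower bound rather than a measurement. The function name `raw_list_bytes` is made up for this illustration.

```python
# Rough lower bound on the list payload returned by each query.
# Assumption: only the 8-byte double values are counted; per-cell,
# per-row and protocol overhead in Cassandra are ignored.
DOUBLE_BYTES = 8


def raw_list_bytes(rows: int, elements_per_row: int) -> int:
    """Raw bytes of double values across all returned rows."""
    return rows * elements_per_row * DOUBLE_BYTES


data_set_1 = raw_list_bytes(8640, 90)  # 90-element lists
data_set_2 = raw_list_bytes(8640, 10)  # 10-element lists

print(data_set_1)               # 6220800
print(data_set_2)               # 691200
print(data_set_1 / data_set_2)  # 9.0
```

So each Data-Set1 query moves at least 9 times as many values as a Data-Set2 query, and 10 concurrent users multiply that again. Whether that alone explains the timeouts is something tracing and GC logs would have to confirm.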

One thing to try is turning tracing on. That should give you an idea of what Cassandra is doing when it runs your query:

aploetz@cqlsh:stackoverflow> tracing on;

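Usually, turning tracing on can lead you to the culprit.

"It spends a lot of time doing garbage collection"

Are you using any special JVM settings? How much RAM is on each node? GC that interrupts normal operations suggests (to me) that there may be a problem with your JVM heap settings. The DataStax documentation states that you should size the heap based on the node's RAM, using the following guidelines: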

System Memory       Heap Size

Less than 2GB       1/2 of system memory
2GB to 4GB          1GB
Greater than 4GB    1/4 system memory, but not more than 8GB
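As a quick way to check a node against the table above, the guideline can be written as a small Python function. This is only a sketch of the table as quoted here: the function name `recommended_heap_mb` is hypothetical, the treatment of the exact 2GB and 4GB boundaries is my assumption, and it is not an official DataStax tool.

```python
def recommended_heap_mb(system_ram_mb: int) -> int:
    """Heap size in MB suggested by the guideline table above.

    Assumption: exactly 2GB and exactly 4GB fall in the middle band,
    since the table does not say which band owns the boundaries.
    """
    if system_ram_mb < 2048:
        # Less than 2GB: half of system memory
        return system_ram_mb // 2
    if system_ram_mb <= 4096:
        # 2GB to 4GB: fixed 1GB heap
        return 1024
    # Greater than 4GB: a quarter of system memory, capped at 8GB
    return min(system_ram_mb // 4, 8192)


for ram_mb in (1024, 4096, 16384, 65536):
    print(ram_mb, recommended_heap_mb(ram_mb))
```

If the heap configured on your nodes is far from what this returns for their RAM, that is a reasonable first thing to look at before blaming the list size.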