
Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Tags: hadoop, cassandra, apache-spark, datastax, datastax-java-driver

I am running read and update queries against a table with 500,000 rows, and sometimes, after roughly 300,000 rows have been processed, I get the error below even though no node is down:

Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Infrastructure details:
5 Cassandra nodes, 5 Spark nodes and 3 Hadoop nodes, each with 8 cores and 28 GB of memory; the Cassandra replication factor is 3.

Cassandra 2.1.8.621 | DSE 4.7.1 | Spark 1.2.1 | Hadoop 2.7.1

Cassandra configuration:

read_request_timeout_in_ms (ms): 10000
range_request_timeout_in_ms (ms): 10000
write_request_timeout_in_ms (ms): 5000
cas_contention_timeout_in_ms (ms): 1000 
truncate_request_timeout_in_ms (ms): 60000
request_timeout_in_ms (ms): 10000
I also tried the same job after increasing read_request_timeout_in_ms to 20,000 ms, but it did not help.

I am running queries against two tables. Below is the CREATE statement for one of them:

Create table:

CREATE TABLE section_ks.testproblem_section (
    problem_uuid text PRIMARY KEY,
    documentation_date timestamp,
    mapped_code_system text,
    mapped_problem_code text,
    mapped_problem_text text,
    mapped_problem_type_code text,
    mapped_problem_type_text text,
    negation_ind text,
    patient_id text,
    practice_uid text,
    problem_category text,
    problem_code text,
    problem_comment text,
    problem_health_status_code text,
    problem_health_status_text text,
    problem_onset_date timestamp,
    problem_resolution_date timestamp,
    problem_status_code text,
    problem_status_text text,
    problem_text text,
    problem_type_code text,
    problem_type_text text,
    target_site_code text,
    target_site_text text
    ) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 
    'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 
    'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
Queries:

1)
SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING


2)
UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345'

Usually when you get a timeout error it means you are trying to do something that doesn't scale well in Cassandra. The fix is often to modify your schema.

I suggest monitoring the nodes while the queries run to see if you can spot the problem area. For example, you can run "watch -n 1 nodetool tpstats" to see whether any queues are backing up or dropping items. See the other monitoring suggestions as well.

One thing that might be off in your configuration: you say you have five Cassandra nodes but only three Spark workers (or do you mean there are three Spark workers on each Cassandra node?). You want at least one Spark worker on each Cassandra node so that loading data into Spark happens locally on each node rather than over the network.


It is hard to say much more without seeing your schema and the queries you are running. Are you reading from a single partition? I started getting timeout errors at around 300,000 rows when reading from a single partition (see this related question). The only workaround I have found so far is to use a client-side hash in the partition key to break partitions into smaller chunks of roughly 100K rows. So far I have not found a way to tell Cassandra not to time out a query that I expect to take a long time.
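As a rough CQL illustration of that client-side bucketing workaround (this is only a sketch; the table name, the bucket column, and the bucket count of 10 are assumptions, not part of the original schema):

-- The writer computes bucket on the client side, e.g. hash(encounter_uuid) % 10,
-- so each (patient_id, bucket) partition holds roughly a tenth of the patient's rows.
CREATE TABLE section_ks.encounters_bucketed (
    patient_id text,
    bucket int,
    encounter_start_date timestamp,
    encounter_uuid text,
    PRIMARY KEY ((patient_id, bucket), encounter_start_date)
);

-- Reads then query each bucket separately (or in parallel) and merge results on the client:
SELECT encounter_uuid, encounter_start_date
FROM section_ks.encounters_bucketed
WHERE patient_id = '1234' AND bucket = 0
  AND encounter_start_date >= '2017-08-19';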

I don't think the configuration is the root cause; it looks like a data-model issue.

It would be helpful to see the structure of the section_ks.encounters table.

I suggest thinking carefully about which concrete queries you plan to run before designing the table structure(s).

As far as I can see, these two queries require different section_ks.encounters structures in order to run with good performance.

Let's look at each of the provided queries and try to design the tables:

The first one:

SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING

  • First point: if Cassandra forces you to add ALLOW FILTERING, that is a sign of a non-optimal query or table structure.
  • Second point: the primary key. The given query would run fast, and without the mandatory ALLOW FILTERING clause, if the patient_id and encounter_start_date columns formed a composite primary key. The order of the columns listed in the PRIMARY KEY() clause should correspond to the filtering order in your query.
  • Why was ALLOW FILTERING mandatory in the original query? From the partition key, Cassandra knows which node the data lives on. When patient_id is not the partition key, Cassandra has to scan all 5 nodes to find the requested patient. When there is a lot of data across the nodes, such a full scan usually fails with a timeout.
Here is an example of a table structure that matches the given query effectively:

create table section_ks.encounters(
    patient_id text,          -- text, to match the original schema and the quoted value used in the queries
    encounter_start_date timestamp,
    encounter_uuid text,
    some_other_non_unique_column text,
    PRIMARY KEY (patient_id, encounter_start_date)
);
  • The patient_id column will be the "partition key", responsible for distributing data across the Cassandra nodes. Put simply (ignoring replication): different ranges of patients will be stored on different nodes.
  • The encounter_start_date column will be a "clustering key", responsible for sorting data inside the partition.
ALLOW FILTERING can now be removed from the query:

SELECT encounter_uuid, encounter_start_date 
FROM section_ks.encounters 
WHERE patient_id = '1234' AND encounter_start_date >= '2017-08-19';
The second query:

UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345'

The table structure should look something like:

create table section_ks.encounters(
    encounter_uuid text, -- partition key
    patient_id text,
    testproblem_uuid_set set<text>,   -- must be a set for the "+ {'1256'}" update above to work
    some_other_non_unique_column text,
    PRIMARY KEY (encounter_uuid)
);
If we only need fast filtering by encounter_uuid, it should be defined as the partition key.
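Because the two queries call for tables keyed differently, a common Cassandra approach, offered here only as a sketch and not something stated in the answer above, is to denormalize: keep one table per query pattern and write each encounter to both of them, for example in a logged batch (the table names encounters_by_patient and encounters_by_uuid are illustrative):

-- A logged batch ensures both writes are eventually applied together.
BEGIN BATCH
  INSERT INTO section_ks.encounters_by_patient (patient_id, encounter_start_date, encounter_uuid)
  VALUES ('1234', '2017-08-19', 'abcd345');
  INSERT INTO section_ks.encounters_by_uuid (encounter_uuid, patient_id)
  VALUES ('abcd345', '1234');
APPLY BATCH;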

Some good articles about designing an effective data model:


Comments:

Can you post your CREATE TABLE … and your queries? I would also try tracing to analyze the issue (see the sketch after these comments).
@phact I have added the create table. Thanks for your response. @uri2x Added the queries as well.
Don't use ALLOW FILTERING in production OLTP queries. It will be slow. Instead, you should design the table's primary key (partition and clustering) so that you can use regular CQL queries.
Thanks a lot, I will try your suggestions. Sorry for the incorrect/brief info about the cluster. Actually, the EC2 cluster has 5 Cassandra nodes and 5 Spark worker nodes, of which 2 Spark workers sit on 2 of the Cassandra nodes and the other 3 nodes run Hadoop and Spark workers.
Sorry, but how do I check how much partition data is being read?
cfstats and cfhistograms.
@Abhinandan - Using ALLOW FILTERING means you are trying to do a table scan. That is not efficient in Cassandra, …
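Following up on the tracing suggestion in the first comment, a minimal cqlsh session for finding where the time is spent could look like this (the date literal is just a placeholder):

-- Turn on request tracing, run the suspect query, then read the trace cqlsh prints.
TRACING ON;

SELECT encounter_uuid, encounter_start_date
FROM section_ks.encounters
WHERE patient_id = '1234'
  AND encounter_start_date >= '2017-08-19'
ALLOW FILTERING;

TRACING OFF;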