
Elasticsearch unassigned shards warning firing every two hours


Our cluster has 3 Elasticsearch data pods / 3 master pods / 1 client pod and 1 exporter. The problem is the alert "Elasticsearch unassigned shards due to circuit breaking exception". You can read more about it in this document.
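To see which shards the alert is complaining about, a quick check (assuming the same localhost endpoint used for the calls below) is to list unassigned shards together with their reason:

curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED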

Now, by calling curl http://localhost:9200/_nodes/stats, I have worked out the heap usage across the data pods.

Heap usage for elasticsearch-data-0, 1 and 2 is 68%, 61% and 63% respectively.
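For reference, the per-node heap figures can also be read directly from the _cat API instead of averaging _nodes/stats by hand (same assumption about the localhost endpoint):

curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'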

I made the following API calls and can see that the shards are distributed almost evenly:

curl -s http://localhost:9200/_cat/shards | grep elasticsearch-data-0 | wc -l

curl -s http://localhost:9200/_cat/shards | grep elasticsearch-data-1 | wc -l

curl -s http://localhost:9200/_cat/shards | grep elasticsearch-data-2 | wc -l

145
145
142
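The same distribution can be cross-checked with _cat/allocation, which also shows shard count and disk usage per data node (again assuming the localhost endpoint):

curl -s 'http://localhost:9200/_cat/allocation?v'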

Below is the output of the allocation explain curl call:

curl -s http://localhost:9200/_cluster/allocation/explain | python -m json.tool

{
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "can_allocate": "no",
    "current_state": "unassigned",
    "index": "graph_24_18549",
    "node_allocation_decisions": [
        {
            "deciders": [
                {
                    "decider": "max_retry",
                    "decision": "NO",
                    "explanation": "shard has exceeded the maximum number of retries [50] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:18:44.115Z], failed_attempts[50], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:16:42.146Z], failed_attempts[49], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid2], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:15:05.849Z], failed_attempts[48], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [tsg_ngf_graph_1_mtermmetrics1_vertex_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid3], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:11:50.143Z], failed_attempts[47], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[o_9jyrmOSca9T12J4bY0Nw], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid4], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:08:10.182Z], failed_attempts[46], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid6], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:07:03.102Z], failed_attempts[45], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid7], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:05:53.267Z], failed_attempts[44], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid8], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:04:24.507Z], failed_attempts[43], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid9], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:03:02.018Z], failed_attempts[42], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid10], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:01:38.094Z], failed_attempts[41], delayed=false, details[failed shard on node [nodeid1]: failed recovery, failure RecoveryFailedException[[graph_24_18549][0]: Recovery failed from {elasticsearch-data-2}{}{} into {elasticsearch-data-1}{}{}{IP}{IP:9300}]; nested: RemoteTransportException[[elasticsearch-data-2][IP:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would 
be [2012997826/1.8gb], which is larger than the limit of [1972122419/1.8gb], real usage: [2012934784/1.8gb], new bytes reserved: [63042/61.5kb]]; ], allocation_status[no_attempt]], expected_shard_size[4338334540], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[5040039519], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2452709390/2.2gb], which is larger than the limit of [1972122419/1.8gb], real usage: [2060112120/1.9gb], new bytes reserved: [392597270/374.4mb]]; ], allocation_status[no_attempt]], expected_shard_size[2606804616], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[4799579998], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[4012459974], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2045921066/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1770141176/1.6gb], new bytes reserved: [275779890/263mb]]; ], allocation_status[no_attempt]], expected_shard_size[3764296412], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[2631720247], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2064366222/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1838754456/1.7gb], new bytes reserved: [225611766/215.1mb]]; ], allocation_status[no_attempt]], expected_shard_size[3255872204], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2132674062/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1902340880/1.7gb], new bytes reserved: [230333182/219.6mb]]; ], allocation_status[no_attempt]], expected_shard_size[2956220256], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2092139364/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1855009224/1.7gb], new bytes reserved: [237130140/226.1mb]]; ], allocation_status[no_attempt]]]"
                },
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[graph_24_18549][0], node[nodeid2], [P], s[STARTED], a[id=someid]]"
                }
            ],
            "node_decision": "no",
            "node_id": "nodeid2",
            "node_name": "elasticsearch-data-2",
            "transport_address": "IP:9300"
        }