Hadoop HA standby NameNode hangs in safe mode at startup because of a memory issue


After NN-A (the active NameNode) crashed because it ran out of memory (too many blocks/files), we upgraded NN-A with more memory, but did not upgrade NN-B (the standby) right away.
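
For context, the bigger heap was set along these lines (a sketch only; hadoop-env.sh is the standard place for this, and 100g matches the MaxHeapSize of 102400.0MB visible in the jmap output below, but our exact flags are assumed):

   # hadoop-env.sh (sketch): raise the NameNode heap to 100 GB
   export HADOOP_NAMENODE_OPTS="-Xms100g -Xmx100g ${HADOOP_NAMENODE_OPTS}"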

With the two nodes now running different heap sizes, we deleted some files (bringing the total down from about 80 million to 70 million); then NN-B crashed and NN-A became active.
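
The deletions themselves were ordinary HDFS removals; a sketch with a made-up path (note that without -skipTrash the blocks would stay referenced from .Trash and free no NameNode memory):

   # sketch: permanently remove an old dataset to shrink the namespace
   hdfs dfs -rm -r -skipTrash /path/to/old/data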

Then we upgraded NN-B as well and started it. It stayed stuck in safe mode, logging lines like:

The reported blocks 4620668 needs additional 62048327 blocks to reach the threshold 0.9990 of total blocks 66735729

The reported blocks X needs additional ..
X increased only very slowly, so I checked the heap usage:

Attaching to process ID 11598, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.79-b02

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 107374182400 (102400.0MB)
   NewSize          = 2006515712 (1913.5625MB)
   MaxNewSize       = 2006515712 (1913.5625MB)
   OldSize          = 4013096960 (3827.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 1805910016 (1722.25MB)
   used     = 1805910016 (1722.25MB)
   free     = 0 (0.0MB)
   100.0% used
Eden Space:
   capacity = 1605304320 (1530.9375MB)
   used     = 1605304320 (1530.9375MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 200605696 (191.3125MB)
   used     = 200605696 (191.3125MB)
   free     = 0 (0.0MB)
   100.0% used
To Space:
   capacity = 200605696 (191.3125MB)
   used     = 0 (0.0MB)
   free     = 200605696 (191.3125MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 105367666688 (100486.4375MB)
   used     = 105192740832 (100319.61520385742MB)
   free     = 174925856 (166.82229614257812MB)
   99.83398526179955% used
Perm Generation:
   capacity = 68755456 (65.5703125MB)
   used     = 41562968 (39.637535095214844MB)
   free     = 27192488 (25.932777404785156MB)
   60.45042883578577% used

14501 interned Strings occupying 1597840 bytes.
 num     #instances         #bytes  class name
----------------------------------------------
   1:     185594071    13362773112  org.apache.hadoop.hdfs.protocol.proto.HdfsProtos$BlockProto
   2:     185594071    13362773112  org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$ReceivedDeletedBlockInfoProto
   3:     101141030    10550504248  [Ljava.lang.Object;
   4:     185594072     7423762880  org.apache.hadoop.hdfs.protocol.Block
   5:     185594070     7423762800  org.apache.hadoop.hdfs.server.protocol.ReceivedDeletedBlockInfo
   6:      63149803     6062381088  org.apache.hadoop.hdfs.server.namenode.INodeFile
   7:      23241035     5705267888  [B
Meanwhile, the heap on NN-A:

Attaching to process ID 6061, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.79-b02

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 107374182400 (102400.0MB)
   NewSize          = 1134100480 (1081.5625MB)
   MaxNewSize       = 1134100480 (1081.5625MB)
   OldSize          = 2268266496 (2163.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 1020723200 (973.4375MB)
   used     = 643184144 (613.3881988525391MB)
   free     = 377539056 (360.04930114746094MB)
   63.01259185644061% used
Eden Space:
   capacity = 907345920 (865.3125MB)
   used     = 639407504 (609.7865142822266MB)
   free     = 267938416 (255.52598571777344MB)
   70.47009193582973% used
From Space:
   capacity = 113377280 (108.125MB)
   used     = 3776640 (3.6016845703125MB)
   free     = 109600640 (104.5233154296875MB)
   3.3310377528901736% used
To Space:
   capacity = 113377280 (108.125MB)
   used     = 0 (0.0MB)
   free     = 113377280 (108.125MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 106240081920 (101318.4375MB)
   used     = 42025146320 (40078.30268859863MB)
   free     = 64214935600 (61240.13481140137MB)
   39.55677138092327% used
Perm Generation:
   capacity = 51249152 (48.875MB)
   used     = 51131744 (48.763031005859375MB)
   free     = 117408 (0.111968994140625MB)
   99.77090742886828% used

16632 interned Strings occupying 1867136 bytes.
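
(For reference, dumps like the above come from the stock JDK tools run against the NameNode PIDs; a sketch, assuming the jmap shipped with the same JDK 7 that runs the NameNodes:)

   jmap -heap 11598    # heap configuration and usage summary for NN-B
   jmap -heap 6061     # the same for NN-A
   jmap -histo 11598   # the "num  #instances  #bytes  class name" table
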
We tried restarting both nodes. NN-A started and became active within 10 minutes, but NN-B stayed stuck forever.

In the end I gave up. The heap usage at that point was identical to the NN-B dump above: the CMS old generation was still 99.83% full, with the same object histogram.


The histogram shows an extremely large number of ReceivedDeletedBlockInfo objects. But why?

I solved the problem by changing dfs.blockreport.initialDelay to 300. The cause of the failure was a block report storm.
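
As I understand it, dfs.blockreport.initialDelay is the upper bound, in seconds, of a random delay each DataNode waits before sending its first full block report after a restart; with the default of 0, every DataNode reports the moment the NameNode comes up, and the standby drowns in queued report data (which would be consistent with the ~185 million ReceivedDeletedBlockInfo objects in the histogram). A minimal hdfs-site.xml sketch of the fix (the property name and value are from our case; the rest of the file layout is assumed):

   <!-- hdfs-site.xml (on the DataNodes): stagger each DataNode's first
        full block report over a random 0-300 s window instead of having
        them all report at once (the default delay is 0) -->
   <property>
     <name>dfs.blockreport.initialDelay</name>
     <value>300</value>
   </property>

After pushing the setting out and restarting, safe-mode progress can be watched with hdfs dfsadmin -safemode get until the reported-block threshold is reached.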