Linux kernel: understanding a kernel panic / advice

I am running Debian jessie with kernel 3.16.39-1:

# apt-cache policy linux-image-3.16.0-4-amd64
linux-image-3.16.0-4-amd64:
Installed: 3.16.39-1
Candidate: 3.16.39-1
Version table:
*** 3.16.39-1 0
    500 http://ftp.fr.debian.org/debian/ jessie/main amd64 Packages
    100 /var/lib/dpkg/status
This machine uses 2 bonded interfaces:

  • bond0: 2 x 10 Gb/s ixgbe X520
  • bond1: 2 x 10 Gb/s ixgbe X520
irqbalance is running on this machine.

Under network load (12 Gb/s on bond1), I got the following kernel panic:

kernel: [26339.017497] Call Trace:
kernel: [26339.017499]  <IRQ>  [<ffffffff81514c11>] ?     dump_stack+0x5d/0x78
kernel: [26339.017509]  [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017513]  [<ffffffff810a949d>] ? __wake_up_sync_key+0x3d/0x60
kernel: [26339.017515]  [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017519]  [<ffffffff8147eaff>] ? tcp_v4_do_rcv+0x1af/0x4c0
kernel: [26339.017524]  [<ffffffff81455b66>] ? nf_hook_slow+0x76/0x130
kernel: [26339.017528]  [<ffffffff811883ad>] ? alloc_pages_current+0x9d/0x150
kernel: [26339.017531]  [<ffffffff81412d7b>] ? __netdev_alloc_frag+0x8b/0x140
kernel: [26339.017534]  [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558]  [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564]  [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
kernel: [26339.017567]  [<ffffffff8108c9ad>] ? hrtimer_get_next_event+0xad/0xc0
kernel: [26339.017570]  [<ffffffff81425509>] ? net_rx_action+0x129/0x250
kernel: [26339.017573]  [<ffffffff8106d911>] ? __do_softirq+0xf1/0x2d0
kernel: [26339.017575]  [<ffffffff8106dd25>] ? irq_exit+0x95/0xa0
kernel: [26339.017578]  [<ffffffff8151dbe2>] ? do_IRQ+0x52/0xe0
kernel: [26339.017582]  [<ffffffff8151ba2d>] ? common_interrupt+0x6d/0x6d
kernel: [26339.017583]  <EOI>  [<ffffffff8108c31d>] ? __hrtimer_start_range_ns+0x1cd/0x3a0
kernel: [26339.017588]  [<ffffffff813e32a2>] ? cpuidle_enter_state+0x52/0xc0
kernel: [26339.017590]  [<ffffffff813e3298>] ? cpuidle_enter_state+0x48/0xc0
kernel: [26339.017592]  [<ffffffff810a9b28>] ? cpu_startup_entry+0x328/0x470
kernel: [26339.017595]  [<ffffffff81043fdf>] ? start_secondary+0x20f/0x2d0
[....]
kernel: [26339.017647] swapper/13: page allocation failure: order:0, mode:0x20
kernel: [26339.017667] active_anon:2860787 inactive_anon:290478 isolated_anon:15723
kernel: [26339.017667]  active_file:284318 inactive_file:151176 isolated_file:0
kernel: [26339.017667]  unevictable:20736 dirty:24804 writeback:4297 unstable:0
kernel: [26339.017667]  free:23079 slab_reclaimable:27293 slab_unreclaimable:86672
kernel: [26339.017667]  mapped:22343 shmem:413 pagetables:10111 bounce:0
kernel: [26339.017667]  free_cma:0
kernel: [26339.017670] Node 0 DMA free:15896kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: [26339.017675] lowmem_reserve[]: 0 3191 16016 16016
kernel: [26339.017680] Node 0 DMA32 free:56312kB min:13456kB low:16820kB high:20184kB active_anon:589468kB inactive_anon:141384kB active_file:1132312kB inactive_file:597576kB unevictable:16616kB isolated(anon):0kB isolated(file):0kB present:3345344kB managed:3270860kB mlocked:16616kB dirty:33860kB writeback:4288kB mapped:18616kB shmem:180kB slab_reclaimable:17036kB slab_unreclaimable:83696kB kernel_stack:34016kB pagetables:8384kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [26339.017686] lowmem_reserve[]: 0 0 12824 12824
kernel: [26339.017691] Node 0 Normal free:20108kB min:54060kB low:67572kB high:81088kB active_anon:10853680kB inactive_anon:1020528kB active_file:4960kB inactive_file:7128kB unevictable:66328kB isolated(anon):62892kB isolated(file):0kB present:13369344kB managed:13131968kB mlocked:66328kB dirty:65356kB writeback:12900kB mapped:70756kB shmem:1472kB slab_reclaimable:92136kB slab_unreclaimable:262992kB kernel_stack:10880kB pagetables:32060kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4275 all_unreclaimable? no
kernel: [26339.017696] lowmem_reserve[]: 0 0 0 0
kernel: [26339.017701] Node 0 DMA: 0*4kB  0000000000000020 ffff88042f1a3bf0
kernel: [26339.017706] 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15896kB
kernel: [26339.017723] Node 0 DMA32: 250*4kB 
kernel: [26339.017726]  ffffffff81144a3f 0000000000000000 0000000000000000 ffffffff00000002
kernel: [26339.017730] (EM) 967*8kB (UEM) 2628*16kB (UM) 83*32kB (UMR) 15*64kB (R) 8*128kB (R) 4*256kB (R) 0*512kB 0*1024kB 0*2048kB <4>[26339.017747] swapper/0: page allocation failure: order:0, mode:0x20
kernel: [26339.017748] 0*4096kB = 56448kB
kernel: [26339.017751] Node 0 Normal: 3653*4kB (M) 0*8kB 0*16kB 1*32kB (R) 0*64kB 1*128kB (R) 0*256kB 1*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 17332kB
kernel: [26339.017767] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
kernel: [26339.017768] 466495 total pagecache pages
kernel: [26339.017769] 10046 pages in swap cache
kernel: [26339.017771] Swap cache stats: add 4415081, delete 4405035, find 1682225/2488531
kernel: [26339.017772] Free swap  = 19301256kB
kernel: [26339.017773] Total swap = 19764220kB
kernel: [26339.017774] 4182667 pages RAM
kernel: [26339.017775] 0 pages HighMem/MovableOnly
kernel: [26339.017776] 59344 pages reserved
kernel: [26339.017777] 0 pages hwpoisoned
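
For reference, the failure line in the log above (`page allocation failure: order:0, mode:0x20`) can be decoded with a short script. This is only an illustrative sketch: the helper name `decode_failure` is hypothetical, the flag table is an assumption taken from the low bits of `include/linux/gfp.h` in 3.x-era kernels (later kernels renumbered them), and a 4 kB page size is assumed, consistent with the `4kB` buckets in the log.

```python
import re

# Low GFP flag bits as found in include/linux/gfp.h of 3.x-era kernels.
# Assumption: this table matches the 3.16 kernel that produced the log.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def decode_failure(line, page_kb=4):
    """Decode one 'page allocation failure' line; return None if it does not match."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    order = int(m.group("order"))
    mode = int(m.group("mode"), 16)
    return {
        "comm": m.group("comm"),
        "order": order,
        # An order-N request asks for 2**N contiguous pages.
        "size_kb": page_kb << order,
        "flags": [name for bit, name in sorted(GFP_BITS.items()) if mode & bit],
        # Without __GFP_WAIT the allocator may not sleep or reclaim synchronously.
        "can_sleep": bool(mode & 0x10),
    }


line = "kernel: [26339.017647] swapper/13: page allocation failure: order:0, mode:0x20"
print(decode_failure(line))
```

Under these assumptions, `mode:0x20` is `__GFP_HIGH` on its own, that is, a single-page request that is not allowed to sleep. That fits a trace running in softirq context (`net_rx_action`, `ixgbe_poll`).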
The kernel panic shows messages related to IRQs and ixgbe.

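Can anyone give me some advice on how to solve this problem? The server had been running fine for 2 hours under the same network load, without any issue.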


Regarding the call trace: it does not show any debugging information related to a kernel panic.

kernel: [26339.017509]  [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017513]  [<ffffffff810a949d>] ? __wake_up_sync_key+0x3d/0x60
kernel: [26339.017515]  [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017519]  [<ffffffff8147eaff>] ? tcp_v4_do_rcv+0x1af/0x4c0
kernel: [26339.017524]  [<ffffffff81455b66>] ? nf_hook_slow+0x76/0x130
kernel: [26339.017528]  [<ffffffff811883ad>] ? alloc_pages_current+0x9d/0x150
kernel: [26339.017531]  [<ffffffff81412d7b>] ? __netdev_alloc_frag+0x8b/0x140
kernel: [26339.017534]  [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558]  [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564]  [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
kernel: [26339.017567]  [<ffffffff8108c9ad>] ? hrtimer_get_next_event+0xad/0xc0
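
The excerpt above can be reduced to a list of function names, which makes the path from the driver's receive routine to the allocator warning easier to follow. The following is a minimal sketch, not a general-purpose oops parser: `parse_frames` is a hypothetical helper, and the regular expression only covers the `[<address>] ? symbol+0xOFF/0xSIZE [module]` layout seen in this log.

```python
import re

# Matches one stack frame as printed in the log above.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in log_text, in printed order."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("sym"),
                "module": m.group("module"),  # None for core-kernel symbols
                # A leading '?' marks an address the unwinder could not verify.
                "reliable": m.group("unreliable") is None,
            })
    return frames


SAMPLE = """\
kernel: [26339.017509]  [<ffffffff81144a3f>] ? warn_alloc_failed+0xdf/0x130
kernel: [26339.017515]  [<ffffffff81148daf>] ? __alloc_pages_nodemask+0x8ef/0xb50
kernel: [26339.017534]  [<ffffffff8141913f>] ? __netdev_alloc_skb+0x6f/0xf0
kernel: [26339.017558]  [<ffffffffa0146a0d>] ? ixgbe_clean_rx_irq+0x10d/0xb70 [ixgbe]
kernel: [26339.017564]  [<ffffffffa0148198>] ? ixgbe_poll+0x488/0x860 [ixgbe]
"""

for frame in parse_frames(SAMPLE):
    owner = frame["module"] or "kernel"
    print(f"{frame['symbol']} ({owner})")
```

Read from bottom to top, the frames go from `ixgbe_poll` through `__netdev_alloc_skb` into `__alloc_pages_nodemask` and end in `warn_alloc_failed`, which is the function that prints the allocation-failure report.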
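As the figures "inactive_anon:290478, inactive_file:151176" suggest, page starvation in the DMA zone is highly likely. If you refer to the following change, you can find out whether your system is experiencing a kernel memory leak.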

  • kernel: add config related to kmem leak
  • diff --git a/arch/arm/configs/pompeii_defconfig b/arch/arm/configs/pompeii_defconfig index 2e97f97..aac678a 100644 --- a/arch/arm/configs/pompeii_defconfig +++ b/arch/arm/c
    kernel: [26339.017667] active_anon:2860787 inactive_anon:290478 isolated_anon:15723
    kernel: [26339.017667]  active_file:284318 inactive_file:151176 isolated_file:0
    kernel: [26339.017667]  unevictable:20736 dirty:24804 writeback:4297 unstable:0
    kernel: [26339.017667]  free:23079 slab_reclaimable:27293 slab_unreclaimable:86672
    kernel: [26339.017667]  mapped:22343 shmem:413 pagetables:10111 bounce:0
    kernel: [26339.017667]  free_cma:0
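
One way to check which zone is actually short of pages is to compare each zone's `free` value with its `min` and `low` watermarks, both of which appear in the per-zone lines of the log in the question. The script below is a rough sketch under stated assumptions: `check_zones` is a hypothetical helper, the sample lines are abridged copies of the `Node 0 ...` lines from the question, and the "below min" / "below low" labels are only a reading aid, not kernel terminology.

```python
import re

# Matches the start of a per-zone summary line, e.g.
# "Node 0 Normal free:20108kB min:54060kB low:67572kB high:81088kB ..."
ZONE_RE = re.compile(
    r"Node (?P<node>\d+) (?P<zone>\w+) free:(?P<free>\d+)kB "
    r"min:(?P<min>\d+)kB low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def check_zones(log_text):
    """Return (zone, free_kb, min_kb, status) for each zone line in log_text."""
    report = []
    for m in ZONE_RE.finditer(log_text):
        free = int(m.group("free"))
        wmin = int(m.group("min"))
        low = int(m.group("low"))
        if free < wmin:
            status = "below min"
        elif free < low:
            status = "below low"
        else:
            status = "ok"
        report.append((m.group("zone"), free, wmin, status))
    return report


# Abridged zone lines copied from the log in the question.
SAMPLE = """\
kernel: [26339.017670] Node 0 DMA free:15896kB min:64kB low:80kB high:96kB
kernel: [26339.017680] Node 0 DMA32 free:56312kB min:13456kB low:16820kB high:20184kB
kernel: [26339.017691] Node 0 Normal free:20108kB min:54060kB low:67572kB high:81088kB
"""

for zone, free, wmin, status in check_zones(SAMPLE):
    print(f"{zone:7s} free={free}kB min={wmin}kB -> {status}")
```

With the numbers from this log, DMA and DMA32 are above their watermarks, while the Normal zone (20108 kB free against a 54060 kB minimum) is below its `min` watermark, so that zone is worth looking at first.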