Linux kernel: RDMA from a kernel module (krping)

I am using a kernel module to perform RDMA transfers in kernel space over InfiniBand (krping.c, link: git.openfabrics.org git-~sgrimberg/krping.git/summary). My setup is a Mellanox ConnectX-4 card (driver: mlx5), Linux kernel 3.13, Ubuntu 12.04, and Mellanox OFED 3.3.

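The code appears to get stuck acquiring the mutex_lock in the mlx5_ib_query_qp function (in mlx5_ib.h). It is called from the krping.c module via the ib_req_notify_cq function. Can I get some help on how to resolve this error/deadlock? I have attached the dmesg trace to this post.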

Dmesg trace:

[  499.178862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  499.178951] IP: [<ffffffff8176c451>] __mutex_lock_slowpath+0xf1/0x1b0
[  499.179024] PGD 7dadd8067 PUD 7be174067 PMD 0
[  499.179079] Oops: 0002 [#1] SMP
[  499.179118] Modules linked in: rdma_krping(OX) target_core_mod ib_iser(OX) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep i915 snd_pcm rfcomm bnep mei_me snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device snd_timer bluetooth drm_kms_helper psmouse snd mei drm mac_hid soundcore snd_page_alloc i2c_algo_bit shpchp serio_raw dcdbas lpc_ich video knem(OX) parport_pc ppdev lp parport rdma_ucm(OX) ib_ucm(OX) rdma_cm(OX) iw_cm(OX) configfs ib_ipoib(OX) ib_cm(OX) ib_uverbs(OX) ib_umad(OX) mlx5_ib(OX) mlx5_core(OX) mlx4_en(OX) vxlan ip_tunnel mlx4_ib(OX) ib_sa(OX) ib_mad(OX) ib_core(OX) nls_iso8859_1 ib_addr(OX) ib_netlink(OX) mlx4_core(OX) mlx_compat(OX) hid_generic usbhid hid e1000e ptp ahci pps_core libahci
[  499.180007] CPU: 0 PID: 2618 Comm: bash Tainted: G           OX 3.13.0-91-generic #138~precise1-Ubuntu
[  499.180091] Hardware name: Dell Inc. OptiPlex 9020/0N4YC8, BIOS A11 04/01/2015
[  499.180159] task: ffff8807c6d8e000 ti: ffff8807c0daa000 task.ti: ffff8807c0daa000
[  499.180228] RIP: 0010:[<ffffffff8176c451>]  [<ffffffff8176c451>] __mutex_lock_slowpath+0xf1/0x1b0
[  499.180315] RSP: 0018:ffff8807c0dabaf8  EFLAGS: 00010286
[  499.180366] RAX: 0000000000000020 RBX: ffff8807ee0c4898 RCX: ffff8807c0dabaf0
[  499.180431] RDX: ffff8807c0dabb00 RSI: ffff8807c0dabb18 RDI: ffff8807ee0c489c
[  499.180496] RBP: ffff8807c0dabb58 R08: 0000000000000000 R09: 0000000000000000
[  499.180561] R10: 00000000000004bd R11: 00000000000004bc R12: ffff8807ee0c489c
[  499.180626] R13: 00000000ffffffff R14: ffff8807c6d8e000 R15: ffff8807ee0c48a0
[  499.180692] FS:  00007ff11ca66700(0000) GS:ffff88081ea00000(0000) knlGS:0000000000000000
[  499.180766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  499.180819] CR2: 0000000000000020 CR3: 00000007c0d98000 CR4: 00000000001407f0

[  499.180884] Stack:
[  499.180905]  ffff8807c0dabaf0 ffff8807ee0c48a0 0000000000000020 0000000000020002
[  499.180986]  0000000000000000 ffff880700000001 ffff880700000000 ffff8807ee0c4898
[  499.181067]  0000000000000002 0000000000000006 ffff8807edd6636a ffff8800d46f0000

[  499.181146] Call Trace:
[  499.181178]  [<ffffffff8176c533>] mutex_lock+0x23/0x37
[  499.181242]  [<ffffffffa024bd11>] mlx5_ib_query_qp+0x41/0x660 [mlx5_ib]
[  499.181309]  [<ffffffff817552b6>] ? printk+0x61/0x63
[  499.181361]  [<ffffffffa0699543>] krping_setup_qp.isra.8+0x115/0x25f [rdma_krping]
[  499.181435]  [<ffffffffa069a518>] krping_run_client+0x56/0x757 [rdma_krping]
[  499.181508]  [<ffffffff81387f7e>] ? memzero_explicit+0xe/0x10
[  499.181567]  [<ffffffff8149eca7>] ? extract_entropy+0xc7/0x180
[  499.181626]  [<ffffffff8149f0b7>] ? get_random_bytes+0x47/0xd0
[  499.181684]  [<ffffffff810c457a>] ? console_unlock+0x1a/0x30
[  499.181741]  [<ffffffffa06961a0>] ? krping_getopt+0x1a0/0x1a0 [rdma_krping]
[  499.181815]  [<ffffffffa02b7f56>] ? rdma_create_id+0x136/0x150 [rdma_cm]
[  499.181881]  [<ffffffff8149f0b7>] ? get_random_bytes+0x47/0xd0
[  499.181939]  [<ffffffffa06961a0>] ? krping_getopt+0x1a0/0x1a0 [rdma_krping]
[  499.182007]  [<ffffffffa0696ff7>] krping_doit+0x5f7/0x9e0 [rdma_krping]
[  499.182077]  [<ffffffff811b250a>] ? __kmalloc+0x5a/0x250
[  499.182130]  [<ffffffffa0697434>] ? krping_write_proc+0x54/0xf4 [rdma_krping]
[  499.182199]  [<ffffffffa0697493>] krping_write_proc+0xb3/0xf4 [rdma_krping]
[  499.182270]  [<ffffffff81234d93>] proc_reg_write+0x43/0x70
[  499.184645]  [<ffffffff811cdff5>] vfs_write+0xc5/0x1f0
[  499.186973]  [<ffffffff811ce4f2>] SyS_write+0x52/0xa0
[  499.189256]  [<ffffffff81776fdd>] system_call_fastpath+0x1a/0x1f

[  499.191569] Code: 85 c0 78 09 31 c0 87 03 83 f8 01 74 74 48 8b 43 10 48 8d 55 a8 4c 8d 7b 08 41 bd ff ff ff ff 48 89 53 10 4c 89 7d a8 48 89 45 b0 <48> 89 10 4c 89 75 b8 eb 22 66 0f 1f 44 00 00 4c 89 e7 49 c7 06
[  499.196527] RIP  [<ffffffff8176c451>] __mutex_lock_slowpath+0xf1/0x1b0
[  499.198850]  RSP <ffff8807c0dabaf8>
[  499.201122] CR2: 0000000000000020
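
When reading a trace like the one above, it can help to decode the number printed after "Oops:", which is the x86 page-fault error code. The following is a minimal user-space sketch of that decoding, not kernel code. The bit meanings follow the x86 page-fault error code layout; the names `struct oops_fault`, `decode_oops_code` and `describe_fault` are hypothetical helpers invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bits of the x86 page-fault error code (the hex value after "Oops:"). */
#define PF_PROT  (1u << 0) /* 0 = page not present, 1 = protection violation */
#define PF_WRITE (1u << 1) /* 0 = read access, 1 = write access */
#define PF_USER  (1u << 2) /* 0 = kernel mode, 1 = user mode */
#define PF_RSVD  (1u << 3) /* reserved bit was set in a page-table entry */
#define PF_INSTR (1u << 4) /* fault happened on an instruction fetch */

/* Hypothetical helper type for this sketch; not a kernel structure. */
struct oops_fault {
    bool protection_violation;
    bool is_write;
    bool user_mode;
    bool reserved_bit;
    bool instruction_fetch;
};

/* Split the raw error code into its individual flags. */
static struct oops_fault decode_oops_code(uint32_t code)
{
    struct oops_fault f = {
        .protection_violation = (code & PF_PROT) != 0,
        .is_write             = (code & PF_WRITE) != 0,
        .user_mode            = (code & PF_USER) != 0,
        .reserved_bit         = (code & PF_RSVD) != 0,
        .instruction_fetch    = (code & PF_INSTR) != 0,
    };
    return f;
}

/* Write a one-line summary of the fault into buf; returns snprintf's result. */
static int describe_fault(char *buf, size_t len, uint32_t code, uint64_t cr2)
{
    struct oops_fault f = decode_oops_code(code);
    const char *access = f.instruction_fetch ? "instruction fetch"
                       : f.is_write          ? "write"
                                             : "read";

    return snprintf(buf, len, "%s-mode %s at 0x%llx, %s",
                    f.user_mode ? "user" : "kernel",
                    access,
                    (unsigned long long)cr2,
                    f.protection_violation ? "protection violation"
                                           : "page not present");
}
```

Calling `describe_fault(buf, sizeof buf, 0x0002, 0x20)` with the values from this trace (Oops: 0002, CR2: 0000000000000020) describes a kernel-mode write to a non-present page at address 0x20. That is consistent with the "NULL pointer dereference" message: the faulting instruction in __mutex_lock_slowpath tried to store through a pointer whose value was 0x20.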

I have never used krping, so I can't be of much help. But you can take a look at some working RDMA kernel code here:

Thanks for the reply. I was unable to compile these modules against mlnx-ofed-kernel-3.3, and I assume the modules are not being compiled against the mlnx headers. Which Makefile are you using?

Sorry, I was traveling and didn't have time to check. Let me know if you can't make progress.