GCC: crash in calloc()

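I am debugging a program I wrote. I ran it inside gdb and managed to catch a SIGABRT from inside calloc(). I have no idea how this can happen. Could it be a bug in gcc, or even in libc?

More details: my program uses OpenMP. I ran it through valgrind in single-threaded mode and got no errors. I also use mmap() to load a 40GB file, but I doubt that is relevant. Inside gdb, I ran it with 30 threads. Several identical runs (same input and command line) finished correctly before I caught the problematic one. On the surface, this suggests a race condition of some kind. However, the SIGABRT comes from calloc(), which is outside my control. Here is some relevant gdb output: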

(gdb) info threads
[...]
  17 Thread 0x7fffec450700 (LWP 73455)  0x00007ffff6b3ce00 in __read_nocancel () from /lib64/libc.so.6
  16 Thread 0x7fffece51700 (LWP 73454)  _mm_slli_si128 (genome=<value optimized out>, goff=<value optimized out>, glen=50, read=<value optimized out>, rlen=36, genome_ls=<value optimized out>, initbp=-1, is_rna=false) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/emmintrin.h:1155
  14 Thread 0x7fffee253700 (LWP 73452)  0x000000000041183e in _mm_prefetch (re=0x7fff291c0f10, st=1, options=0x632014) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/xmmintrin.h:1193
  13 Thread 0x7fffeec54700 (LWP 73451)  0x00007ffff6ae5c84 in __memset_sse2 () from /lib64/libc.so.6
  12 Thread 0x7fffef655700 (LWP 73450)  _mm_cmpeq_epi16 (genome=<value optimized out>, goff=<value optimized out>, glen=33, read=<value optimized out>, rlen=24, genome_ls=<value optimized out>, initbp=-1, is_rna=false) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/emmintrin.h:1263
* 11 Thread 0x7ffff0056700 (LWP 73449)  0x00007ffff6a948a5 in raise () from /lib64/libc.so.6
  10 Thread 0x7ffff0a57700 (LWP 73448)  _mm_sub_epi16 (genome=<value optimized out>, goff=<value optimized out>, glen=36, read=<value optimized out>, rlen=26, genome_ls=<value optimized out>, initbp=-1, is_rna=false) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/emmintrin.h:1046
  5 Thread 0x7ffff3c5c700 (LWP 73443)  0x000000000041183e in _mm_prefetch (re=0x7fff28615010, st=1, options=0x632014) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/xmmintrin.h:1193
  2 Thread 0x7ffff5a5f700 (LWP 73440)  0x000000000041e4cd in _mm_max_epi16 (genome=<value optimized out>, goff=<value optimized out>, glen=29, read=<value optimized out>, rlen=21, genome_ls=<value optimized out>, initbp=-1, is_rna=false) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/emmintrin.h:1331
  1 Thread 0x7ffff7fdcae0 (LWP 73437)  0x00007ffff6ae5cff in __memset_sse2 () from /lib64/libc.so.6
[...]
(gdb) thread 11
[Switching to thread 11 (Thread 0x7ffff0056700 (LWP 73449))]#0  0x00007ffff6a948a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6a948a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff6a96085 in abort () from /lib64/libc.so.6
#2  0x00007ffff6ad1fe7 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff6ad7916 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff6adb79f in _int_malloc () from /lib64/libc.so.6
#5  0x00007ffff6adbdd6 in calloc () from /lib64/libc.so.6
#6  0x000000000040e87f in my_calloc (re=0x7fff2867ef10, st=0, options=0x632020) at gmapper/../gmapper/../common/my-alloc.h:286
#7  read_get_hit_list_per_strand (re=0x7fff2867ef10, st=0, options=0x632020) at gmapper/mapping.c:1046
#8  0x000000000041308a in read_get_hit_list (re=<value optimized out>, options=0x632010, n_options=1) at gmapper/mapping.c:1239
#9  handle_read (re=<value optimized out>, options=0x632010, n_options=1) at gmapper/mapping.c:1806
#10 0x0000000000404f35 in launch_scan_threads (.omp_data_i=<value optimized out>) at gmapper/gmapper.c:557
#11 0x00007ffff7230502 in ?? () from /usr/lib64/libgomp.so.1
#12 0x00007ffff6dfc851 in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff6b4a11d in clone () from /lib64/libc.so.6
(gdb) f 6   
#6  0x000000000040e87f in my_calloc (re=0x7fff2867ef10, st=0, options=0x632020) at gmapper/../gmapper/../common/my-alloc.h:286
286         res = calloc(size, 1);
(gdb) p size
$2 = 814080
(gdb) 

The program is not out of memory; it is using 41GB on a machine with 256GB available:

$ top -b -n 1 | grep gmapper
 73437 user      20   0 41.5g  16g  15g T  0.0  6.6  55:17.24 gmapper-ls
$ free -m
             total       used       free     shared    buffers     cached
Mem:        258437     195567      62869          0         82     189677
-/+ buffers/cache:       5807     252629
Swap:            0          0          0

I compiled with g++ (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), using the flags
-g -O2 -DNDEBUG -mmmx -msse -msse2 -fopenmp -Wall -Wno-deprecated -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS

Edit: Below is the verbose error message I received. I truncated it; the lines not shown are similar to the first 5 lines.

*** glibc detected *** /tmp/t/gmapper-ls: corrupted double-linked list: 0x0000000009447380 ***
*** glibc detected *** /tmp/t/gmapper-ls: corrupted double-linked list: 0x0000000009447380 ***
======= Backtrace: =========
======= Backtrace: =========
/lib64/libc.so.6(+0x75916)[0x7ffff6ad7916]
/lib64/libc.so.6(+0x75916)[0x7ffff6ad7916]
/lib64/libc.so.6(+0x7979f)[0x7ffff6adb79f]
/lib64/libc.so.6(+0x7979f)[0x7ffff6adb79f]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7ffff6adc141]
/lib64/libc.so.6(__libc_calloc+0xc6)[0x7ffff6adbdd6]
/usr/lib64/libgomp.so.1(+0x8502)[0x7ffff7230502]
/usr/lib64/libgomp.so.1(+0x8502)[0x7ffff7230502]
/lib64/libc.so.6(clone+0x6d)[0x7ffff6b4a11d]
/lib64/libpthread.so.0(+0x7851)[0x7ffff6dfc851]
/lib64/libc.so.6(clone+0x6d)[0x7ffff6b4a11d]
======= Memory map: ========
00400000-00430000 r-xp 00000000 00:14 927268870                          /tmp/t/gmapper
00630000-00631000 rw-p 00030000 00:14 927268870                          /tmp/t/gmapper
00631000-2f461000 rw-p 00000000 00:00 0                                  [heap]
7f749d9be000-7f7e2053b000 r--p 00000000 00:0f 1278724                    /dev/shm/hg19-ls
7fff24000000-7fff2727a000 rw-p 00000000 00:00 0
7fff2727a000-7fff28000000 ---p 00000000 00:00 0
7fff285ce000-7fff2c000000 rw-p 00000000 00:00 0
7fff2c000000-7fff2f547000 rw-p 00000000 00:00 0
7fff2f547000-7fff30000000 ---p 00000000 00:00 0
[...]

Edit 2: I added the thread info for all threads that were stopped at "non-trivial" locations (10 of 30).

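Note that your GCC version is quite old. The current GCC release is 4.7.2.

For debugging purposes, you might compile with less optimization: instead of
-g -O2 -DNDEBUG -mmmx -msse -msse2 -fopenmp -Wall
just use
-g -Wall

Compiling with less optimization makes the gdb debugger much happier.

Then use gdb to debug your program.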
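
The "corrupted double-linked list" message in the question is glibc reporting damaged heap metadata, so the damaging write may have happened well before the calloc() call that aborts. One way to narrow this down is to surround every allocation with guard bytes and check them on free. The following is only a minimal sketch of that idea: guard_calloc, guard_check and guard_free are hypothetical names and are not taken from the asker's my-alloc.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: [size field + front guard][user data][back guard]. */
enum { GUARD_LEN = 16, FRONT_LEN = 32, GUARD_BYTE = 0xA5 };

/* Allocate `size` zeroed bytes with guard bytes before and after them. */
void *guard_calloc(size_t size) {
    if (size > SIZE_MAX - FRONT_LEN - GUARD_LEN) return NULL;
    unsigned char *base = (unsigned char *)calloc(FRONT_LEN + size + GUARD_LEN, 1);
    if (base == NULL) return NULL;
    memcpy(base, &size, sizeof size);                       /* remember the size */
    memset(base + sizeof size, GUARD_BYTE, FRONT_LEN - sizeof size);
    memset(base + FRONT_LEN + size, GUARD_BYTE, GUARD_LEN); /* back guard */
    return base + FRONT_LEN;
}

/* Return true if both guards around `ptr` are still intact. */
bool guard_check(const void *ptr) {
    const unsigned char *user = (const unsigned char *)ptr;
    const unsigned char *base = user - FRONT_LEN;
    size_t size;
    /* Check the front guard first, before trusting the stored size. */
    for (size_t i = sizeof size; i < FRONT_LEN; i++)
        if (base[i] != GUARD_BYTE) return false;
    memcpy(&size, base, sizeof size);
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (user[size + i] != GUARD_BYTE) return false;
    return true;
}

/* Free a block from guard_calloc, aborting if a guard was overwritten. */
void guard_free(void *ptr) {
    if (ptr == NULL) return;
    assert(guard_check(ptr) && "heap guard overwritten");
    free((unsigned char *)ptr - FRONT_LEN);
}
```

Calling guard_check() on suspect buffers at the end of each work item would point at the buffer that was overrun, rather than at the unrelated allocation where glibc happens to notice the damage. Note that assert() is disabled by -DNDEBUG, so the check in guard_free() only fires in builds without that flag.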

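Please include your source code as well!

What error message did you get from glibc?

@BobKaufman: The source code is available. However, it is quite large, and I really don't expect anyone else to read all of it. If you have more specific questions, I can try to answer them. @hristoilev: I added the error message before the gdb prompt. Is that what you needed from glibc?

I suggest running your code under Intel Thread Checker, if available (I think you can download it and get a trial license), or Sun Thread Analyzer (part of Oracle Solaris Studio 12 for Linux, still free). Compile without optimization and with debug information for both tools (warning: changing the optimization level may change the program's behavior, and the error may disappear). It looks like a data race to me.

The node is running CentOS 6.3, which is up to date as far as I know (I am not the admin). Hence the older gcc. So do you really think this is a gcc problem? For valgrind, I compiled with the same flags minus -O2 -DNDEBUG. However, the fact that several runs completed fine suggests that this is thread-related, which single-threaded valgrind cannot catch. I have not tried removing any other flags yet; I will do that next.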
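
On the data-race suspicion raised in the comments: glibc's calloc() is itself thread-safe, so concurrent calls from OpenMP threads are fine as long as no thread writes outside its own allocations or into another thread's buffer. A minimal sketch of the race-free pattern follows. It assumes each work item only needs private scratch memory; process_item and process_all are hypothetical stand-ins, not functions from the asker's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-read work: fills a private scratch buffer
 * and reduces it to a single value. Returns -1 if allocation fails. */
static long process_item(int item, size_t scratch_len) {
    long *scratch = (long *)calloc(scratch_len, sizeof *scratch);
    if (scratch == NULL) return -1;
    long total = 0;
    for (size_t i = 0; i < scratch_len; i++) { /* stay strictly inside the buffer */
        scratch[i] = (long)item + (long)i;
        total += scratch[i];
    }
    free(scratch);
    return total;
}

/* Each iteration writes only to results[k] and to memory it allocated
 * itself, so iterations share no writable state. */
void process_all(long *results, int n_items, size_t scratch_len) {
    assert(results != NULL);
    #pragma omp parallel for schedule(dynamic)
    for (int k = 0; k < n_items; k++) {
        results[k] = process_item(k, scratch_len);
    }
}
```

If a scratch buffer were instead shared between threads, or indexed past scratch_len, one thread could overwrite heap metadata that another thread's calloc() later trips over, which would match the intermittent failures described in the question. The pragma is ignored when the code is built without -fopenmp, so the same function can be run serially for comparison.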