Dereferencing a pointer fails (SIGSEGV) after inline assembly in C

c gcc x86

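While trying different ways to code the leaves of a Schönhage radix-conversion tree, I ran into the problem that the compilers (GCC, clang) optimize a division by a small constant into a multiplication by the reciprocal. That is what they are supposed to do, so no complaints there. I therefore decided to add a small piece of inline assembly to get comparable benchmarks, but what I got were segfaults.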
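For readers unfamiliar with the optimization mentioned above, here is a minimal sketch of it. This is not the asker's code: the function names and the divisor 10 are assumptions chosen for illustration. The constant 0xCCCCCCCD is ceil(2^35 / 10), and multiplying by it and shifting right by 35 gives the same quotient as dividing a 32-bit unsigned value by 10.

```c
#include <assert.h>
#include <stdint.h>

/* Plain division by a constant. With optimization enabled, GCC and clang
   typically emit a multiply and a shift here instead of a div instruction. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10u;
}

/* The same quotient written out by hand: 0xCCCCCCCD == ceil(2^35 / 10),
   so the 64-bit product shifted right by 35 equals x / 10 for 32-bit x. */
uint32_t div10_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * UINT64_C(0xCCCCCCCD)) >> 35);
}

/* Small self-check that can be called from main(). */
void check_div10(void)
{
    assert(div10_plain(123456789u) == div10_reciprocal(123456789u));
    assert(div10_reciprocal(UINT32_MAX) == UINT32_MAX / 10u);
}
```

Because both functions avoid the hardware divide, a benchmark that is meant to time `divl` has to stop the compiler from seeing the constant, which is what the inline assembly in the question was for.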

The code (not the most minimal example, but some context might help)

But as mentioned above, it segfaults. I split the code up a bit so that each line performs one operation, then ran

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./divmod

which printed:

==9546== Memcheck, a memory error detector
==9546== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9546== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9546== Command: ./divmod
==9546== 
==9546== Invalid read of size 4
==9546==    at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==  Address 0x2 is not stack'd, malloc'd or (recently) free'd
==9546== 
==9546== 
==9546== Process terminating with default action of signal 11 (SIGSEGV)
==9546==  Access not within mapped region at address 0x2
==9546==    at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==  If you believe this happened as a result of a stack
==9546==  overflow in your program's main thread (unlikely but
==9546==  possible), you can try to increase the size of the
==9546==  main thread stack using the --main-stacksize= flag.
==9546==  The main thread stack size used in this run was 8388608.
==9546== 
==9546== HEAP SUMMARY:
==9546==     in use at exit: 0 bytes in 0 blocks
==9546==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9546== 
==9546== All heap blocks were freed -- no leaks are possible

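The address 0x2 is very low, which hints at a failed dereference, and that is indeed what happens.

The two calls to printf were left in for a reason you may already have guessed: if you use only one of them (yesterday I needed both), it works. That is almost always caused by some UB (undefined behavior).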

Slightly worried that I have made a really stupid mistake and will be laughed at by all of you: what is the cause, and how can I fix it?

The problem is that in your inline assembly you do:

   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
          );

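GCC/clang are very unforgiving. If you modify a register, you need to tell the compiler that it is modified. In this inline assembly you have said that EAX and EDX are output-only registers (they will be modified), but you have not told the compiler that you modify (clobber) EBX. A simple fix is to add EBX to the clobber list, like this: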

   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
           : "ebx"
          );

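Now the compiler will no longer assume that EBX still holds the same value it held before the inline assembly ran. Without the clobber it is free to keep a live value, such as a pointer, in that register across the asm statement, which would explain the invalid read at a tiny address like 0x2.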

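If your inline assembly starts with a MOV instruction, you have probably taken the wrong approach: you are not using the inline-assembly operands (and constraints) themselves to let the compiler generate the most efficient code. Your inline assembly could have looked like this: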

   __asm__("divl %4"
           : "=a"(q), "=d"(r)
           : "a"(a), "d"(0), "r"(b)
          );

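We create a fifth operand to pass the divisor in a register chosen by the compiler. We also set EDX to zero through an input operand rather than inside the inline assembly. This version also reuses the EAX and EDX registers for both the input and the output operands, so it may need fewer registers.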
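As a minimal sketch, the constraint-based version can be wrapped in a small function so that it is easy to call and test. The name `divmod_u32`, the pointer-based interface, and the plain-C fallback for non-x86 targets are assumptions made for illustration and do not come from the question.

```c
#include <assert.h>

/* Hypothetical wrapper: stores a / b in *q and a % b in *r. */
void divmod_u32(unsigned int a, unsigned int b,
                unsigned int *q, unsigned int *r)
{
    assert(b != 0);  /* divl raises a divide error for a zero divisor */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int quot, rem;
    /* Operands: %0 = quot (EAX), %1 = rem (EDX), %2 = a (EAX),
       %3 = 0 (EDX), %4 = b (any register the compiler picks).
       No MOV and no clobber list are needed. */
    __asm__("divl %4"
            : "=a"(quot), "=d"(rem)
            : "a"(a), "d"(0u), "r"(b));
    *q = quot;
    *r = rem;
#else
    /* Fallback (assumption): plain C on targets without divl. */
    *q = a / b;
    *r = a % b;
#endif
}
```

Calling `divmod_u32(17u, 5u, &q, &r)` leaves 3 in `q` and 2 in `r`. Since EDX is zeroed on input, the 64-bit dividend EDX:EAX is just `a`, so the quotient always fits in 32 bits.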

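I'm not sure your code will be faster when using inline assembly.

@Jabberwocky It won't, but I used assembly only to get comparable benchmarks. Comparing "multiply by the reciprocal" with "divl" is a bit like comparing apples and oranges.

What a clean and tidy question, it really put a smile on my face :) But I would (opinion alert!) discourage you from mixing two languages in one source file, just as I would discourage gotos: unless your spidey sense says it is the only option, don't do it :)

If your goal is to keep the division from being optimized away, you could get a similar result by putting the dividend in a volatile variable, so that the compiler can't assume it is a constant. That may cause some extra loads and stores, though.

@NateEldredge: Or use asm("" : "+r"(var)) to "launder" its value from the optimizer, making the compiler materialize var in a register and forget anything it knew about it (e.g. its value, that it is non-negative, etc.). That is basically what some Benchmark::DoNotOptimize wrappers do. It works because you tell the compiler that the value of var is an output of the asm statement, and it does not inspect the asm statement to try to figure out what it does, even if it is empty.

Yes, I knew it would be embarrassing! ;-) Thanks

@deamentiaemundi: Not embarrassing at all. Unfortunately, the unforgiving nature of GCC's inline assembly syntax can lead to subtle problems like this one. However, that same unforgiving nature also makes it more powerful, because it lets the compiler make full use of its optimizations.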
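The two suggestions from the comments, a volatile variable and an empty asm statement with a "+r" operand, can be sketched as follows. The function names and the constant 10 are assumptions for illustration, and the sketch hides the divisor from the optimizer; the same pattern applies to whichever operand the compiler would otherwise treat as a known constant.

```c
#include <assert.h>

/* Variant 1: keep the operand in a volatile object. The compiler must
   reload it from memory, so it cannot treat it as the constant 10. */
unsigned int div_by_volatile(unsigned int a)
{
    volatile unsigned int divisor = 10u;
    return a / divisor;
}

/* Variant 2: pass the operand through an empty asm statement as a
   read-write register operand. The compiler has to assume the asm may
   have changed it and forgets that its value is 10. */
unsigned int div_by_laundered(unsigned int a)
{
    unsigned int divisor = 10u;
#if defined(__GNUC__)
    __asm__("" : "+r"(divisor));
#endif
    return a / divisor;
}

/* Small self-check that can be called from main(). */
void check_laundering(void)
{
    assert(div_by_volatile(123u) == 12u);
    assert(div_by_laundered(123u) == 12u);
}
```

Both variants return the same results as a plain `a / 10u`. The difference shows up only in the generated code, where the second variant avoids the extra store and load of the first.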