Understanding Linux perf report output

c linux

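Although I can intuitively make sense of most of the results, I have a hard time fully understanding the output of the perf report command, especially where the call graph is concerned, so I wrote a stupid test to settle the matter once and for all.

The stupid test

I wrote the program below and compiled it with:

gcc -Wall -pedantic -lm perf-test.c -o perf-test

No aggressive optimizations, to avoid inlining and the like. The source (perf-test.c):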

#include <math.h>

#define N 10000000UL

#define USELESSNESS(n)                          \
    do {                                        \
        unsigned long i;                        \
        double x = 42;                          \
        for (i = 0; i < (n); i++) x = sin(x);   \
    } while (0)

void baz()
{
    USELESSNESS(N);
}

void bar()
{
    USELESSNESS(2 * N);
    baz();
}

void foo()
{
    USELESSNESS(3 * N);
    bar();
    baz();
}

int main()
{
    foo();
    return 0;
}

With this, the flat profile I get is:

  94,44%  perf-test  libm-2.19.so       [.] __sin_sse2
   2,09%  perf-test  perf-test          [.] sin@plt
   1,24%  perf-test  perf-test          [.] foo
   0,85%  perf-test  perf-test          [.] baz
   0,83%  perf-test  perf-test          [.] bar

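This sounds reasonable, since the heavy work is actually performed by __sin_sse2 and sin@plt is probably just a wrapper, while the overhead of my own functions accounts only for the loop: 3*N iterations in total for foo and 2*N for each of the other two.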

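Hierarchical profiling

Now I get two overhead columns: Children (the output is sorted by this column by default) and Self (the same as the overhead in the flat profile).
This is where I start to feel I am missing something: whether or not I use -G, I cannot explain the hierarchy in terms of "x calls y" or "y is called by x". For example: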

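Without -G ("y is called by x"):
Why is __sin_sse2 called by main (indirectly?), foo and bar, but not by baz?
Why do functions sometimes have a percentage and a hierarchy attached (e.g. the last instance of baz) and sometimes not (e.g. the last instance of bar)?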

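With -G ("x calls y"):
How should I interpret the first three entries?
main calls foo and that's fine, but if it calls __sin_sse2 and sin@plt (indirectly?), why doesn't it also call bar and baz?
Why do __libc_start_main and main appear under foo? And why does foo appear twice?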


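My suspicion is that this hierarchy has two levels, and that only the second one actually carries the "x calls y" / "y is called by x" semantics, but I am tired of guessing, so I am asking here. The documentation doesn't seem to help either.

Sorry for the long post, but I hope all this context may help, or serve as a reference for, someone else too.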
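
Alright, let's temporarily ignore the difference between the caller and callee call graphs, mostly because when I compare the results of these two options on my machine, I only see an effect inside the kernel.kallsyms DSO, for reasons I don't understand. I am relatively new to this myself.
I find it a little easier to read the whole tree for your example. So, using --stdio, let's look at the whole tree for __sin_sse2: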
# Overhead    Command      Shared Object                  Symbol
# ........  .........  .................  ......................
#
    94.72%  perf-test  libm-2.19.so       [.] __sin_sse2
            |
            --- __sin_sse2
               |
               |--44.20%-- foo
               |          |
               |           --100.00%-- main
               |                     __libc_start_main
               |                     _start
               |                     0x0
               |
               |--27.95%-- baz
               |          |
               |          |--51.78%-- bar
               |          |          foo
               |          |          main
               |          |          __libc_start_main
               |          |          _start
               |          |          0x0
               |          |
               |           --48.22%-- foo
               |                     main
               |                     __libc_start_main
               |                     _start
               |                     0x0
               |
                --27.84%-- bar
                          |
                           --100.00%-- foo
                                     main
                                     __libc_start_main
                                     _start
                                     0x0

So, the way I read this is: 44% of the time, sin is called from foo; 27% of the time it is called from baz, and 27% of the time from bar.
The documentation for -g is instructive:
 -g [type,min[,limit],order[,key]], --call-graph
       Display call chains using type, min percent threshold, optional print limit and order. type can be either:

       ·   flat: single column, linear exposure of call chains.

       ·   graph: use a graph tree, displaying absolute overhead rates.

       ·   fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.

               order can be either:
               - callee: callee based call graph.
               - caller: inverted caller based call graph.

               key can be:
               - function: compare on functions
               - address: compare on individual code addresses

               Default: fractal,0.5,callee,function.

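The important bit here is that the default is fractal, and in fractal mode each branch is treated as a new object.
So you can see that, of the samples where sin was called from baz, about half (51.78%) had baz called from bar, and the other half (48.22%) had baz called from foo.
That isn't always the most useful measure, so it is worth looking at the results with -g graph: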
94.72%  perf-test  libm-2.19.so       [.] __sin_sse2
        |
        --- __sin_sse2
           |
           |--41.87%-- foo
           |          |
           |           --41.48%-- main
           |                     __libc_start_main
           |                     _start
           |                     0x0
           |
           |--26.48%-- baz
           |          |
           |          |--13.50%-- bar
           |          |          foo
           |          |          main
           |          |          __libc_start_main
           |          |          _start
           |          |          0x0
           |          |
           |           --12.57%-- foo
           |                     main
           |                     __libc_start_main
           |                     _start
           |                     0x0
           |
            --26.38%-- bar
                      |
                       --26.17%-- foo
                                 main
                                 __libc_start_main
                                 _start
                                 0x0

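This switches to absolute percentages, where each percentage is the share of total time spent in that call chain: sin called from bar (itself called from foo, and which in turn also calls baz) is 26% of the total ticks, while sin called from baz, called directly from foo, is 12% of the total ticks.
I still have no idea why, from the perspective of __sin_sse2, I don't see any difference between the callee and caller graphs.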
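
Update

One thing I did change on the command line is how the call graphs were gathered. By default, Linux perf uses the frame pointer method to reconstruct call stacks. This can be a problem when the compiler uses -fomit-frame-pointer as a default. So I used: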
perf record --call-graph dwarf ./perf-test

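I'm no expert on perf, but I do know that by default it looks at the stack 1000 times a second to collect data. So the fine-grained analysis you are attempting may fail. For example, it may be that no samples happened to land while baz was calling __sin_sse2. Consider using gprof, which compiles in stubs to catch every call and return (though it has other problems).

Yes, I know, but perf is fast and it lets me record all sorts of crazy events, such as per-symbol cache misses and branch mispredictions, which AFAIK gprof can't do. I used a fairly large N precisely to avoid what you mention; anyway, I tried increasing it further, with no luck.

I don't know, but I think that even with a 100M-iteration loop it is unlikely that no samples were collected.

The relative vs. absolute percentage thing definitely makes sense. The main point is that I can't get output similar to yours (which looks legitimate, since all three functions call __sin_sse2 directly, whereas in mine the callers are foo, main and bar). Can you post the exact flags you used?

@cYrus Take a look at my update... I'll post the exact command line for perf report after lunch.

@MatthewG. Hmm, it looks like it's all about using dwarf... I'll check whether this solves all the issues I listed.

@cYrus I copied and pasted the command line you used for compiling (though I had to move the linker flag to the end). The kernel is 3.13.0-37-generic (which may matter). The series of command lines for the report was perf report -g {graph,fractal},0.05,calle{e,r} --stdio > {graph,fractal}Cal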