How to count FLOPs on Linux?


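I am running my program in parallel on a server (Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz). This server has 4 cores, each with Hyper-Threading, i.e., 8 cores/threads in total.

I found that when the degree of parallelism of my program is less than 4, it achieves almost linear speedup (see the figure on the right). However, when it is greater than 4, the speedup gets worse. So I suspect that this is because of the floating-point units: this server has only 4 of them. I would like to explain my experimental results by computing the FLOPS (floating-point operations per second). So how can I count the FLOPs? Is there any other way to explain this result? Thanks.

Here is my perf list: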

afancy@ubuntu:$ perf list

List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  ref-cycles                                         [Hardware event]

  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]

  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-prefetches                                    [Hardware cache event]
  dTLB-prefetch-misses                               [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  node-loads                                         [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-stores                                        [Hardware cache event]
  node-store-misses                                  [Hardware cache event]
  node-prefetches                                    [Hardware cache event]
  node-prefetch-misses                               [Hardware cache event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[:access]                                [Hardware breakpoint]

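Comments:

You likely won't be able to count FLOPs with native MATLAB code, because it relies on many optimized libraries that take advantage of specific hardware (e.g. SIMD). You could try this toolbox, but I don't know how reliable it is.

Read up on what hyper-threading is. Each core may have 2 hardware threads, but there is still only 1 FPU between them. For particularly intensive workloads, more than 4 threads may actually slow things down because of context switching.

You are right, there are two hyper-threads per core. But if context switching explains the speedup, why is the number of context switches higher when I use a parallelism of two? I.e., 39,038 context-switches and 40,231 context-switches.

@afancy: each core is running twice as much code, which results in twice as many context switches per core.

The following are the results of perf stat matlab -nodesktop -nojvm: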
======================Num. of cores/threads = 2======================


                 458223.935241 task-clock                #    0.999 CPUs utilized          
                        39,038 context-switches          #    0.085 K/sec                  
                            78 cpu-migrations            #    0.000 K/sec                  
                       459,290 page-faults               #    0.001 M/sec                  
             1,598,967,197,448 cycles                    #    3.489 GHz                    
               <not supported> stalled-cycles-frontend 
               <not supported> stalled-cycles-backend  
             3,052,651,880,341 instructions              #    1.91  insns per cycle        
               675,069,830,714 branches                  # 1473.231 M/sec                  
                 3,699,587,126 branch-misses             #    0.55% of all branches        

                 458.519712953 seconds time elapsed
------------------------------------------------------
     472493.757765 task-clock                #    0.999 CPUs utilized          
            40,231 context-switches          #    0.085 K/sec                  
                83 cpu-migrations            #    0.000 K/sec                  
           454,849 page-faults               #    0.963 K/sec                  
 1,648,754,575,728 cycles                    #    3.489 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 3,050,973,794,286 instructions              #    1.85  insns per cycle        
   674,701,101,539 branches                  # 1427.958 M/sec                  
     3,854,961,561 branch-misses             #    0.57% of all branches        

     472.810679033 seconds time elapsed

==============    Num. of cores/threads = 4 ==========================


         233673.870204 task-clock                #    0.998 CPUs utilized          
                20,265 context-switches          #    0.087 K/sec                  
                   110 cpu-migrations            #    0.000 K/sec                  
               248,922 page-faults               #    0.001 M/sec                  
       815,466,229,226 cycles                    #    3.490 GHz                    
       <not supported> stalled-cycles-frontend 
       <not supported> stalled-cycles-backend  
     1,528,487,784,122 instructions              #    1.87  insns per cycle        
       338,001,335,905 branches                  # 1446.466 M/sec                  
         1,878,625,642 branch-misses             #    0.56% of all branches        

         234.029335936 seconds time elapsed
---------------------------------------------
     231203.147937 task-clock                #    0.998 CPUs utilized          
            20,028 context-switches          #    0.087 K/sec                  
                91 cpu-migrations            #    0.000 K/sec                  
           249,906 page-faults               #    0.001 M/sec                  
   806,862,892,981 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,525,844,491,295 instructions              #    1.89  insns per cycle        
   337,423,026,113 branches                  # 1459.422 M/sec                  
     1,839,223,079 branch-misses             #    0.55% of all branches        

     231.578239447 seconds time elapsed
 -----------------------------------------
      233813.938379 task-clock                #    0.998 CPUs utilized          
            20,210 context-switches          #    0.086 K/sec                  
                78 cpu-migrations            #    0.000 K/sec                  
           246,951 page-faults               #    0.001 M/sec                  
   815,974,334,825 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,525,890,625,730 instructions              #    1.87  insns per cycle        
   337,426,244,903 branches                  # 1443.140 M/sec                  
     1,981,754,037 branch-misses             #    0.59% of all branches        

     234.193620912 seconds time elapsed
-------------------------------------------------
     233269.315745 task-clock                #    0.998 CPUs utilized          
            20,202 context-switches          #    0.087 K/sec                  
               112 cpu-migrations            #    0.000 K/sec                  
           230,240 page-faults               #    0.987 K/sec                  
   814,074,094,896 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,526,825,737,326 instructions              #    1.88  insns per cycle        
   337,639,762,266 branches                  # 1447.425 M/sec                  
     1,852,788,062 branch-misses             #    0.55% of all branches        

     233.642106982 seconds time elapsed     

====================== Num. of cores/threads = 6 ================


         232682.918326 task-clock                #    0.998 CPUs utilized          
            22,109 context-switches          #    0.095 K/sec                  
                96 cpu-migrations            #    0.000 K/sec                  
           172,440 page-faults               #    0.741 K/sec                  
   811,991,238,956 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,019,407,910,404 instructions              #    1.26  insns per cycle        
   225,426,394,521 branches                  #  968.814 M/sec                  
     1,344,046,527 branch-misses             #    0.60% of all branches        

     233.124504147 seconds time elapsed
 ------------------------------------------    
       210835.066220 task-clock                #    0.998 CPUs utilized          
            18,696 context-switches          #    0.089 K/sec                  
               107 cpu-migrations            #    0.001 K/sec                  
           173,955 page-faults               #    0.825 K/sec                  
   735,764,609,235 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,019,083,429,216 instructions              #    1.39  insns per cycle        
   225,355,627,333 branches                  # 1068.872 M/sec                  
     1,316,268,293 branch-misses             #    0.58% of all branches        

     211.323109113 seconds time elapsed
 ---------------------------------------------    
       179852.029353 task-clock                #    0.998 CPUs utilized          
            15,465 context-switches          #    0.086 K/sec                  
               107 cpu-migrations            #    0.001 K/sec                  
           172,942 page-faults               #    0.962 K/sec                  
   627,644,775,747 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,017,482,864,797 instructions              #    1.62  insns per cycle        
   225,004,972,767 branches                  # 1251.056 M/sec                  
     1,255,067,791 branch-misses             #    0.56% of all branches        

     180.246118105 seconds time elapsed
---------------------------------------------     
     219614.665400 task-clock                #    0.998 CPUs utilized          
            21,290 context-switches          #    0.097 K/sec                  
                90 cpu-migrations            #    0.000 K/sec                  
           170,882 page-faults               #    0.778 K/sec                  
   766,392,860,245 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,017,686,212,128 instructions              #    1.33  insns per cycle        
   225,049,868,367 branches                  # 1024.749 M/sec                  
     1,322,942,620 branch-misses             #    0.59% of all branches        

     220.092311263 seconds time elapsed
----------------------------------------------          
       176764.084715 task-clock                #    0.998 CPUs utilized          
            15,282 context-switches          #    0.086 K/sec                  
                99 cpu-migrations            #    0.001 K/sec                  
           168,629 page-faults               #    0.954 K/sec                  
   616,874,157,735 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
 1,018,436,813,450 instructions              #    1.65  insns per cycle        
   225,214,699,712 branches                  # 1274.098 M/sec                  
     1,271,583,320 branch-misses             #    0.56% of all branches        

     177.198129682 seconds time elapsed   


========================   Num. of cores/threads = 8 ==================


         207252.104133 task-clock                #    0.998 CPUs utilized          
            18,598 context-switches          #    0.090 K/sec                  
                99 cpu-migrations            #    0.000 K/sec                  
           144,037 page-faults               #    0.695 K/sec                  
   723,242,099,542 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   764,758,792,593 instructions              #    1.06  insns per cycle        
   169,108,788,865 branches                  #  815.957 M/sec                  
     1,068,941,156 branch-misses             #    0.63% of all branches        

     207.729752155 seconds time elapsed
 ----------------------------------------------  
      206174.337637 task-clock                #    0.998 CPUs utilized          
            22,188 context-switches          #    0.108 K/sec                  
               118 cpu-migrations            #    0.001 K/sec                  
           132,956 page-faults               #    0.645 K/sec                  
   719,474,677,828 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   765,214,496,607 instructions              #    1.06  insns per cycle        
   169,211,117,316 branches                  #  820.719 M/sec                  
     1,039,836,842 branch-misses             #    0.61% of all branches        

     206.652707435 seconds time elapsed
 ----------------------------------------------  
      205240.082258 task-clock                #    0.989 CPUs utilized          
            44,991 context-switches          #    0.219 K/sec                  
               163 cpu-migrations            #    0.001 K/sec                  
           136,109 page-faults               #    0.663 K/sec                  
   716,133,704,444 cycles                    #    3.489 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   763,898,836,941 instructions              #    1.07  insns per cycle        
   168,924,070,103 branches                  #  823.056 M/sec                  
     1,066,021,420 branch-misses             #    0.63% of all branches        

     207.511466061 seconds time elapsed
 ----------------------------------------------  
      205016.856849 task-clock                #    0.989 CPUs utilized          
            44,386 context-switches          #    0.216 K/sec                  
               180 cpu-migrations            #    0.001 K/sec                  
           133,995 page-faults               #    0.654 K/sec                  
   715,351,228,880 cycles                    #    3.489 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   763,637,525,789 instructions              #    1.07  insns per cycle        
   168,860,189,098 branches                  #  823.641 M/sec                  
     1,056,980,771 branch-misses             #    0.63% of all branches        

     207.231704712 seconds time elapsed
 ----------------------------------------------  
      205388.150659 task-clock                #    0.998 CPUs utilized          
            21,328 context-switches          #    0.104 K/sec                  
               103 cpu-migrations            #    0.001 K/sec                  
           135,843 page-faults               #    0.661 K/sec                  
   716,737,227,792 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   764,359,316,365 instructions              #    1.07  insns per cycle        
   169,023,595,573 branches                  #  822.947 M/sec                  
     1,045,914,789 branch-misses             #    0.62% of all branches        

     205.857635295 seconds time elapsed
 ----------------------------------------------  
      207178.729781 task-clock                #    0.998 CPUs utilized          
            17,956 context-switches          #    0.087 K/sec                  
               105 cpu-migrations            #    0.001 K/sec                  
           137,996 page-faults               #    0.666 K/sec                  
   722,998,617,131 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   763,085,695,510 instructions              #    1.06  insns per cycle        
   168,733,709,256 branches                  #  814.435 M/sec                  
     1,052,517,264 branch-misses             #    0.62% of all branches        

     207.608998891 seconds time elapsed
 ----------------------------------------------  
      206701.393252 task-clock                #    0.998 CPUs utilized          
            24,596 context-switches          #    0.119 K/sec                  
               137 cpu-migrations            #    0.001 K/sec                  
           136,553 page-faults               #    0.661 K/sec                  
   721,294,495,478 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   764,246,861,748 instructions              #    1.06  insns per cycle        
   168,997,611,020 branches                  #  817.593 M/sec                  
     1,050,078,827 branch-misses             #    0.62% of all branches        

     207.206805179 seconds time elapsed
 ----------------------------------------------  
          206455.394644 task-clock                #    0.997 CPUs utilized          
            26,089 context-switches          #    0.126 K/sec                  
                87 cpu-migrations            #    0.000 K/sec                  
           132,658 page-faults               #    0.643 K/sec                  
   720,429,194,133 cycles                    #    3.490 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   764,339,875,802 instructions              #    1.06  insns per cycle        
   169,014,685,081 branches                  #  818.650 M/sec                  
     1,047,046,966 branch-misses             #    0.62% of all branches        

     206.982094466 seconds time elapsed
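
The context-switch question raised in the comments can be checked against the counters above. The following is a minimal Python sketch, not part of the original post, that takes the first perf stat block listed for each worker count and derives instructions per cycle, context switches per second of task-clock, and elapsed-time speedup relative to the 2-worker run. The names `RUNS` and `derived_metrics` are made up for illustration, and treating one block's elapsed time as the duration of the whole job is an assumption.

```python
# First perf stat block listed for each worker count, copied from the output above.
# Fields: task_clock_ms, elapsed_s, context_switches, cycles, instructions
RUNS = {
    2: (458223.935241, 458.519712953, 39_038, 1_598_967_197_448, 3_052_651_880_341),
    4: (233673.870204, 234.029335936, 20_265, 815_466_229_226, 1_528_487_784_122),
    6: (232682.918326, 233.124504147, 22_109, 811_991_238_956, 1_019_407_910_404),
    8: (207252.104133, 207.729752155, 18_598, 723_242_099_542, 764_758_792_593),
}


def derived_metrics(task_clock_ms, elapsed_s, context_switches, cycles, instructions):
    """Return rates derived from the raw counters of one perf stat block."""
    return {
        # Instructions retired per core cycle for this process.
        "ipc": instructions / cycles,
        # Context switches per second of CPU time (task-clock is in milliseconds).
        "cs_per_sec": context_switches / (task_clock_ms / 1000.0),
        # Wall-clock time, assumed here to approximate the duration of the job.
        "elapsed_s": elapsed_s,
    }


if __name__ == "__main__":
    baseline = derived_metrics(*RUNS[2])["elapsed_s"]
    for workers in sorted(RUNS):
        m = derived_metrics(*RUNS[workers])
        print(
            f"{workers} workers: IPC={m['ipc']:.2f}  "
            f"context switches/s={m['cs_per_sec']:.1f}  "
            f"speedup vs 2 workers={baseline / m['elapsed_s']:.2f}"
        )
```

For these four blocks the context-switch rate stays in roughly the same range (about 85 to 95 per second) at every worker count, while instructions per cycle falls from about 1.9 to about 1.06 once the worker count exceeds the 4 physical cores. That pattern looks more like hyper-threads sharing a core's execution resources than like a context-switching cost, though these counters alone cannot confirm that it is the floating-point units specifically.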