Linux kernel: why is the CPU "insn per cycle" different on similar CPUs, and how does "MONITOR/MWAIT" work in Linux?

Background: I have 2 servers, both running OS kernel version 4.18.7 with CONFIG_BPF_SYSCALL=y.

I created a shell script, x.sh:

i=0 
while (( i < 1000000 )) 
do (( i ++ )) 
done
S2: CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, microcode: 0xb00002e; perf stat results:

   5391.653531      task-clock (msec)         #    1.000 CPUs utilized          
             4      context-switches          #    0.001 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
           107      page-faults               #    0.020 K/sec                  
12,910,036,202      cycles                    #    2.394 GHz                    
27,055,073,385      instructions              #    2.10  insn per cycle         
 6,527,267,657      branches                  # 1210.624 M/sec                  
    34,787,686      branch-misses             #    0.53% of all branches        

   5.392121575 seconds time elapsed
  10688.669439      task-clock (msec)         #    1.000 CPUs utilized          
             6      context-switches          #    0.001 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
           105      page-faults               #    0.010 K/sec                  
24,583,857,467      cycles                    #    2.300 GHz                    
27,117,299,405      instructions              #    1.10  insn per cycle         
 6,571,204,123      branches                  #  614.782 M/sec                  
    32,996,513      branch-misses             #    0.50% of all branches        

  10.688907278 seconds time elapsed
Question: the CPUs are similar and the OS kernel is the same, so why are the cycle counts reported by perf stat so different?
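The "insn per cycle" column is simply instructions divided by cycles, and the GHz column is cycles divided by task-clock time. The sketch below (the helper names `ipc` and `ghz` are hypothetical, not part of perf) recomputes both from the two outputs above: the instruction counts are almost identical and the clock rate is similar, so the second run's extra cycles show up directly as a roughly halved IPC.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that redo perf stat's derived columns by hand.

# insn per cycle = instructions / cycles
ipc() {
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%.2f\n", i / c }'
}

# effective clock in GHz = cycles / task-clock seconds / 1e9
ghz() {
    awk -v c="$1" -v s="$2" 'BEGIN { printf "%.3f\n", c / s / 1e9 }'
}

# Counts copied from the two perf stat outputs above (thousands separators removed).
ipc 27055073385 12910036202    # first output: 2.10
ipc 27117299405 24583857467    # second output: 1.10
ghz 12910036202 5.391653531    # first output: 2.394
ghz 24583857467 10.688669439   # second output: 2.300
```

Since the instruction counts match, the question reduces to why each instruction costs about twice as many cycles on one machine.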

Edit: I modified the shell script x.sh, reducing the loop count to shorten the run time:

i=0
while (( i < 10000 )) 
do
  (( i ++))
done
S2:

I also found that the RPM package versions are different.

perf diff shows:

# Event 'cycles'
#
# Baseline    Delta  Shared Object      Symbol
# ........  .......  .................  ..............................................
#
21.20%   +3.83%  bash               [.] 0x000000000002c0f0
10.22%           libc-2.17.so       [.] _int_free
 9.11%           libc-2.17.so       [.] _int_malloc
 7.97%           libc-2.17.so       [.] malloc
 4.09%           libc-2.17.so       [.] __gconv_transform_utf8_internal
 3.71%           libc-2.17.so       [.] __mbrtowc
 3.48%   -1.63%  bash               [.] execute_command_internal
 3.48%   +1.18%  [unknown]          [k] 0xfffffe0000032000
 3.25%   -1.87%  bash               [.] xmalloc
 3.12%           libc-2.17.so       [.] __strcpy_sse2_unaligned
 2.44%   +2.22%  [kernel.kallsyms]  [k] syscall_return_via_sysret
 2.09%   -0.24%  bash               [.] evalexp
 2.09%           libc-2.17.so       [.] __ctype_get_mb_cur_max
 1.92%           libc-2.17.so       [.] free
 1.41%   -0.95%  bash               [.] dequote_string
 1.19%   +0.23%  bash               [.] stupidly_hack_special_variables
 1.16%           libc-2.17.so       [.] __strlen_sse2_pminub
 1.16%           libc-2.17.so       [.] __memcpy_ssse3_back
 1.16%           libc-2.17.so       [.] __strcmp_sse42
 0.93%   -0.01%  bash               [.] mbschr
 0.93%   -0.47%  bash               [.] hash_search
 0.70%           libc-2.17.so       [.] __sigprocmask
 0.70%   -0.23%  bash               [.] dispose_words
 0.70%   -0.23%  bash               [.] execute_command
 0.70%   -0.23%  bash               [.] set_pipestatus_array
 0.70%           bash               [.] run_pending_traps
 0.47%           bash               [.] malloc@plt
 0.47%           bash               [.] var_lookup
 0.47%           bash               [.] fmtumax
 0.47%           bash               [.] do_redirections
 0.46%           bash               [.] dispose_word
 0.46%   -0.00%  bash               [.] alloc_word_desc
 0.46%   -0.00%  [kernel.kallsyms]  [k] _copy_to_user
 0.46%           libc-2.17.so       [.] __ctype_b_loc
 0.46%           bash               [.] new_fd_bitmap
 0.46%           bash               [.] add_unwind_protect
 0.46%   -0.00%  bash               [.] discard_unwind_frame
 0.46%           bash               [.] memcpy@plt
 0.46%           bash               [.] __ctype_get_mb_cur_max@plt
 0.46%           bash               [.] signal_in_progress
 0.40%           libc-2.17.so       [.] _IO_vfscanf
 0.40%           ld-2.17.so         [.] do_lookup_x
 0.27%           bash               [.] mbrtowc@plt
 0.24%   +1.60%  [kernel.kallsyms]  [k] __x64_sys_rt_sigprocmask
 0.23%           bash               [.] list_append
 0.23%           bash               [.] bind_variable
 0.23%   +0.69%  [kernel.kallsyms]  [k] entry_SYSCALL_64_stage2
 0.23%   +0.69%  [kernel.kallsyms]  [k] do_syscall_64
 0.23%           libc-2.17.so       [.] _dl_mcount_wrapper_check
 0.23%   +0.69%  bash               [.] make_word_list
 0.23%   +0.69%  [kernel.kallsyms]  [k] copy_user_generic_unrolled
 0.23%           [kernel.kallsyms]  [k] unmap_page_range
 0.23%           libc-2.17.so       [.] __sigjmp_save
 0.23%   +0.23%  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
 0.20%           [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
 0.03%           [kernel.kallsyms]  [k] page_fault
 0.00%           [kernel.kallsyms]  [k] xfs_bmapi_read
 0.00%           [kernel.kallsyms]  [k] xfs_release
 0.00%   +0.00%  [kernel.kallsyms]  [k] native_write_msr
        +45.33%  libc-2.17.so       [.] 0x0000000000027cc6
         +0.52%  [kernel.kallsyms]  [k] __mod_node_page_state
         +0.46%  bash               [.] free@plt
         +0.46%  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
         +0.46%  bash               [.] begin_unwind_frame
         +0.46%  bash               [.] make_bare_word
         +0.46%  bash               [.] find_variable_internal
         +0.37%  ld-2.17.so         [.] 0x0000000000009b13
Maybe the glibc difference is the answer.

Edit: Finally, I checked the BIOS configuration and saw that the S2 server was using power-saving mode. That was the real answer.

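However, the BIOS configuration left me confused about MONITOR/MWAIT. Even with "Maximum Performance mode" and "MONITOR/MWAIT" both enabled, S2 still performed poorly. Using the command
cpupower idle-info -o
I saw that the CPU was using C-states, even though C-states are supposed to be disabled in "Maximum Performance mode". Only with MONITOR/MWAIT disabled plus "Maximum Performance mode" did the performance improve.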
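To see which idle states the kernel will actually use (rather than what the BIOS menu suggests), the per-state cpuidle files in sysfs can be read directly. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpuidle/stateN layout (`name`, `latency` in microseconds, `disable`); the function names `list_cstates` and `idle_driver` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name exit_latency_us disabled_flag" for each
# cpuidle state of one CPU. The directory defaults to CPU 0 in sysfs; another
# directory (for example a copy of those files) can be passed in.
list_cstates() {
    local dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    local state
    # Glob order is lexical, which matches numeric order for state0..state9.
    for state in "$dir"/state*; do
        [ -d "$state" ] || continue   # no cpuidle states (or no such directory)
        printf '%s %s %s\n' "$(cat "$state/name")" \
            "$(cat "$state/latency")" "$(cat "$state/disable")"
    done
}

# Hypothetical helper: print the active cpuidle driver (e.g. intel_idle), or "none".
idle_driver() {
    cat /sys/devices/system/cpu/cpuidle/current_driver 2>/dev/null || echo none
}
```

Running `idle_driver; list_cstates` on S2 should agree with `cpupower idle-info`: any state listed with a `disable` value of 0 is one the kernel is allowed to enter.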

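The description of the "MONITOR/MWAIT" option says that some operating systems check this option to restore C-states, but I could not find how the Linux kernel uses it to change C-states...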

I found the answer.

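First, consider the BIOS MONITOR/MWAIT option under kernel 4.18.7. That kernel uses the intel_idle driver, which only checks whether the system supports the MWAIT instruction and does not care whether C-states are enabled in the BIOS. So once MONITOR/MWAIT is enabled, the intel_idle driver is used and C-states are forced on, just as in power-saving mode.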
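Linux reports CPUID.01H:ECX[3] (MONITOR/MWAIT support) as the "monitor" flag in /proc/cpuinfo, and that bit is typically cleared when the BIOS option is turned off. A minimal sketch that checks for it, with a hypothetical function name `has_mwait` and an optional cpuinfo path for testing; it only approximates the intel_idle probe, which also checks the CPU vendor and the MWAIT sub-states reported in CPUID leaf 5:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the CPU advertises MONITOR/MWAIT.
# Linux shows the CPUID.01H:ECX[3] feature bit as the "monitor" flag.
has_mwait() {
    local cpuinfo=${1:-/proc/cpuinfo}
    # Only the first "flags" line is checked; all CPUs normally report the same set.
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw monitor
}

if has_mwait; then
    echo "MWAIT available: on Intel CPUs, intel_idle may load and expose its C-states"
else
    echo "no MWAIT: intel_idle will not load, so another idle driver (or none) is used"
fi
```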

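Second, why does insn per cycle differ? Because the tuned service is running with the "latency-performance" profile active, which forces the latency to 1 µs (force_latency=1). When C-states are in use, only the states whose latency does not exceed force_latency can be selected.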

# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 5
Available idle states: POLL C1 C1E C3 C6
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 13034605
Duration: 820867557
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 349471619
Duration: 344311623672
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 237
Duration: 55999
C3:
Flags/Description: MWAIT 0x10
Latency: 40
Usage: 350
Duration: 168988
C6:
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 3696
Duration: 17809893
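The effect of force_latency can be reproduced with the exit latencies listed above. As far as we know, tuned's cpu plugin applies force_latency as a PM QoS request through /dev/cpu_dma_latency, and the menu governor skips any state whose exit latency is larger than the current request. A minimal sketch of just that latency filter (the function name `allowed_states` is hypothetical; the real governor also weighs target residency and the predicted idle time):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name exit_latency_us" pairs on stdin and print the
# idle states whose exit latency does not exceed the latency limit in $1 (us).
allowed_states() {
    awk -v limit="$1" '$2 <= limit { printf "%s%s", sep, $1; sep = " " }
                       END { print "" }'
}

# Exit latencies from the cpupower idle-info output on S2.
states='POLL 0
C1 2
C1E 10
C3 40
C6 133'

echo "$states" | allowed_states 1     # force_latency=1: only POLL is left
echo "$states" | allowed_states 200   # no tight limit: all five states
```

`tuned-adm active` prints which profile is currently applied.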
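You will see that only the POLL state has a latency below 1 µs, and POLL keeps the CPU spinning on NOP (PAUSE) instructions. With Hyper-Threading enabled, this roughly halves the instruction throughput of the other thread: the two logical cores share one physical core's execution units (ALUs), and because one of them is busy running NOPs, the other logical core has to wait for it.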
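Which logical CPU is the Hyper-Threading sibling of a given CPU can be read from the sysfs topology files. A small sketch (the function name `ht_siblings` is hypothetical) that prints the sibling list, so the benchmark can be pinned with `taskset -c` to one thread of a core while the other thread is idle:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logical CPUs sharing a physical core with
# CPU $1, e.g. "0,24" or "0-1". $2 optionally overrides the sysfs root.
ht_siblings() {
    local cpu=$1 root=${2:-/sys/devices/system/cpu}
    cat "$root/cpu$cpu/topology/thread_siblings_list"
}
```

For example, if `ht_siblings 0` printed `0,24`, then `taskset -c 0 bash x.sh` with CPU 24 idling in POLL would be the situation described above.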

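If the MONITOR/MWAIT option is disabled, the intel_idle driver is disabled too, so the tuned service's forced latency is not used; one of the logical cores can then stop, letting the other use the ALUs exclusively.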


Finally, thanks to everyone, especially @Peter Cordes and @osgx, for getting me to check the BIOS. The command
echo 2^1234567%2 | bc
is very nice.

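Are you sure this is the output of the script above? It only runs 1 million very simple iterations, yet the output lists billions of branches and 5 seconds of total time.

Yes, I'm sure! Since it is a shell script, I think it turns into many more instructions, because it calls into the kernel, and the kernel has more branches.

Please provide a script that includes the compile options, the compiler version and the specific perf command line. A shell script is not a good basis for a reproducible benchmark... At the very least you have to state the shell and its version... Please include this information in your post, not (only) in the comments.

Making more system calls could certainly explain this. With the Spectre/Meltdown mitigations enabled, system calls have a large overhead, and everything is slower for a while after returning because of TLB evictions.

Maybe bash is allocating and freeing memory, and on the slower machine it decides to return the memory to the OS every time instead of keeping it on the free list. If you install debug symbols for libc, can you profile the large amount of time spent in libc?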
/usr/lib64/ld-2.17.so with build id 93d2e4a501823d041413eeb652b89044d1f680ee not found, continuing without symbols
/usr/lib64/libc-2.17.so with build id b04a54c443d36058702ab4060c63f4ab3273eae9 not found, continuing without symbols