Performance 英特尔的迷失周期?rdtsc和CPU_CLK_UNHALTED.REF_TSC之间不一致
在最近的CPU上(至少在过去十年左右),除了各种可配置的性能计数器外,Intel还提供了三个固定功能硬件性能计数器。三个固定计数器为:Performance 英特尔的迷失周期?rdtsc和CPU_CLK_UNHALTED.REF_TSC之间不一致,performance,x86,x86-64,cpu-architecture,rdtsc,Performance,X86,X86 64,Cpu Architecture,Rdtsc,在最近的CPU上(至少在过去十年左右),除了各种可配置的性能计数器外,Intel还提供了三个固定功能硬件性能计数器。三个固定计数器为: INST_RETIRED.ANY CPU_CLK_UNHALTED.THREAD CPU_CLK_UNHALTED.REF_TSC 第一个计算失效指令数,第二个计算实际循环数,最后一个是我们感兴趣的。《英特尔软件开发人员手册》第3卷的说明如下: 当发生以下情况时,此事件统计TSC速率下的参考循环数 磁芯不处于停止状态,也不处于TM停止时钟状态。这个 core在
INST_RETIRED.ANY
CPU_CLK_UNHALTED.THREAD
CPU_CLK_UNHALTED.REF_TSC
第一个计算失效指令数,第二个计算实际循环数,最后一个是我们感兴趣的。《英特尔软件开发人员手册》第3卷的说明如下:
当发生以下情况时,此事件统计TSC速率下的参考循环数
磁芯不处于停止状态,也不处于TM停止时钟状态。这个
core在运行HLT指令或
MWAIT指令。此事件不受堆芯频率的影响
变化(例如P状态),但计数频率与时间相同
邮票柜台。此事件可能是内核运行时经过的大致时间
未处于停止状态,也未处于TM stopclock状态
因此,对于CPU限制的循环,我希望该值与从rdstc
读取的自由运行TSC值相同,因为它们应该仅在暂停周期指令或“TM stopclock状态”是什么的情况下发散
我使用以下循环(整个循环)对此进行测试:
使用PFCSTART
和PFCEND
命令读取CPU\u CLK\u UNHALTED.REF\u TSC
计数器。\uuu rdtsc()
是一个通过rdtsc
指令读取TSC的内部程序。最后,我们使用nanos()
测量实时性,简单地说:
int64_t nanos() {
auto t = std::chrono::high_resolution_clock::now();
return std::chrono::time_point_cast<std::chrono::nanoseconds>(t).time_since_epoch().count();
}
这里,REF_TSC
是如上所述的固定TSC性能计数器,rdtsc
是rdtsc
指令的结果Eff Mhz
是在整个时间间隔内有效计算的真实CPU频率,主要是出于好奇和快速确认涡轮增压器的启动量Ratio
是REF_TSC
和rdtsc
列的比率。我预计这将非常接近1,但实际上我们看到它徘徊在0.90到0.92之间,变化很大(我在其他运行中看到它低至0.8)
从图形上看,它类似于2:
rdstc
调用返回几乎精确的结果1,而PMU TSC计数器到处都是,有时几乎低至2300 MHz
但是,如果我关闭turbo,结果会更加一致:
CPU# REF_TSC rdtsc Eff Mhz Ratio
0 2592.26 2592.25 2588.30 1.000000
0 2592.26 2592.26 2591.11 1.000000
0 2592.26 2592.26 2590.40 1.000000
0 2592.25 2592.25 2590.43 1.000000
0 2592.26 2592.26 2590.75 1.000000
0 2592.26 2592.26 2590.05 1.000000
0 2592.25 2592.25 2590.04 1.000000
0 2592.24 2592.24 2590.86 1.000000
0 2592.25 2592.25 2590.35 1.000000
0 2592.25 2592.25 2591.32 1.000000
0 2592.25 2592.25 2590.63 1.000000
0 2592.25 2592.25 2590.87 1.000000
0 2592.25 2592.25 2590.77 1.000000
0 2592.25 2592.25 2590.64 1.000000
0 2592.24 2592.24 2590.30 1.000000
0 2592.23 2592.23 2589.64 1.000000
0 2592.23 2592.23 2590.83 1.000000
0 2592.23 2592.23 2590.49 1.000000
0 2592.23 2592.23 2590.78 1.000000
0 2592.23 2592.23 2590.84 1.000000
0 2592.22 2592.22 2588.80 1.000000
基本上,这个比率是100万对小数点后6位
图形化(Y轴比例强制与上一个图形相同):
现在,代码只是在运行一个热循环,不应该有hlt
或mwait
指令,当然也不应该有超过10%的变化。我不能确定什么是“TM停止时钟周期”,但我敢打赌它们是“热管理停止时钟周期”,这是一种在CPU达到最高温度时临时调节CPU的技巧。然而,我查看了集成热敏电阻的读数,我从来没有看到CPU的温度超过60C,远低于90C-100C,我认为这是终端管理的作用所在
知道这是什么吗?在不同的涡轮频率之间是否存在隐含的“暂停周期”?这种情况肯定会发生,因为盒子不是安静的,所以当其他内核开始和停止处理背景内容时,涡轮频率会上下跳跃(最大涡轮频率直接取决于活动内核的数量:在我的盒子上,1、2、3或4个活动内核的涡轮频率分别为3.5、3.3、3.2、3.1 GHz)
1事实上,有一段时间我得到了精确到小数点后两位的结果:
2591.97MHz
-一次又一次的迭代。然后发生了一些变化,我不太确定是什么,在rdstc
结果中有大约0.1%的微小变化。一种可能性是渐进式时钟调整,由Linux定时子系统进行,以使本地水晶衍生时间与ntpd
确定的时间一致。也许,这只是一个晶体漂移-上面的最后一个图表显示了测量周期rdtsc
每秒的稳定增加
2这些图与文本中显示的值不对应,因为我不会在每次更改文本输出格式时更新这些图。但是,每次运行的定性行为基本相同。TL;博士
您观察到的RDTSC
和REFTSC
之间的差异是由于涡轮增压p状态转换造成的。在这些转换过程中,大多数核心,包括固定功能性能计数器REF_TSC
,停止约20000-21000个周期(8.5us),但rdtsc
以不变频率继续rdtsc
可能处于一个孤立的电源和时钟域中,因为它非常重要,而且因为它有记录在案的wallclock-like行为
RDTSC-REFTSC
差异
这种差异表现为RDTSC
过度计数REFTSC
。程序运行的时间越长,差异RDTSC-REFTSC
越大。在很长的一段时间内,它可以上升到1%-2%甚至更高
当然,您自己已经观察到,禁用TurboBoost时,过度计数会消失,使用英特尔&pstate
时,可以按如下方式进行:
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
但这并不能肯定地告诉我们涡轮增压器是造成这种差异的罪魁祸首;这可能是涡轮增压器启用的更高P态消耗了可用的净空,导致热节流和停机
可能的节流?
涡轮增压是一种动态频率和电压标度解决方案,以机会贯通地利用工作包络(热或电气)的净空。如果可能,TurboBoost将放大处理器的核心频率和电压
CPU# REF_TSC rdtsc Eff Mhz Ratio
0 2392.05 2591.76 2981.30 0.922946
0 2381.74 2591.79 3032.86 0.918955
0 2399.12 2591.79 3032.50 0.925660
0 2385.04 2591.79 3010.58 0.920230
0 2378.39 2591.79 3010.21 0.917663
0 2355.84 2591.77 2928.96 0.908970
0 2364.99 2591.79 2942.32 0.912492
0 2339.64 2591.77 2935.36 0.902720
0 2366.43 2591.79 3022.08 0.913049
0 2401.93 2591.79 3023.52 0.926747
0 2452.87 2591.78 3070.91 0.946400
0 2350.06 2591.79 2961.93 0.906733
0 2340.44 2591.79 2897.58 0.903020
0 2403.22 2591.79 2944.77 0.927246
0 2394.10 2591.79 3059.58 0.923723
0 2359.69 2591.78 2957.79 0.910449
0 2353.33 2591.79 2916.39 0.907992
0 2339.58 2591.79 2951.62 0.902690
0 2395.82 2591.79 3017.59 0.924389
0 2353.47 2591.79 2937.82 0.908047
CPU# REF_TSC rdtsc Eff Mhz Ratio
0 2592.26 2592.25 2588.30 1.000000
0 2592.26 2592.26 2591.11 1.000000
0 2592.26 2592.26 2590.40 1.000000
0 2592.25 2592.25 2590.43 1.000000
0 2592.26 2592.26 2590.75 1.000000
0 2592.26 2592.26 2590.05 1.000000
0 2592.25 2592.25 2590.04 1.000000
0 2592.24 2592.24 2590.86 1.000000
0 2592.25 2592.25 2590.35 1.000000
0 2592.25 2592.25 2591.32 1.000000
0 2592.25 2592.25 2590.63 1.000000
0 2592.25 2592.25 2590.87 1.000000
0 2592.25 2592.25 2590.77 1.000000
0 2592.25 2592.25 2590.64 1.000000
0 2592.24 2592.24 2590.30 1.000000
0 2592.23 2592.23 2589.64 1.000000
0 2592.23 2592.23 2590.83 1.000000
0 2592.23 2592.23 2590.49 1.000000
0 2592.23 2592.23 2590.78 1.000000
0 2592.23 2592.23 2590.84 1.000000
0 2592.22 2592.22 2588.80 1.000000
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu4/online
echo 0 > /sys/devices/system/cpu/cpu5/online
echo 0 > /sys/devices/system/cpu/cpu6/online
echo 0 > /sys/devices/system/cpu/cpu7/online
echo 98 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
1000, 1500, 2500, 4000, 6300,
10000, 15000, 25000, 40000, 63000,
100000, 150000, 250000, 400000, 630000,
1000000, 1500000, 2500000, 4000000, 6300000,
10000000, 15000000, 25000000, 40000000, 63000000
CPU 0, measured CLK_REF_TSC MHz : 2392.56
CPU 0, measured rdtsc MHz : 2392.46
CPU 0, measured add MHz : 3286.30
CPU 0, measured XREF_CLK time (s) : 0.00018200
CPU 0, measured delta time (s) : 0.00018258
CPU 0, measured tsc_delta time (s) : 0.00018200
CPU 0, ratio ref_tsc :ref_xclk : 24.00131868
CPU 0, ratio ref_core:ref_xclk : 33.00071429
CPU 0, ratio rdtsc :ref_xclk : 24.00032967
CPU 0, core CLK cycles in OS : 0
CPU 0, User-OS transitions : 0
CPU 0, rdtsc-reftsc overcount : -18
CPU 0, MSR_IA32_PACKAGE_THERM_STATUS : 000000008819080a
CPU 0, MSR_IA32_PACKAGE_THERM_INTERRUPT: 0000000000000003
CPU 0, MSR_CORE_PERF_LIMIT_REASONS : 0000000018001000
PROCHOT
Thermal
Graphics Driver
Autonomous Utilization-Based Frequency Control
Voltage Regulator Thermal Alert
Electrical Design Point (e.g. Current)
Core Power Limiting
Package-Level PL1 Power Limiting
* Package-Level PL2 Power Limiting
* Max Turbo Limit (Multi-Core Turbo)
Turbo Transition Attenuation
CPU 0, measured CLK_REF_TSC MHz : 2392.63
CPU 0, measured rdtsc MHz : 2392.62
CPU 0, measured add MHz : 3288.03
CPU 0, measured XREF_CLK time (s) : 0.00018192
CPU 0, measured delta time (s) : 0.00018248
CPU 0, measured tsc_delta time (s) : 0.00018192
CPU 0, ratio ref_tsc :ref_xclk : 24.00000000
CPU 0, ratio ref_core:ref_xclk : 32.99983509
CPU 0, ratio rdtsc :ref_xclk : 23.99989006
CPU 0, core CLK cycles in OS : 0
CPU 0, User-OS transitions : 0
CPU 0, rdtsc-reftsc overcount : -2
CPU 0, MSR_IA32_PACKAGE_THERM_STATUS : 000000008819080a
CPU 0, MSR_IA32_PACKAGE_THERM_INTERRUPT: 0000000000000003
CPU 0, MSR_CORE_PERF_LIMIT_REASONS : 0000000018001000
PROCHOT
Thermal
Graphics Driver
Autonomous Utilization-Based Frequency Control
Voltage Regulator Thermal Alert
Electrical Design Point (e.g. Current)
Core Power Limiting
Package-Level PL1 Power Limiting
* Package-Level PL2 Power Limiting
* Max Turbo Limit (Multi-Core Turbo)
Turbo Transition Attenuation
CPU 0, measured CLK_REF_TSC MHz : 2284.69
CPU 0, measured rdtsc MHz : 2392.63
CPU 0, measured add MHz : 3151.99
CPU 0, measured XREF_CLK time (s) : 0.00018121
CPU 0, measured delta time (s) : 0.00019036
CPU 0, measured tsc_delta time (s) : 0.00018977
CPU 0, ratio ref_tsc :ref_xclk : 24.00000000
CPU 0, ratio ref_core:ref_xclk : 33.38540919
CPU 0, ratio rdtsc :ref_xclk : 25.13393301
CPU 0, core CLK cycles in OS : 0
CPU 0, User-OS transitions : 0
CPU 0, rdtsc-reftsc overcount : 20548
CPU 0, MSR_IA32_PACKAGE_THERM_STATUS : 000000008819080a
CPU 0, MSR_IA32_PACKAGE_THERM_INTERRUPT: 0000000000000003
CPU 0, MSR_CORE_PERF_LIMIT_REASONS : 0000000018000000
PROCHOT
Thermal
Graphics Driver
Autonomous Utilization-Based Frequency Control
Voltage Regulator Thermal Alert
Electrical Design Point (e.g. Current)
Core Power Limiting
Package-Level PL1 Power Limiting
* Package-Level PL2 Power Limiting
* Max Turbo Limit (Multi-Core Turbo)
Turbo Transition Attenuation
CPU 0, measured CLK_REF_TSC MHz : 2392.46
CPU 0, measured rdtsc MHz : 2392.45
CPU 0, measured add MHz : 3287.80
CPU 0, measured XREF_CLK time (s) : 0.00018192
CPU 0, measured delta time (s) : 0.00018249
CPU 0, measured tsc_delta time (s) : 0.00018192
CPU 0, ratio ref_tsc :ref_xclk : 24.00000000
CPU 0, ratio ref_core:ref_xclk : 32.99978012
CPU 0, ratio rdtsc :ref_xclk : 23.99989006
CPU 0, core CLK cycles in OS : 0
CPU 0, User-OS transitions : 0
CPU 0, rdtsc-reftsc overcount : -2
CPU 0, MSR_IA32_PACKAGE_THERM_STATUS : 000000008819080a
CPU 0, MSR_IA32_PACKAGE_THERM_INTERRUPT: 0000000000000003
CPU 0, MSR_CORE_PERF_LIMIT_REASONS : 0000000018001000
PROCHOT
Thermal
Graphics Driver
Autonomous Utilization-Based Frequency Control
Voltage Regulator Thermal Alert
Electrical Design Point (e.g. Current)
Core Power Limiting
Package-Level PL1 Power Limiting
* Package-Level PL2 Power Limiting
* Max Turbo Limit (Multi-Core Turbo)
Turbo Transition Attenuation
24.00,33.00,24.00,-14,0,0
24.00,33.00,24.00,-20,0,0
24.00,33.00,24.00,-4,3639,1
24.00,33.00,24.00,-20,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,-14,0,0
24.00,33.00,24.00,-14,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,-44,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,-14,0,0
24.00,33.00,24.00,-20,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,-20,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,12,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,-20,0,0
24.00,33.00,24.00,32,3171,1
24.00,33.00,24.00,-20,0,0
24.00,33.00,24.00,10,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,2,0,0
24.00,33.00,24.00,22,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.05,25.11,20396,0,0
24.00,33.38,25.12,20212,0,0
24.00,33.39,25.12,20308,0,0
24.00,33.42,25.12,20296,0,0
24.00,33.43,25.11,20158,0,0
24.00,33.43,25.11,20178,0,0
24.00,33.00,24.00,-4,0,0
24.00,33.00,24.00,20,3920,1
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,-4,0,0
24.00,33.44,25.13,20396,0,0
24.00,33.46,25.11,20156,0,0
24.00,33.46,25.12,20268,0,0
24.00,33.41,25.12,20322,0,0
24.00,33.40,25.11,20216,0,0
24.00,33.46,25.12,20168,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,-2,0,0
24.00,33.00,24.00,22,0,0
24.00,33.75,24.45,20166,0,0
24.00,33.78,24.45,20302,0,0
24.00,33.78,24.45,20202,0,0
24.00,33.68,24.91,41082,0,0
24.00,33.31,24.90,40998,0,0
24.00,33.70,25.30,58986,3668,1
24.00,33.74,24.42,18798,0,0
24.00,33.74,24.45,20172,0,0
24.00,33.77,24.45,20156,0,0
24.00,33.78,24.45,20258,0,0
24.00,33.78,24.45,20240,0,0
24.00,33.77,24.42,18826,0,0
24.00,33.75,24.45,20372,0,0
24.00,33.76,24.42,18798,4081,1
24.00,33.74,24.41,18460,0,0
24.00,33.75,24.45,20234,0,0
24.00,33.77,24.45,20284,0,0
24.00,33.78,24.45,20150,0,0
24.00,33.78,24.45,20314,0,0
24.00,33.78,24.42,18766,0,0
24.00,33.71,25.36,61608,0,0
24.00,33.76,24.45,20336,0,0
24.00,33.78,24.45,20234,0,0
24.00,33.78,24.45,20210,0,0
24.00,33.78,24.45,20210,0,0
24.00,33.00,24.00,-10,0,0
24.00,33.00,24.00,4,0,0
24.00,33.00,24.00,18,0,0
24.00,33.00,24.00,2,4132,1
24.00,33.00,24.00,44,0,0