C++ uses swap when there is enough free RAM. Performance suffers


I wrote a simple program to study performance on Linux when using a lot of RAM (64-bit Red Hat Enterprise Linux Server release 6.4). (Please ignore the memory leak.)

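The top output is shown below. Although there is enough free RAM, we can see swap usage increasing. As a result, the run time jumped from 3 seconds to 64 seconds.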

top - 11:46:55 up 21 days,  1:14, 18 users,  load average: 1.24, 1.25, 0.95
Tasks: 819 total,   3 running, 816 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.6%us,  1.4%sy,  0.0%ni, 97.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132110088k total, 127500344k used,  4609744k free,   262288k buffers
Swap: 10485752k total,     4112k used, 10481640k free, 45988192k cached

top - 11:47:01 up 21 days,  1:14, 18 users,  load average: 1.38, 1.27, 0.96
Tasks: 819 total,   2 running, 817 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  2.1%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132110088k total, 131620156k used,   489932k free,   262288k buffers
Swap: 10485752k total,     4112k used, 10481640k free, 45844228k cached

top - 11:47:53 up 21 days,  1:15, 18 users,  load average: 1.25, 1.26, 0.97
Tasks: 819 total,   2 running, 817 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  2.5%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132110088k total, 131626300k used,   483788k free,   262276k buffers
Swap: 10485752k total,     5464k used, 10480288k free, 43056696k cached

top - 11:47:56 up 21 days,  1:15, 18 users,  load average: 1.23, 1.26, 0.97
Tasks: 819 total,   2 running, 817 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  2.5%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132110088k total, 131627568k used,   482520k free,   262276k buffers
Swap: 10485752k total,     5792k used, 10479960k free, 42949788k cached

top - 11:47:59 up 21 days,  1:15, 18 users,  load average: 1.21, 1.25, 0.97
Tasks: 819 total,   2 running, 817 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  2.5%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132110088k total, 131623080k used,   487008k free,   262276k buffers
Swap: 10485752k total,     6312k used, 10479440k free, 42840068k cached

top - 11:48:02 up 21 days,  1:15, 18 users,  load average: 1.21, 1.25, 0.97
Tasks: 819 total,   2 running, 817 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  2.5%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132110088k total, 131620016k used,   490072k free,   262276k buffers
Swap: 10485752k total,     6772k used, 10478980k free, 42729276k cached
I have read and read. My questions are:

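  • Why does Linux sacrifice performance instead of fully using the cached RAM? Memory fragmentation? But swapping data out would surely cause fragmentation too.

  • Is there a workaround to get a consistent 3 seconds until the physical RAM size is reached?

Thanks.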

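    Update 1: Added more output from top.

    Update 2: Following David's suggestion, I looked at /proc//io, which shows that my program does no I/O. So David's first answer should explain this observation. Now to my second question: how can I improve performance as a non-root user (I cannot modify swappiness etc.)?

    Update 3: I switched to another machine because I needed to run some commands. It is a real machine (no VM) with an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz and 16 physical cores.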

    uname -a
    2.6.32-642.4.2.el6.x86_64 #1 SMP Tue Aug 23 19:58:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
    
    Running osgx's modified code with more iterations:

    Iteration 451
    Time to malloc: 1.81198e-05
    Time to fill with data: 0.109081
    Fill rate with data: 916.75 Mints/sec, 3667Mbytes/sec
    Time to second write access of data: 0.049731
    Access rate of data: 2010.82 Mints/sec, 8043.27Mbytes/sec
    Time to third write access of data: 0.0478709
    Access rate of data: 2088.95 Mints/sec, 8355.81Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 180800Mbytes
    Iteration 452
    Time to malloc: 1.09673e-05
    Time to fill with data: 5.16316
    Fill rate with data: 19.368 Mints/sec, 77.4719Mbytes/sec
    Time to second write access of data: 0.0495219
    Access rate of data: 2019.31 Mints/sec, 8077.23Mbytes/sec
    Time to third write access of data: 0.0439548
    Access rate of data: 2275.06 Mints/sec, 9100.25Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 181200Mbytes
    
    When the slowdown happens, I do see the kernel switching from 2MB pages to 4KB pages.

    vmstat 1 60
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     2  0 1217396 11506356 5911040 47499184    0    2    35    47    0    0 14  2 84  0  0  
     2  0 1217396 11305860 5911040 47499184    4    0     4    36 5163 3460  7  6 87  0  0  
     2  0 1217396 11112744 5911040 47499188    0    0     0     0 4326 3451  7  6 87  0  0  
     2  0 1217396 10980556 5911040 47499188    0    0     0     0 4801 3385  7  6 87  0  0  
     2  0 1217396 10845940 5911040 47499192    0    0     0    20 4650 3596  7  6 87  0  0  
     2  0 1217396 10712508 5911040 47499200    0    0     0     0 5743 3562  7  6 87  0  0  
     2  0 1217396 10583380 5911040 47499200    0    0     0    40 4531 3622  7  6 87  0  0  
     2  0 1217396 10449096 5911040 47499200    0    0     0     0 4516 3629  7  6 87  0  0  
     2  0 1217396 10187856 5911040 47499200    0    0     0     0 4499 3456  7  6 87  0  0  
     2  0 1217396 10053256 5911040 47499204    0    0     0     8 5334 3507  7  6 87  0  0  
     2  0 1217396 9921624 5911040 47499204    0    0     0     0 6310 3593  6  6 87  0  0   
     2  0 1217396 9788532 5911040 47499208    0    0     0    44 5794 3516  7  6 87  0  0   
     2  0 1217396 9660516 5911040 47499208    0    0     0     0 4894 3535  7  6 87  0  0   
     2  0 1217396 9527552 5911040 47499212    0    0     0     0 4686 3570  7  6 87  0  0   
     2  0 1217396 9396536 5911040 47499212    0    0     0     0 4805 3538  7  6 87  0  0   
     2  0 1217396 9238664 5911040 47499212    0    0     0     0 5940 3459  7  6 87  0  0   
     2  0 1217396 9000136 5911040 47499216    0    0     0    32 5239 3333  7  6 87  0  0   
     2  0 1217396 8861132 5911040 47499220    0    0     0     0 5579 3351  7  6 87  0  0   
     2  0 1217396 8733688 5911040 47499220    0    0     0     0 4910 3199  7  6 87  0  0   
     2  0 1217396 8596600 5911040 47499224    0    0     0    44 5075 3453  7  6 87  0  0   
     2  0 1217396 8338468 5911040 47499232    0    0     0     0 5328 3444  7  6 87  0  0   
     2  0 1217396 8207732 5911040 47499232    0    0     0    52 5474 3370  7  6 87  0  0   
     2  0 1217396 8071212 5911040 47499236    0    0     0     0 5442 3419  7  6 87  0  0   
     2  0 1217396 7807736 5911040 47499236    0    0     0     0 6139 3456  7  6 87  0  0   
     2  0 1217396 7676080 5911044 47499232    0    0     0    16 4533 3430  6  6 87  0  0   
     2  0 1217396 7545728 5911044 47499236    0    0     0     0 6712 3957  7  6 87  0  0   
     4  0 1217396 7412444 5911044 47499240    0    0     0    68 6110 3547  7  6 87  0  0   
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     2  0 1217396 7280148 5911048 47499244    0    0     0    68 6140 3516  7  7 86  0  0   
     2  0 1217396 7147836 5911048 47499244    0    0     0     0 4434 3400  7  6 87  0  0   
     2  0 1217396 6886980 5911048 47499248    0    0     0    16 7354 3393  7  6 87  0  0   
     2  0 1217396 6752868 5911048 47499248    0    0     0     0 5286 3573  7  6 87  0  0   
     2  0 1217396 6621772 5911048 47499248    0    0     0     0 5353 3410  7  6 87  0  0   
     2  0 1217396 6489760 5911048 47499252    0    0     0    48 5172 3454  7  6 87  0  0   
     2  0 1217396 6248732 5911048 47499256    0    0     0     0 5266 3411  7  6 87  0  0   
     2  0 1217396 6092804 5911048 47499260    0    0     0     4 6345 3473  7  6 87  0  0   
     2  0 1217396 5962544 5911048 47499260    0    0     0     0 7399 3712  7  6 87  0  0   
     2  0 1217396 5828492 5911048 47499264    0    0     0     0 5804 3516  7  6 87  0  0   
     2  0 1217396 5566720 5911048 47499264    0    0     0    44 5800 3370  7  6 87  0  0   
     2  0 1217396 5434204 5911048 47499264    0    0     0     0 6716 3446  7  6 87  0  0   
     2  0 1217396 5240724 5911048 47499268    0    0     0    68 3948 3346  7  6 87  0  0   
     2  0 1217396 5051688 5911008 47484936    0    0     0     0 4743 3734  7  6 87  0  0   
     2  0 1217396 4925680 5910500 47478444    0    0   136     0 5978 3779  7  6 87  0  0   
     2  0 1217396 4801744 5908552 47471820    0    0     0    32 4573 3237  7  6 87  0  0   
     2  0 1217396 4675772 5908552 47463984    0    0     0     0 6594 3276  7  6 87  0  0   
     2  0 1217396 4486472 5908444 47455736    0    0     0     4 6096 3256  7  6 87  0  0   
     2  0 1217396 4299908 5908392 47446964    0    0     0     0 5569 3525  7  6 87  0  0   
     2  0 1217396 4175444 5906884 47440024    0    0     0     0 4975 3141  7  6 87  0  0   
     2  0 1217396 4063472 5905976 47423860    0    0     0    56 6255 3147  6  6 87  0  0   
     2  0 1217396 3939816 5905796 47415596    0    0     0     0 5396 3143  7  6 87  0  0   
     2  0 1217396 3686540 5905796 47407152    0    0     0    44 6471 3201  7  6 87  0  0   
     2  0 1217396 3557596 5905796 47398892    0    0     0     0 7581 3727  7  6 87  0  0   
     2  0 1217396 3445536 5905796 47381812    0    0     0     0 5560 3222  7  6 87  0  0   
     2  0 1217396 3250272 5905796 47373364    0    0     0    60 5594 3343  7  6 87  0  0   
     2  0 1217396 3065232 5903744 47367156    0    0     0     0 5595 3182  7  6 87  0  0   
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     3  0 1217396 2951704 5903028 47350792    0    0     0    12 5210 3262  7  6 87  0  0   
     2  0 1217396 2829228 5902928 47342444    0    0     0     0 5724 3758  7  6 87  0  0   
     2  0 1217396 2575248 5902580 47334472    0    0     0     0 4377 3369  7  6 87  0  0   
     2  0 1217396 2527996 5897796 47322436    0    0     0    60 5550 3570  7  6 87  0  0   
     2  0 1217396 2398672 5893572 47322324    0    0     0     0 5603 3225  7  6 87  0  0   
     2  0 1217396 2272536 5889364 47322228    0    0     0    16 6924 3310  7  6 87  0  0   
    
    iostat -xyz 1 60
    Linux 2.6.32-642.4.2.el6.x86_64     05/09/2018  _x86_64_    (16 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               6.64    0.00    6.26    0.00    0.00   87.10
    
    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               7.00    0.06    5.69    0.00    0.00   87.24
    
    I managed to run "sudo perf top" and, while the slowdown was happening, saw this as the top line:
    
    16.84%  [kernel]                                      [k] compaction_alloc
    
    Several other processes are running as well (not shown above).

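    Update 4: After turning THP off, I see the following. The fill rate stays at around 550 Mints/sec (900 with THP on) until my program uses 240GB of memory (cached memory < 1GB). Then swapping kicks in and the fill rate drops: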

    Iteration 610
    Time to malloc: 1.3113e-05
    Time to fill with data: 0.181151
    Fill rate with data: 552.025 Mints/sec, 2208.1Mbytes/sec
    Time to second write access of data: 0.04074
    Access rate of data: 2454.59 Mints/sec, 9818.36Mbytes/sec
    Time to third write access of data: 0.0420492
    Access rate of data: 2378.17 Mints/sec, 9512.67Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 244400Mbytes
    Iteration 611
    Time to malloc: 1.88351e-05
    Time to fill with data: 0.306215
    Fill rate with data: 326.568 Mints/sec, 1306.27Mbytes/sec
    Time to second write access of data: 0.045784
    Access rate of data: 2184.17 Mints/sec, 8736.68Mbytes/sec
    Time to third write access of data: 0.0441492
    Access rate of data: 2265.05 Mints/sec, 9060.19Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 244800Mbytes
    Iteration 612
    Time to malloc: 2.21729e-05
    Time to fill with data: 1.33305
    Fill rate with data: 75.016 Mints/sec, 300.064Mbytes/sec
    Time to second write access of data: 0.048573
    Access rate of data: 2058.76 Mints/sec, 8235.02Mbytes/sec
    Time to third write access of data: 0.0495481
    Access rate of data: 2018.24 Mints/sec, 8072.96Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 245200Mbytes
    
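    Conclusion
    With transparent huge pages (THP) turned off, the program's behavior is more transparent to me, so I will keep THP off. For my particular program, the cause was THP, not swap. Thanks for all the help.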
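
Turning THP off system-wide needs root. A process can instead ask the kernel not to use huge pages for one mapping with madvise(MADV_NOHUGEPAGE), which needs no privileges. The sketch below is hedged: the helper name mapWithoutThp is hypothetical, mainline Linux added MADV_NOHUGEPAGE in 2.6.38, and whether the RHEL 6 2.6.32 kernel honours it is an assumption to check (an unsupported advice value makes madvise() fail instead of silently succeeding).

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps `bytes` of anonymous memory and, where the
// headers define MADV_NOHUGEPAGE, asks the kernel not to back it with
// transparent huge pages. *thpOptOut reports whether madvise() succeeded.
// Returns nullptr if mmap() fails.
void* mapWithoutThp(std::size_t bytes, bool* thpOptOut) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    *thpOptOut = false;
#ifdef MADV_NOHUGEPAGE
    // madvise() needs a page-aligned address, which mmap() guarantees
    // (a pointer from new/malloc generally is not page-aligned).
    *thpOptOut = (madvise(p, bytes, MADV_NOHUGEPAGE) == 0);
#endif
    return p;
}
```

The region is released with munmap(p, bytes); whether opting out is worth it depends on how much of the slowdown comes from compaction rather than from the page faults themselves.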

    The first iterations of your test probably use THP (Transparent Hugepages). While the test runs, check /sys/kernel/mm/transparent_hugepage/enabled and the output of grep AnonHugePages /proc/meminfo.
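
Both checks can be done from code. This is a minimal sketch assuming the usual text formats (the active THP mode is the word in brackets, and /proc/meminfo has a line like "AnonHugePages:    2048 kB"); bracketedMode, anonHugePagesKb, readThpMode and readAnonHugePagesKb are hypothetical names. On RHEL 6 kernels the THP file is, as far as I know, under /sys/kernel/mm/redhat_transparent_hugepage/ instead, which is why the path is a parameter.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns the word inside [brackets] from a sysfs line such as
// "always [madvise] never", or "" if there is no bracketed word.
std::string bracketedMode(const std::string& line) {
    std::string::size_type open = line.find('[');
    std::string::size_type close = line.find(']', open);
    if (open == std::string::npos || close == std::string::npos) return "";
    return line.substr(open + 1, close - open - 1);
}

// Scans /proc/meminfo-formatted text for "AnonHugePages: <n> kB" and
// returns n (in kB), or -1 if the line is absent.
long anonHugePagesKb(std::istream& meminfo) {
    std::string line;
    while (std::getline(meminfo, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        if (fields >> key >> value && key == "AnonHugePages:") return value;
    }
    return -1;
}

// Active THP mode; pass another path if your kernel exposes THP elsewhere.
std::string readThpMode(const char* path = "/sys/kernel/mm/transparent_hugepage/enabled") {
    std::ifstream in(path);
    std::string line;
    return std::getline(in, line) ? bracketedMode(line) : "";
}

// Current system-wide AnonHugePages in kB, or -1 if unreadable.
long readAnonHugePagesKb() {
    std::ifstream in("/proc/meminfo");
    return anonHugePagesKb(in);
}
```

Printing readAnonHugePagesKb() once per iteration of the test would show whether the value stops growing at the point where the fill rate collapses.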

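    The reason applications run faster is two factors. The first factor is almost completely irrelevant and not of significant interest, because it also has the downside of requiring larger clear-page and copy-page work in page faults, which is a potentially negative effect. It consists in taking a single page fault for each 2M virtual region touched by userland (so reducing the kernel enter/exit frequency by a factor of 512). This only matters the first time the memory is accessed for the lifetime of a memory mapping.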
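
The factor of 512 is simply the ratio of the two page sizes. Applied to the 400 MB buffer of the test program (n = 10^8 ints of 4 bytes), and assuming each page is touched exactly once and ignoring alignment, the first fill loop takes roughly this many page faults with 4 KB pages and with 2 MB pages respectively:

```latex
\frac{2\,\mathrm{MiB}}{4\,\mathrm{KiB}} = \frac{2^{21}\,\mathrm{B}}{2^{12}\,\mathrm{B}} = 2^{9} = 512

N_{4\,\mathrm{KiB}} = \left\lceil \frac{4\times10^{8}}{2^{12}} \right\rceil = 97\,657,
\qquad
N_{2\,\mathrm{MiB}} = \left\lceil \frac{4\times10^{8}}{2^{21}} \right\rceil = 191
```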

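    Allocating a large amount of memory with new or malloc is served by a single mmap system call, which usually does not "populate" the virtual memory with physical pages; see MAP_POPULATE in the mmap man page: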

       MAP_POPULATE (since Linux 2.5.46)
              Populate (prefault) page tables for a mapping. ... This will help
              to reduce blocking on page faults later.
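
A minimal sketch of prefaulting with MAP_POPULATE, assuming Linux and g++ (which defines _GNU_SOURCE, so MAP_ANONYMOUS and MAP_POPULATE are visible); mapInts and unmapInts are hypothetical helpers. MAP_POPULATE needs no root, but note that it moves the page-fault work into the mmap() call rather than removing it.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Maps `count` ints of anonymous, private, zero-filled memory.
// With populate == true, MAP_POPULATE asks the kernel to prefault the page
// tables inside mmap() itself, so the first write loop should not take one
// page fault per page. Returns nullptr on failure.
int* mapInts(std::size_t count, bool populate) {
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if (populate) flags |= MAP_POPULATE;
    void* p = mmap(nullptr, count * sizeof(int), PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<int*>(p);
}

// Releases memory obtained from mapInts().
void unmapInts(int* p, std::size_t count) {
    munmap(p, count * sizeof(int));
}
```

Timing the fill loop over mapInts(n, true) against mapInts(n, false) should show how much of the first pass is page-fault cost.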
    
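    This memory is only registered by mmap (without MAP_POPULATE) as virtual memory; the page table does not yet contain entries that allow access to it. When your test writes to a memory page for the first time, a page-fault exception is generated and handled by the OS kernel. The Linux kernel allocates some physical memory and maps the virtual page to it (populates the page). With THP enabled (it usually is), the kernel may allocate a single huge 2 MB page if it has a free huge physical page. If no huge page is available, the kernel allocates 4KB pages. So without hugepages there will be 512 times more page faults (you can check this by running vmstat 1 180 in another console while the test runs, or with perf stat -I 1000).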
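
Besides vmstat and perf stat, a process can count its own faults with getrusage(): ru_minflt counts page faults serviced without I/O, which is what the first touch of anonymous memory produces. A sketch with hypothetical names minorFaults and faultsForFirstTouch; the volatile pointer is only there so the compiler cannot drop the stores.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstddef>

// Minor page faults this process has taken so far.
long minorFaults() {
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt;
}

// Allocates n ints, writes 1 to each, and returns the minor faults taken by
// that first-touch loop. Expect roughly n * sizeof(int) / 4096 with 4 KB
// pages, and far fewer if THP backs the buffer with 2 MB pages.
long faultsForFirstTouch(std::size_t n) {
    int* a = new int[n];
    volatile int* v = a;  // keeps the compiler from eliding the stores
    long before = minorFaults();
    for (std::size_t i = 0; i < n; i++) v[i] = 1;
    long after = minorFaults();
    delete[] a;
    return after - before;
}
```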

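    The next access to already-populated pages will not page-fault, so you can extend the test with a second (and a third) loop, for i in (0..N-1): a[i] = 1, and measure the time of each loop.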
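
A sketch of that extension, timing three passes over one allocation; timeWritePass and reportThreePasses are hypothetical names, and std::chrono::steady_clock stands in for the gettimeofday() timer of the test program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

// Seconds taken by one pass of a[i] = 1 over n ints.
double timeWritePass(int* a, std::size_t n) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; i++) a[i] = 1;
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// One allocation, three timed passes: only the first pass should pay for
// the page faults that populate the buffer.
void reportThreePasses(std::size_t n) {
    static const char* const labels[3] = {"fill with data", "second write access of data",
                                          "third write access of data"};
    int* a = new int[n];
    for (int pass = 0; pass < 3; pass++) {
        double t = timeWritePass(a, n);
        std::printf("Time to %s: %g\n", labels[pass], t);
    }
    delete[] a;
}
```

If the second and third passes stay fast while the first one slows down, the slowdown is in populating pages, not in accessing them.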

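    Your results still sound strange. Is your system real or virtual? The hypervisor may support 2MB pages, and memory allocation and exception handling can be much more expensive on a virtual system.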

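    On my PC with less memory I get a 10% slowdown when page faults switch from huge-page allocation to 4KB-page allocation (check the page-faults lines in the perf stat output: there are only about 2000 page faults per second with 2MB pages, but more than 200000 page faults per second with 4KB pages):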

    After disabling THP with my root command, there are consistently about 200k page faults per second, at about 950 MB/s:

    $ cat /sys/kernel/mm/transparent_hugepage/enabled
    always [madvise] never
    $ perf stat -I1000 ./a.out
    Iteration 0
    Time to malloc: 1.50204e-05
    Time to fill with data: 0.422322
    Fill rate with data: 236.786 Mints/sec, 947.145Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 400Mbytes
    Iteration 1
    Time to malloc: 1.50204e-05
    Time to fill with data: 0.415068
    Fill rate with data: 240.924 Mints/sec, 963.698Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 800Mbytes
    Iteration 2
    Time to malloc: 2.19345e-05
    #           time             counts unit events
         1.000162191         999.429856      task-clock (msec)
         1.000162191                 14      context-switches          #    0.014 K/sec
         1.000162191                  0      cpu-migrations            #    0.000 K/sec
         1.000162191            232,727      page-faults               #    0.233 M/sec
         1.000162191      2,664,896,604      cycles                    #    2.666 GHz
         1.000162191      3,080,713,267      instructions              #    1.16  insn per cycle
         1.000162191        555,116,838      branches                  #  555.434 M/sec
         1.000162191            102,262      branch-misses             #    0.02% of all branches
    Time to fill with data: 0.440695
    Fill rate with data: 226.914 Mints/sec, 907.658Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 1200Mbytes
    Iteration 3
    Time to malloc: 2.09808e-05
    Time to fill with data: 0.414463
    Fill rate with data: 241.276 Mints/sec, 965.104Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 1600Mbytes
    Iteration 4
    Time to malloc: 1.81198e-05
         2.000544564        1000.142465      task-clock (msec)
         2.000544564                 16      context-switches          #    0.016 K/sec
         2.000544564                  0      cpu-migrations            #    0.000 K/sec
         2.000544564            229,697      page-faults               #    0.230 M/sec
         2.000544564      2,621,180,984      cycles                    #    2.622 GHz
         2.000544564      3,041,358,811      instructions              #    1.15  insn per cycle
         2.000544564        547,910,242      branches                  #  548.027 M/sec
         2.000544564             93,682      branch-misses             #    0.02% of all branches
    Time to fill with data: 0.428383
    Fill rate with data: 233.436 Mints/sec, 933.744Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 2000Mbytes
    Iteration 5
    Time to malloc: 1.5974e-05
    Time to fill with data: 0.421986
    Fill rate with data: 236.975 Mints/sec, 947.899Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 2400Mbytes
    Iteration 6
    Time to malloc: 1.5974e-05
    Time to fill with data: 0.413477
    Fill rate with data: 241.851 Mints/sec, 967.406Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 2800Mbytes
    Iteration 7
    Time to malloc: 1.88351e-05
         3.000866438         999.980461      task-clock (msec)
         3.000866438                 20      context-switches          #    0.020 K/sec
         3.000866438                  0      cpu-migrations            #    0.000 K/sec
         3.000866438            231,194      page-faults               #    0.231 M/sec
         3.000866438      2,622,484,960      cycles                    #    2.623 GHz
         3.000866438      3,061,610,229      instructions              #    1.16  insn per cycle
         3.000866438        551,533,361      branches                  #  551.616 M/sec
         3.000866438            104,561      branch-misses             #    0.02% of all branches
    Time to fill with data: 0.448333
    Fill rate with data: 223.048 Mints/sec, 892.194Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 3200Mbytes
    Iteration 8
    Time to malloc: 1.50204e-05
    Time to fill with data: 0.410566
    Fill rate with data: 243.566 Mints/sec, 974.265Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 3600Mbytes
    Iteration 9
    Time to malloc: 1.3113e-05
         4.001231042        1000.098860      task-clock (msec)
         4.001231042                 17      context-switches          #    0.017 K/sec
         4.001231042                  0      cpu-migrations            #    0.000 K/sec
         4.001231042            228,532      page-faults               #    0.229 M/sec
         4.001231042      2,586,146,024      cycles                    #    2.586 GHz
         4.001231042      3,026,679,955      instructions              #    1.15  insn per cycle
         4.001231042        545,236,541      branches                  #  545.284 M/sec
         4.001231042            115,251      branch-misses             #    0.02% of all branches
    Time to fill with data: 0.441442
    Fill rate with data: 226.53 Mints/sec, 906.121Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 4000Mbytes
    Iteration 10
    Time to malloc: 1.5974e-05
    Time to fill with data: 0.42898
    Fill rate with data: 233.111 Mints/sec, 932.445Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 4400Mbytes
    Iteration 11
    Time to malloc: 2.00272e-05
         5.001547227         999.982415      task-clock (msec)
         5.001547227                 19      context-switches          #    0.019 K/sec
         5.001547227                  0      cpu-migrations            #    0.000 K/sec
         5.001547227            225,796      page-faults               #    0.226 M/sec
         5.001547227      2,560,990,918      cycles                    #    2.561 GHz
         5.001547227      3,005,384,743      instructions              #    1.15  insn per cycle
         5.001547227        542,275,580      branches                  #  542.315 M/sec
         5.001547227            116,537      branch-misses             #    0.02% of all branches
    Time to fill with data: 0.414212
    Fill rate with data: 241.422 Mints/sec, 965.689Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 4800Mbytes
    Iteration 12
    Time to malloc: 1.69277e-05
    Time to fill with data: 0.411084
    Fill rate with data: 243.259 Mints/sec, 973.037Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 5200Mbytes
    Iteration 13
    Time to malloc: 1.40667e-05
    Time to fill with data: 0.413644
    Fill rate with data: 241.754 Mints/sec, 967.015Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 5600Mbytes
    Iteration 14
    Time to malloc: 1.28746e-05
         6.001849796         999.913923      task-clock (msec)
         6.001849796                 18      context-switches          #    0.018 K/sec
         6.001849796                  0      cpu-migrations            #    0.000 K/sec
         6.001849796            236,912      page-faults               #    0.237 M/sec
         6.001849796      2,685,445,660      cycles                    #    2.686 GHz
         6.001849796      3,153,464,551      instructions              #    1.20  insn per cycle
         6.001849796        568,989,467      branches                  #  569.032 M/sec
         6.001849796            125,943      branch-misses             #    0.02% of all branches
    Time to fill with data: 0.444891
    Fill rate with data: 224.774 Mints/sec, 899.097Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 6000Mbytes
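
The ~200k faults per second are what one fault per 4 KB page predicts at the measured fill rate. Taking Iteration 0's 947.145 Mbytes/sec (the test uses 1 Mbyte = 10^6 bytes) and assuming the fill loop dominates each one-second interval:

```latex
\frac{947.145\times10^{6}\ \mathrm{B/s}}{4096\ \mathrm{B/fault}} \approx 2.31\times10^{5}\ \mathrm{faults/s}
```

This agrees with the 225,796 to 236,912 page-faults per second reported by perf stat above.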
    
    Test modified for perf stat, with rate printing and a limited iteration count:

    $ cat test.c; g++ test.c
    #include <sys/time.h>
    #include <time.h>
    #include <stdio.h>
    #include <string.h>
    #include <iostream>
    #include <vector>
    using namespace std;
    
    double getWallTime()
    {
      struct timeval time;
      if (gettimeofday(&time, NULL))
      {
        return 0;
      }
      return (double)time.tv_sec + (double)time.tv_usec * .000001;
    }
    
    #define M 1000000
    
    int main()
    {
      int *a;
      int n = 100000000;
      int j;
      double total = 0;
      for(j=0; j<15; j++)
      {
        cout << "Iteration " << j << endl;
        double start = getWallTime();
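        a = new int[n];  // 400 MB per iteration; never freed (the leak is intentional)
        cout << "Time to malloc: " << getWallTime() - start << endl;
        // The first write to each page triggers a page fault that populates it
        for (int i = 0; i < n; i++)
        {
          a[i] = 1;
        }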
        double elapsed = getWallTime()-start;
        cout << "Time to fill with data: " << elapsed << endl;
        cout << "Fill rate with data: " << n/elapsed/M << " Mints/sec, " << n*sizeof(int)/elapsed/M << "Mbytes/sec"  << endl;
        total += n*sizeof(int)*1./M;
        cout << "Allocated " << n*sizeof(int)*1./M << " Mbytes, with total memory allocated " << total << "Mbytes" << endl;
      }
    
      return 0;
    }
    
    With THP, allocation is a little faster, but the second and third accesses run at the same speed:

    $ cat /sys/kernel/mm/transparent_hugepage/enabled
    always [madvise] never
    $ ./second
    Iteration 0
    Time to malloc: 9.05991e-06
    Time to fill with data: 0.426387
    Fill rate with data: 234.529 Mints/sec, 938.115Mbytes/sec
    Time to second write access of data: 0.318292
    Access rate of data: 314.177 Mints/sec, 1256.71Mbytes/sec
    Time to third write access of data: 0.321722
    Access rate of data: 310.827 Mints/sec, 1243.31Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 400Mbytes
    Iteration 1
    Time to malloc: 3.50475e-05
    Time to fill with data: 0.411859
    Fill rate with data: 242.802 Mints/sec, 971.206Mbytes/sec
    Time to second write access of data: 0.317989
    Access rate of data: 314.476 Mints/sec, 1257.91Mbytes/sec
    Time to third write access of data: 0.321637
    Access rate of data: 310.91 Mints/sec, 1243.64Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 800Mbytes
    Iteration 2
    Time to malloc: 2.81334e-05
    Time to fill with data: 0.411918
    Fill rate with data: 242.767 Mints/sec, 971.067Mbytes/sec
    Time to second write access of data: 0.318647
    Access rate of data: 313.827 Mints/sec, 1255.31Mbytes/sec
    Time to third write access of data: 0.321041
    Access rate of data: 311.487 Mints/sec, 1245.95Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 1200Mbytes
    Iteration 3
    Time to malloc: 2.5034e-05
    Time to fill with data: 0.411138
    Fill rate with data: 243.227 Mints/sec, 972.909Mbytes/sec
    Time to second write access of data: 0.318429
    Access rate of data: 314.042 Mints/sec, 1256.17Mbytes/sec
    Time to third write access of data: 0.321332
    Access rate of data: 311.205 Mints/sec, 1244.82Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 1600Mbytes
    Iteration 4
    Time to malloc: 3.71933e-05
    Time to fill with data: 0.410922
    Fill rate with data: 243.355 Mints/sec, 973.421Mbytes/sec
    Time to second write access of data: 0.320262
    Access rate of data: 312.244 Mints/sec, 1248.98Mbytes/sec
    Time to third write access of data: 0.319223
    Access rate of data: 313.261 Mints/sec, 1253.04Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 2000Mbytes
    Iteration 5
    Time to malloc: 2.19345e-05
    Time to fill with data: 0.418508
    Fill rate with data: 238.944 Mints/sec, 955.777Mbytes/sec
    Time to second write access of data: 0.320419
    Access rate of data: 312.092 Mints/sec, 1248.37Mbytes/sec
    Time to third write access of data: 0.319752
    Access rate of data: 312.742 Mints/sec, 1250.97Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 2400Mbytes
    Iteration 6
    Time to malloc: 3.19481e-05
    Time to fill with data: 0.410054
    Fill rate with data: 243.87 Mints/sec, 975.481Mbytes/sec
    Time to second write access of data: 0.320244
    Access rate of data: 312.262 Mints/sec, 1249.05Mbytes/sec
    Time to third write access of data: 0.319546
    Access rate of data: 312.944 Mints/sec, 1251.78Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 2800Mbytes
    Iteration 7
    Time to malloc: 3.19481e-05
    Time to fill with data: 0.409491
    Fill rate with data: 244.206 Mints/sec, 976.822Mbytes/sec
    Time to second write access of data: 0.318501
    Access rate of data: 313.971 Mints/sec, 1255.88Mbytes/sec
    Time to third write access of data: 0.320052
    Access rate of data: 312.449 Mints/sec, 1249.8Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 3200Mbytes
    Iteration 8
    Time to malloc: 2.5034e-05
    Time to fill with data: 0.409922
    Fill rate with data: 243.949 Mints/sec, 975.795Mbytes/sec
    Time to second write access of data: 0.320583
    Access rate of data: 311.932 Mints/sec, 1247.73Mbytes/sec
    Time to third write access of data: 0.319478
    Access rate of data: 313.011 Mints/sec, 1252.04Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 3600Mbytes
    Iteration 9
    Time to malloc: 2.69413e-05
    Time to fill with data: 0.41104
    Fill rate with data: 243.285 Mints/sec, 973.141Mbytes/sec
    Time to second write access of data: 0.320389
    Access rate of data: 312.121 Mints/sec, 1248.48Mbytes/sec
    Time to third write access of data: 0.319762
    Access rate of data: 312.733 Mints/sec, 1250.93Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 4000Mbytes
    Iteration 10
    Time to malloc: 2.59876e-05
    Time to fill with data: 0.412612
    Fill rate with data: 242.358 Mints/sec, 969.434Mbytes/sec
    Time to second write access of data: 0.318304
    Access rate of data: 314.165 Mints/sec, 1256.66Mbytes/sec
    Time to third write access of data: 0.319453
    Access rate of data: 313.035 Mints/sec, 1252.14Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 4400Mbytes
    Iteration 11
    Time to malloc: 2.98023e-05
    Time to fill with data: 0.412428
    Fill rate with data: 242.467 Mints/sec, 969.866Mbytes/sec
    Time to second write access of data: 0.318467
    Access rate of data: 314.004 Mints/sec, 1256.02Mbytes/sec
    Time to third write access of data: 0.319716
    Access rate of data: 312.778 Mints/sec, 1251.11Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 4800Mbytes
    Iteration 12
    Time to malloc: 2.69413e-05
    Time to fill with data: 0.410515
    Fill rate with data: 243.597 Mints/sec, 974.386Mbytes/sec
    Time to second write access of data: 0.31832
    Access rate of data: 314.149 Mints/sec, 1256.6Mbytes/sec
    Time to third write access of data: 0.319569
    Access rate of data: 312.921 Mints/sec, 1251.69Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 5200Mbytes
    Iteration 13
    Time to malloc: 2.28882e-05
    Time to fill with data: 0.412385
    Fill rate with data: 242.492 Mints/sec, 969.967Mbytes/sec
    Time to second write access of data: 0.318929
    Access rate of data: 313.549 Mints/sec, 1254.2Mbytes/sec
    Time to third write access of data: 0.31949
    Access rate of data: 312.999 Mints/sec, 1252Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 5600Mbytes
    Iteration 14
    Time to malloc: 2.90871e-05
    Time to fill with data: 0.41235
    Fill rate with data: 242.512 Mints/sec, 970.05Mbytes/sec
    Time to second write access of data: 0.340456
    Access rate of data: 293.724 Mints/sec, 1174.89Mbytes/sec
    Time to third write access of data: 0.319716
    Access rate of data: 312.778 Mints/sec, 1251.11Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 6000Mbytes
    
    $ cat /sys/kernel/mm/transparent_hugepage/enabled
    [always] madvise never
    $ ./second
    Iteration 0
    Time to malloc: 1.50204e-05
    Time to fill with data: 0.365043
    Fill rate with data: 273.94 Mints/sec, 1095.76Mbytes/sec
    Time to second write access of data: 0.320503
    Access rate of data: 312.01 Mints/sec, 1248.04Mbytes/sec
    Time to third write access of data: 0.319442
    Access rate of data: 313.046 Mints/sec, 1252.18Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 400Mbytes
    ...
    Iteration 14
    Time to malloc: 2.7895e-05
    Time to fill with data: 0.409294
    Fill rate with data: 244.323 Mints/sec, 977.293Mbytes/sec
    Time to second write access of data: 0.318422
    Access rate of data: 314.049 Mints/sec, 1256.19Mbytes/sec
    Time to third write access of data: 0.322098
    Access rate of data: 310.465 Mints/sec, 1241.86Mbytes/sec
    Allocated 400 Mbytes, with total memory allocated 6000Mbytes
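
In the sysfs file read above, the active THP mode is the bracketed word: [always] in this run and [madvise] in the other one. Below is a minimal C++ sketch for logging that mode next to the benchmark results, so each log shows which configuration produced it. The function names are hypothetical. The default path is the one used in the question, and other kernels may put this file in a different directory.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns the word enclosed in square brackets, which sysfs uses to mark the
// active setting: "always [madvise] never" -> "madvise".
// Returns an empty string if no bracketed word is present.
std::string thp_active_mode(const std::string& line) {
    const std::string::size_type open = line.find('[');
    if (open == std::string::npos) return "";
    const std::string::size_type close = line.find(']', open + 1);
    if (close == std::string::npos) return "";
    return line.substr(open + 1, close - open - 1);
}

// Reads the first line of the THP control file and extracts the active mode.
// Returns an empty string if the file cannot be read (e.g. no THP support).
std::string read_thp_mode(
    const std::string& path = "/sys/kernel/mm/transparent_hugepage/enabled") {
    std::ifstream in(path);
    std::string line;
    if (!std::getline(in, line)) return "";
    return thp_active_mode(line);
}
```

Print read_thp_mode() once at program start. On mainline kernels the neighbouring defrag file uses the same bracket notation, so the same parser should handle it as well.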
    

    From the updates and the chat:

    When the slowdown happens, I do see the kernel switching from 2MB pages to 4KB pages.
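
One way to watch the switch from 2MB to 4KB pages while the program runs is to sample AnonHugePages in /proc/meminfo. This field reports how much anonymous memory, system-wide, is currently backed by transparent huge pages. If the value stops growing while the program keeps allocating, the new memory is arriving as 4KB pages. Below is a minimal sketch; the helper names are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Finds a line such as "AnonHugePages:   2048000 kB" in /proc/meminfo-style
// text and returns its numeric value (in kB for most fields), or -1 if the
// key is absent or has no numeric value.
long meminfo_kb(const std::string& text, const std::string& key) {
    std::istringstream lines(text);
    std::string line;
    const std::string prefix = key + ":";
    while (std::getline(lines, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        std::istringstream fields(line.substr(prefix.size()));
        long value = -1;
        fields >> value;
        return fields ? value : -1;
    }
    return -1;
}

// Reads a whole file (e.g. /proc/meminfo, Linux only) into a string.
// Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}
```

Call meminfo_kb(read_file("/proc/meminfo"), "AnonHugePages") after each iteration and log the result. This shows where the huge-page share stops increasing. For the per-process figure, /proc/self/smaps reports AnonHugePages for each mapping.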

    I managed to run "sudo perf top" and saw this in the top line when the slowdown happened:

    16.84%  [kernel]  [k] compaction_alloc

    perf top -g

    - 31.27% 31.03% [kernel] [k] compaction_alloc
    - compaction_alloc
    - migrate_pages
    compact_zone
    compact_zone_order
    try_to_compact_pages
    __alloc_pages_direct_compact
    __alloc_pages_nodemask
    alloc_pages_vma
    do_huge_pmd_anonymous_page
    handle_mm_fault
    __do_page_fault
    do_page_fault
    page_fault
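
According to this call graph, page faults enter direct compaction through do_huge_pmd_anonymous_page. Mainline kernels count exactly these events in /proc/vmstat. compact_stall counts the times a process stalled to run direct compaction, and thp_fault_fallback counts THP page faults that fell back to small pages. Below is a minimal sketch that diffs two snapshots of these counters. The helper names are hypothetical, and older vendor kernels may lack these counters or name them differently.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

using Counters = std::map<std::string, unsigned long long>;

// Parses /proc/vmstat-style text (one "name value" pair per line).
Counters parse_vmstat(const std::string& text) {
    Counters counters;
    std::istringstream in(text);
    std::string name;
    unsigned long long value = 0;
    while (in >> name >> value) counters[name] = value;
    return counters;
}

// Takes a snapshot of /proc/vmstat (Linux only); empty if unreadable.
Counters snapshot_vmstat(const std::string& path = "/proc/vmstat") {
    std::ifstream file(path);
    std::ostringstream buf;
    buf << file.rdbuf();
    return parse_vmstat(buf.str());
}

// How much a counter grew between two snapshots; 0 if it is missing from
// either snapshot (e.g. a kernel built without compaction or THP support).
unsigned long long vmstat_delta(const Counters& before, const Counters& after,
                                const std::string& name) {
    const auto b = before.find(name);
    const auto a = after.find(name);
    if (b == before.end() || a == after.end() || a->second < b->second) return 0;
    return a->second - b->second;
}
```

Take one snapshot before the slow iterations and one after, then print vmstat_delta for compact_stall and thp_fault_fallback. If both jump exactly when the run slows down, direct compaction and huge-page fallback are the likely cause.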
    
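    The slowdown is related to THP being enabled and to slow page faults once allocations fall back to 4KB pages. After the switch to 4KB pages, page faults become very slow inside an internal Linux kernel compaction mechanism (is the kernel still trying to obtain huge pages?). The call graph above supports this. The fault goes through do_huge_pmd_anonymous_page, so it is still attempting a huge-page allocation. From there it ends up in direct compaction (__alloc_pages_direct_compact → try_to_compact_pages → compact_zone → migrate_pages), and the faulting thread pays for the compaction itself. Further problems come from THP on NUMA machines, involving both the THP and the NUMA code.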
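
If compaction at fault time is the cause, one mitigation is to keep the large solver buffers out of THP altogether. Below is a minimal sketch, assuming the buffers can be allocated with mmap directly. madvise(MADV_NOHUGEPAGE) asks the kernel not to back that range with huge pages. MADV_NOHUGEPAGE is a real Linux flag, but whether a particular (vendor) kernel honours it is an assumption to verify. The helper names are hypothetical. The system-wide alternative is to write madvise or never to the sysfs enabled file shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Allocates `bytes` of anonymous, zero-filled memory with mmap and, where the
// headers define MADV_NOHUGEPAGE, asks the kernel not to use transparent huge
// pages for the range. Returns nullptr if mmap fails.
void* alloc_without_thp(std::size_t bytes) {
    assert(bytes > 0 && "mmap length must be non-zero");
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
#ifdef MADV_NOHUGEPAGE
    // Best effort: madvise can fail (e.g. EINVAL on a kernel without THP
    // support); the mapping is still usable with regular 4KB pages.
    (void)madvise(p, bytes, MADV_NOHUGEPAGE);
#endif
    return p;
}

// Releases a block obtained from alloc_without_thp (same size required).
void free_without_thp(void* p, std::size_t bytes) {
    if (p != nullptr) munmap(p, bytes);
}
```

The advice applies only to the ranges it is given. With the system-wide mode left at always, the rest of the process can still use huge pages.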

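    The original problem: we launch several solvers at the same time, based on a memory setting chosen by the user. In this case the user may want to use all 230GB of free RAM. We allocate and free memory dynamically. When we reach the memory limit, which in this case may be 150GB (rather than 230GB), we see a dramatic slowdown. I observed high system CPU usage and swap usage, so I wrote this small program, which seems to reproduce my original problem.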
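
The source of the test program is not part of this excerpt. Below is a hypothetical sketch of a loop that produces output in the same shape; the names IterationTiming, run_iteration and report are invented for illustration. Each iteration allocates 100,000,000 ints, which is 400 MB with 1 MB = 10^6 bytes. It fills them, writes them twice more, and keeps every block alive, so the total grows by 400 MB per iteration. These units match the log: 100 M ints in 0.411918 s is 242.77 Mints/sec, or 971.07 Mbytes/sec at 4 bytes per int.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Wall-clock time of each phase of one iteration, in seconds.
struct IterationTiming {
    double malloc_s = 0.0;
    double fill_s = 0.0;
    double second_s = 0.0;
    double third_s = 0.0;
};

// Seconds elapsed since `start`, measured on a monotonic clock.
double seconds_since(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

// One iteration: malloc n_ints ints, fill them, then write every element two
// more times, timing each pass. The block is appended to `blocks` and NOT
// freed, mirroring the question's deliberate leak.
IterationTiming run_iteration(std::size_t n_ints, std::vector<int*>& blocks) {
    IterationTiming t;
    auto start = std::chrono::steady_clock::now();
    int* data = static_cast<int*>(std::malloc(n_ints * sizeof(int)));
    t.malloc_s = seconds_since(start);
    if (data == nullptr) return t;  // allocation failed: nothing to time
    blocks.push_back(data);

    // First touch: this is where the kernel faults the pages in.
    start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n_ints; ++i) data[i] = static_cast<int>(i);
    t.fill_s = seconds_since(start);

    // Pages are already mapped, so these passes measure plain memory bandwidth.
    start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n_ints; ++i) data[i] += 1;
    t.second_s = seconds_since(start);

    start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n_ints; ++i) data[i] += 1;
    t.third_s = seconds_since(start);
    return t;
}

// Prints timings in the log's units: Mints/sec = millions of ints per
// second, Mbytes/sec with 1 MB = 1e6 bytes.
void report(const IterationTiming& t, std::size_t n_ints) {
    const double mints = static_cast<double>(n_ints) / 1e6;
    const double mbytes = mints * sizeof(int);
    std::printf("Time to malloc: %g\n", t.malloc_s);
    std::printf("Time to fill with data: %g\n", t.fill_s);
    std::printf("Fill rate with data: %g Mints/sec, %gMbytes/sec\n",
                mints / t.fill_s, mbytes / t.fill_s);
    std::printf("Time to second write access of data: %g\n", t.second_s);
    std::printf("Access rate of data: %g Mints/sec, %gMbytes/sec\n",
                mints / t.second_s, mbytes / t.second_s);
    std::printf("Time to third write access of data: %g\n", t.third_s);
    std::printf("Access rate of data: %g Mints/sec, %gMbytes/sec\n",
                mints / t.third_s, mbytes / t.third_s);
}
```

This split also explains why "Time to malloc" stays around 3e-05 s in the log. For blocks this large, glibc malloc only reserves address space. The physical pages, including any huge-page allocation or compaction, are paid for during the first pass.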

    I can suggest the following: