Google Compute Engine local SSD RAID mdadm performance

Maybe you can help me out, I'm a bit lost. I'm trying to get some extra IOPS and throughput by putting 8 local SSDs into a RAID array with mdadm, but I can't get performance above the average of a single local SSD drive. I'm running tests with fio, and so far the best RAID configuration has been RAID 0, which shows almost the same performance as a single drive.

I also tried RAID 5, and it looks like performance drops (especially on writes) because of the RAID penalty. So it seems there is an IO limit on the instance: no matter how many local SSDs I put on it I get the same numbers, and if the RAID configuration carries a penalty I get lower numbers. Any ideas? Does the instance have an IO cap? (For the tests I'm using 32 vCPUs and 208 GB of RAM, on CentOS/Debian.)
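For context, a RAID 0 array over eight local SSDs would typically be built roughly like this; a minimal sketch assuming the devices show up as /dev/sdb through /dev/sdi and the array is mounted at /data (the device names, filesystem, and mount point are assumptions, not taken from the original post):

# create a RAID 0 array over the eight local SSDs (device names assumed)
mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/sd[b-i]
# put a filesystem on the array and mount it where the fio test file lives
mkfs.ext4 -F /dev/md0
mount /dev/md0 /data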

Some output:

RAID 0:

[root@sgs02 data]# /usr/local/bin/fio --ioengine=libaio --direct=1 --name=test --filename=/data/t22 --bs=4k --iodepth=128 --size=4G --randrepeat=1 --readwrite=randrw --rwmixread=70
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.13
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [313.8MB/132.6MB/0KB /s] [80.4K/33.1K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=13748: Mon Aug 29 07:19:10 2016
  read : io=2865.9MB, bw=323120KB/s, iops=80780, runt=  9082msec
    slat (usec): min=2, max=167, avg= 7.22, stdev= 4.42
    clat (usec): min=125, max=9876, avg=1128.41, stdev=298.77
     lat (usec): min=132, max=9881, avg=1135.63, stdev=298.82
    clat percentiles (usec):
     |  1.00th=[  908],  5.00th=[  956], 10.00th=[  980], 20.00th=[ 1012],
     | 30.00th=[ 1032], 40.00th=[ 1064], 50.00th=[ 1080], 60.00th=[ 1112],
     | 70.00th=[ 1128], 80.00th=[ 1160], 90.00th=[ 1208], 95.00th=[ 1288],
     | 99.00th=[ 2928], 99.50th=[ 3472], 99.90th=[ 3920], 99.95th=[ 4320],
     | 99.99th=[ 5088]
  write: io=1230.3MB, bw=138706KB/s, iops=34676, runt=  9082msec
    slat (usec): min=2, max=116, avg= 8.05, stdev= 4.64
    clat (usec): min=87, max=8943, avg=1034.94, stdev=146.07
     lat (usec): min=94, max=8947, avg=1042.99, stdev=146.39
    clat percentiles (usec):
     |  1.00th=[  852],  5.00th=[  900], 10.00th=[  924], 20.00th=[  964],
     | 30.00th=[  988], 40.00th=[ 1004], 50.00th=[ 1032], 60.00th=[ 1048],
     | 70.00th=[ 1064], 80.00th=[ 1096], 90.00th=[ 1144], 95.00th=[ 1176],
     | 99.00th=[ 1272], 99.50th=[ 1320], 99.90th=[ 3184], 99.95th=[ 3984],
     | 99.99th=[ 5856]
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.02%, 1000=21.69%
    lat (msec) : 2=76.85%, 4=1.35%, 10=0.07%
RAID 5:

/usr/local/bin/fio --ioengine=libaio --direct=1 --name=test --filename=/data/t22 --bs=4k --iodepth=128 --size=4G --randrepeat=1 --readwrite=randrw --rwmixread=70
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.13
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [248.7MB/104.6MB/0KB /s] [63.7K/26.8K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=10743: Mon Aug 29 07:19:10 2016
  read : io=2865.9MB, bw=235218KB/s, iops=58804, runt= 12476msec
    slat (usec): min=2, max=402, avg= 9.43, stdev= 8.99
    clat (usec): min=100, max=15268, avg=1155.38, stdev=790.57
     lat (usec): min=108, max=15271, avg=1164.81, stdev=792.64
    clat percentiles (usec):
     |  1.00th=[  119],  5.00th=[  137], 10.00th=[  155], 20.00th=[  195],
     | 30.00th=[  286], 40.00th=[ 1320], 50.00th=[ 1400], 60.00th=[ 1464],
     | 70.00th=[ 1512], 80.00th=[ 1624], 90.00th=[ 1928], 95.00th=[ 2096],
     | 99.00th=[ 3440], 99.50th=[ 3952], 99.90th=[ 5600], 99.95th=[ 6432],
     | 99.99th=[14656]
  write: io=1230.3MB, bw=100972KB/s, iops=25243, runt= 12476msec
    slat (usec): min=2, max=13853, avg= 9.33, stdev=29.91
    clat (usec): min=212, max=424477, avg=2341.19, stdev=1500.44
     lat (usec): min=217, max=424481, avg=2350.52, stdev=1499.50
    clat percentiles (usec):
     |  1.00th=[ 1336],  5.00th=[ 1448], 10.00th=[ 1512], 20.00th=[ 1592],
     | 30.00th=[ 1656], 40.00th=[ 1752], 50.00th=[ 2024], 60.00th=[ 2288],
     | 70.00th=[ 2704], 80.00th=[ 3056], 90.00th=[ 3568], 95.00th=[ 3984],
     | 99.00th=[ 5280], 99.50th=[ 6048], 99.90th=[ 8256], 99.95th=[15168],
     | 99.99th=[20352]
    lat (usec) : 250=19.32%, 500=3.46%, 750=0.40%, 1000=0.34%
    lat (msec) : 2=56.10%, 4=18.56%, 10=1.79%, 20=0.02%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=7.26%, sys=87.08%, ctx=43414, majf=0, minf=33
Single drive:

/usr/local/bin/fio --ioengine=libaio --direct=1 --name=test --filename=/data/t22 --bs=4k --iodepth=128 --size=4G --randrepeat=1 --readwrite=randrw --rwmixread=70
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.13
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [328.1MB/138.5MB/0KB /s] [83.1K/35.5K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4554: Mon Aug 29 07:19:19 2016
  read : io=2865.9MB, bw=338163KB/s, iops=84540, runt=  8678msec
    slat (usec): min=1, max=199, avg= 3.41, stdev= 4.70
    clat (usec): min=98, max=19530, avg=1226.75, stdev=1142.24
     lat (usec): min=103, max=19532, avg=1230.16, stdev=1142.32
    clat percentiles (usec):
     |  1.00th=[  153],  5.00th=[  243], 10.00th=[  302], 20.00th=[  386],
     | 30.00th=[  458], 40.00th=[  548], 50.00th=[  676], 60.00th=[  980],
     | 70.00th=[ 1480], 80.00th=[ 2160], 90.00th=[ 2928], 95.00th=[ 3600],
     | 99.00th=[ 4768], 99.50th=[ 5216], 99.90th=[ 5856], 99.95th=[ 6688],
     | 99.99th=[15936]
  write: io=1230.3MB, bw=145163KB/s, iops=36290, runt=  8678msec
    slat (usec): min=1, max=196, avg= 4.14, stdev= 5.17
    clat (usec): min=29, max=16473, avg=654.72, stdev=732.68
     lat (usec): min=45, max=16476, avg=658.86, stdev=732.68
    clat percentiles (usec):
     |  1.00th=[   59],  5.00th=[   88], 10.00th=[  123], 20.00th=[  193],

You may be hitting a bottleneck in mdadm. Check out the discussion thread about mdadm using a single kernel thread for certain operations. That discussion concerns RAID 5, but a lot of the code is shared, so a similar issue may exist with RAID 0. Work is under way to remove this bottleneck.
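A quick way to check whether a single thread is the limiter is to watch per-thread CPU usage while the benchmark is in flight, and to repeat the run with several parallel submitters; a rough sketch, where the --numjobs value is an assumption and not something the original poster ran:

# while fio is running, look for a single md or fio thread pinned near 100% of one core
top -H
# re-run the same workload with multiple parallel jobs and aggregated reporting
/usr/local/bin/fio --ioengine=libaio --direct=1 --name=test --filename=/data/t22 --bs=4k --iodepth=128 --size=4G --randrepeat=1 --readwrite=randrw --rwmixread=70 --numjobs=8 --group_reporting

If aggregate IOPS scale with the extra jobs, the ceiling was the single submitter or kernel thread rather than an instance-level IO cap.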

I'm voting to close this question as off-topic because it is not about programming (though it may also be too broad for Super User or Server Fault).