io_submit waits for all Oracle DBWriter I/O

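As background, I have been tuning database platforms since the 1980s, so I have dealt with plenty of asynchronous I/O problems in the past. This one is new, and it is strange.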

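First, I am running Oracle 12c with ASM on RHEL 7.1 64-bit (3.10.0-229). I had been using two EMC CX4-960 arrays with a total of 72 SSDs, doing about 105K reads and 65K writes per second in aggregate. (Yes, that is a really strong storage back end!) Disk write latency was 2-3 ms. When the Oracle DBWriter flushes buffers (usually in large batches, and asynchronously), the strace snippet below shows io_submit() and io_getevents() completing within a few milliseconds; all of the writes then take a few milliseconds to complete, and we move on to the next batch. (I removed the details of the blocks being submitted from the io_submit() lines.)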

294692 12:46:10.173955 io_submit(14066213666720, 301, ) = 301
294692 12:46:10.178452 io_getevents(14066213666720, 38, 128, , {600, 0}) = 60
294692 12:46:10.178766 times(NULL) = 43901459
294692 12:46:10.178845 io_getevents(14066213666720, 128, 128, , {0, 0}) = 85
294692 12:46:10.179352 io_getevents(14066213666720, 128, 128, , {0, 0}) = 62
294692 12:46:10.180207 io_getevents(14066213666720, 94, 128, , {0, 0}) = 76
294692 12:46:10.180743 io_getevents(1406621366606720, 18, 128, , {0, 0}) = 16
294692 12:46:10.181994 io_getevents(14066213666720, 2, 128, , {0, 0}) = 2
294692 12:46:10.182393 times(NULL) = 43901459
294692 12:46:10.182462 semtimedop(4718593, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
294692 12:46:13.182193 times(NULL) = 439014659
294692 12:46:13.188183 io_submit(14066213666720, 319, ) = 319
294692 12:46:13.193078 io_getevents(14066213666720, 40, 128, , {600, 0}) = 128
294692 12:46:13.193583 times(NULL) = 439014660
294692 12:46:13.193663 io_getevents(14066213666720, 128, 128, , {0, 0}) = 119
294692 12:46:13.194364 io_getevents(14066213666720, 72, 128, , {0, 0}) = 59
294692 12:46:13.195876 io_getevents(1406621366606720, 13, 128, , {0, 0}) = 13
294692 12:46:13.196650 times(NULL) = 439014661
294692 12:46:13.196725 semtimedop(4718593, 1, {29900000000}) = -1 EAGAIN (Resource temporarily unavailable)
294692 12:46:16.186196 times(NULL) = 439014960
294692 12:46:16.194006 io_submit(14066213666720, 276, ) = 276
294692 12:46:16.198285 io_getevents(1406621366606720, 36, 128, , {600, 0}) = 42
294692 12:46:16.198518 times(NULL) = 439014961
294692 12:46:16.198572 io_getevents(14066213666720, 128, 128, , {0, 0}) = 48
294692 12:46:16.198893 io_getevents(14066213666720, 128, 128, , {0, 0}) = 37

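So far, so good. Then I switched to two Tegile T3600 arrays that I am testing. These arrays are faster and give me more IOPS at lower latency. The problem is that I very quickly ran into Oracle "free buffer waits" of 50% or higher. The DBWriter could not keep up, forcing foreground writes and all sorts of bad things. It was surprising that the DBWriter could not flush enough buffers with storage this fast, but strace shows why. Note that iostat shows an average disk write latency of about 0.7 ms.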

19131 18:35:06.903628 io_submit(140538814074880, 517, ) = 517 <0.505505>
19131 18:35:07.414281 io_getevents(140538814074880, 40, 128, , {600, 0}) = 128 <0.000014>
19131 18:35:07.415091 io_getevents(140538814074880, 128, 128, , {0, 0}) = 128 <0.000012>
19131 18:35:07.416139 io_getevents(140538814074880, 128, 128, , {0, 0}) = 128 <0.000010>
19131 18:35:07.417134 semctl(753668, 33, SETVAL, 0x1) = 0 <0.000017>
19131 18:35:07.417553 semctl(688130, 103, SETVAL, 0x1) = 0 <0.000014>
19131 18:35:07.417640 semctl(655361, 130, SETVAL, 0x1) = 0 <0.000013>
19131 18:35:07.419923 io_submit(140538814074880, 248, ) = 248 <0.250174>
19131 18:35:07.673864 io_getevents(140538814074880, 22, 128, , {600, 0}) = 128 <0.000019>
19131 18:35:07.674735 io_getevents(140538814074880, 128, 128, , {0, 0}) = 128 <0.000010>
19131 18:35:07.676021 io_getevents(140538814074880, 128, 128, , {0, 0}) = 128 <0.000020>
19131 18:35:07.676660 semctl(753668, 5, SETVAL, 0x1) = 0 <0.000021>
19131 18:35:07.680954 io_submit(140538814074880, 507, ) = 507 <0.503491>
19131 18:35:08.190096 io_getevents(140538814074880, 38, 128, , {600, 0}) = 128 <0.000010>
19131 18:35:08.190617 io_getevents(140538814074880, 128, 128, , {0, 0}) = 128 <0.000008>
19131 18:35:08.193571 io_getevents(140538814074880, 128, 128, , {0, 0}) = 128 <0.000025>
19131 18:35:08.196128 semctl(720899, 38, SETVAL, 0x1) = 0 <0.000026>

So, for some reason, the io_submit() call with 517 blocks takes 505 ms to return. Why?
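As a rough sanity check on the trace above, the time spent inside each io_submit() call can be divided by the number of requests it submitted. The sketch below is only an illustration: the helper name `per_request_ms` and the regular expression are my own, and it assumes the line format shown above, where the trailing `<...>` field is the time spent in the call in seconds.

```python
import re

# Matches io_submit lines in the format shown above; the trailing <...> field
# is assumed to be the time spent inside the call, in seconds.
SUBMIT_RE = re.compile(r"io_submit\(\d+, (\d+), .*\) = (\d+) <([\d.]+)>")


def per_request_ms(trace_lines):
    """Return (requests_submitted, milliseconds_per_request) for each io_submit line."""
    results = []
    for line in trace_lines:
        match = SUBMIT_RE.search(line)
        if match is None:
            continue  # not an io_submit line (e.g. io_getevents, semctl)
        submitted = int(match.group(2))    # return value: requests accepted
        elapsed_s = float(match.group(3))  # time spent in the syscall
        if submitted > 0:
            results.append((submitted, elapsed_s * 1000.0 / submitted))
    return results


# The three io_submit calls from the trace, plus one line that should be skipped.
TRACE = [
    "19131 18:35:06.903628 io_submit(140538814074880, 517, ) = 517 <0.505505>",
    "19131 18:35:07.414281 io_getevents(140538814074880, 40, 128, , {600, 0}) = 128 <0.000014>",
    "19131 18:35:07.419923 io_submit(140538814074880, 248, ) = 248 <0.250174>",
    "19131 18:35:07.680954 io_submit(140538814074880, 507, ) = 507 <0.503491>",
]

for submitted, ms in per_request_ms(TRACE):
    print(f"{submitted} requests: {ms:.3f} ms per request inside io_submit()")
```

All three calls come out at roughly 1 ms per request, which is on the order of a single write's latency. That is what you would expect if the writes were being issued one at a time rather than queued in parallel.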


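Any idea why this is happening? It looks as though the array is somehow telling the OS to issue the writes serially. FWIW, I even enabled write-back caching for writes in the array controller. So this appears to be a problem in the OS itself.

The problem is that when Linux scans the LUNs, they come up with "write cache enabled". This tells Linux that it must use Force Unit Access (FUA) to avoid losing data if the cache loses power, because Oracle opens the LUNs with O_SYNC (or O_DSYNC?). That rests on a number of assumptions (that the cache is in RAM, that it is volatile, and so on), but let's accept it. FUA is bad news for performance. It also defeats the parallel issue of asynchronous I/O.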


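It turns out that the array has a setting that tells it whether to advertise a write-back cache to the Linux server. It does not change how the array operates, only how it presents itself to the host. After changing the WBC setting on the array to disabled, the Linux host prints the line "write-back cache disabled" when it scans the LUNs, and asynchronous writes now behave normally.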
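To see what each LUN is advertising without digging through the boot log, one option is to read the cache mode the SCSI disk driver exposes in sysfs. This is a minimal sketch, assuming the LUNs are handled by the sd driver and therefore appear under /sys/class/scsi_disk with a cache_type attribute; the function names are mine, and treating any value that begins with "write back" as the problematic case is my reading of the behaviour described above, not something confirmed against the kernel source.

```python
import glob
import os


def scsi_cache_types(sysfs_root="/sys/class/scsi_disk"):
    """Return {scsi_address: cache_type} for every SCSI disk found under sysfs_root."""
    result = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "*", "cache_type"))):
        # The directory name is the SCSI address, e.g. "2:0:0:1".
        address = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as handle:
                result[address] = handle.read().strip()
        except OSError:
            continue  # the device went away or the attribute is unreadable
    return result


def advertises_write_back(cache_type):
    """Assumption: a value starting with 'write back' means a write-back cache is advertised."""
    return cache_type.startswith("write back")


# Prints nothing on a machine that has no SCSI disks.
for address, cache_type in scsi_cache_types().items():
    flag = "  <-- write-back cache advertised" if advertises_write_back(cache_type) else ""
    print(f"{address}: {cache_type}{flag}")
```

Running this before and after changing the array's WBC setting (and rescanning the LUNs) should show whether the host's view actually changed.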

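This is an interesting question, but it is beyond my experience. I am not sure, but it may be better suited to a different site. Good luck.

Possibly off-topic: I remember a tunable queue depth parameter on AIX. Could more requests be getting submitted than the hardware supports?

I am afraid your problem is very specific, and the next step would be adding printk calls to the kernel driver source.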