Strange behaviour of TParallel.For with the default thread pool in Delphi
I am trying out the parallel programming features of Delphi XE7 Update 1.
I created a simple TParallel.For loop that basically performs some dummy operations to pass the time.
I launched the program on an AWS instance (c4.8xlarge) with 36 vCPUs to see what could be gained from parallel programming.
When I first launch the program and execute the TParallel.For loop, I see a significant gain (although a lot less than I expected with 36 vCPUs).
If I do not close the program and run the pass again on the 36 vCPU machine shortly afterwards (for example immediately, or some 10-20 seconds later), the parallel pass gets much worse:
Parallel matches: 23077169 in 2322ms
Single Threaded matches: 23077169 in 2316ms
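If I do not close the program and wait a few minutes (not a few seconds, but a few minutes) before running the pass again, I get the same results as when I first launch the program (a 10x improvement in response time).
The very first pass after launching the program is always faster on the 36 vCPU machine, so the effect seems to occur only on a subsequent TParallel.For call within the same program run.
This is the sample code I am running: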
unit ParallelTests;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
  System.Threading, System.SyncObjs, System.Diagnostics,
  Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    SingleThreadCheckBox: TCheckBox;
    ParallelCheckBox: TCheckBox;
    UnitsEdit: TEdit;
    Label1: TLabel;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  maxItems: integer;
  referenceStr: string;
begin
  sw := TStopWatch.Create;
  maxItems := 5000;
  Randomize;
  SetLength(referenceStr,120000);
  for i := 1 to 120000 do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  if ParallelCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    TParallel.For(1, MaxItems,
      procedure (Value: Integer)
      var
        index: integer;
        found: integer;
      begin
        found := 0;
        for index := 1 to length(referenceStr) do begin
          if (((Value mod 26) + ord('a')) = ord(referenceStr[index])) then begin
            inc(found);
          end;
        end;
        TInterlocked.Add(matches, found);
      end);
    sw.Stop;
    Memo1.Lines.Add('Parallel matches: ' + IntToStr(matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;

  if SingleThreadCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    for i := 1 to MaxItems do begin
      for j := 1 to length(referenceStr) do begin
        if (((i mod 26) + ord('a')) = ord(referenceStr[j])) then begin
          inc(matches);
        end;
      end;
    end;
    sw.Stop;
    Memo1.Lines.Add('Single Threaded matches: ' + IntToStr(Matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;
end;

end.
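Is this working as designed? I found an article recommending that I let the library decide the thread pool, but I do not see the point of using parallel programming if I have to wait several minutes between requests for them to be served faster.
Am I missing something about how a TParallel.For loop is supposed to be used?
Please note that I cannot reproduce this on an AWS m3.large instance (2 vCPUs according to AWS). There I always get a slight improvement, and I do not get worse results in subsequent TParallel.For calls made shortly afterwards: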
Parallel matches: 23077054 in 2057ms
Single Threaded matches: 23077054 in 2900ms
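So the effect seems to occur when many cores are available (36), which is a pity, since the whole point of parallel programming is to benefit from many cores. I wonder whether this is a library bug triggered by the high core count, or by the fact that the core count is not a power of 2 in this case.
UPDATE: After testing on various AWS instances with different vCPU counts, this seems to be the behaviour:
- 36 vCPUs (c4.8xlarge). You have to wait several minutes between subsequent vanilla TParallel calls (which makes it unusable for production)
- 32 vCPUs (c3.8xlarge). You have to wait several minutes between subsequent vanilla TParallel calls (which makes it unusable for production)
- 16 vCPUs (c3.4xlarge). You have to wait less than a second. It could be usable if the load is low but response time still matters
- 8 vCPUs (c3.2xlarge). It seems to work normally
- 4 vCPUs (c3.xlarge). It seems to work normally
- 2 vCPUs (m3.large). It seems to work normally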
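For anyone who wants to experiment with the thread pool question raised above, here is a minimal sketch of running the same loop on a dedicated TThreadPool instead of the default one. It assumes the TParallel.For overload that takes a pool as its last argument and the SetMinWorkerThreads / SetMaxWorkerThreads functions of System.Threading. The names RunPass and Main and the chosen thread counts are illustrative only, and nothing in this thread establishes that a dedicated pool avoids the slowdown.

```delphi
program CustomPoolSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs,
  System.Diagnostics,
  System.Threading;

const
  MaxItems = 5000;
  DataSize = 100000;

// Runs one parallel pass on the given pool and prints the timing.
// ReferenceStr is a value parameter so the anonymous method can capture it.
procedure RunPass(ReferenceStr: string; Pool: TThreadPool);
var
  Matches: Integer;
  SW: TStopwatch;
begin
  Matches := 0;
  SW := TStopwatch.StartNew;
  TParallel.For(1, MaxItems,
    procedure(Value: Integer)
    var
      Index, Found: Integer;
    begin
      Found := 0;
      for Index := 1 to Length(ReferenceStr) do
        if ((Value mod 26) + Ord('a')) = Ord(ReferenceStr[Index]) then
          Inc(Found);
      TInterlocked.Add(Matches, Found);
    end,
    Pool); // explicit pool instead of the default one
  Writeln('Parallel matches: ', Matches, ' in ', SW.ElapsedMilliseconds, 'ms');
end;

procedure Main;
var
  Pool: TThreadPool;
  Data: string;
  I: Integer;
begin
  Randomize;
  SetLength(Data, DataSize);
  for I := 1 to DataSize do
    Data[I] := Chr(Ord('a') + Random(26));

  Pool := TThreadPool.Create;
  try
    // Assumption: these limits are arbitrary picks for the experiment.
    // Both calls return False when the pool rejects the requested value.
    Writeln('SetMaxWorkerThreads accepted: ',
      Pool.SetMaxWorkerThreads(TThread.ProcessorCount * 2));
    Writeln('SetMinWorkerThreads accepted: ',
      Pool.SetMinWorkerThreads(TThread.ProcessorCount));
    // Several back-to-back passes, to see whether later ones degrade.
    for I := 1 to 5 do
      RunPass(Data, Pool);
  finally
    Pool.Free;
  end;
end;

begin
  Main;
end.
```

Comparing the timings of the later passes against the first one on a many-core machine would show whether the pool configuration makes any difference.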
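I created two test programs, based on yours, to compare System.Threading and OTL. I built with XE7 Update 1 and OTL r1397. The OTL source I used corresponds to release 3.04. I built with the 32-bit Windows compiler, using release build options.
My test machine is a dual Intel Xeon E5530 running Windows 7 x64. The system has two quad-core processors. That is 8 processors in total, but the system reports 16 because of hyper-threading. Experience tells me that hyper-threading is just marketing guff, and I have never seen scaling beyond a factor of 8 on this machine.
Now for the two programs, which are almost identical.
System.Threading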
program SystemThreadingTest;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics,
  System.Threading;

const
  maxItems = 5000;
  DataSize = 100000;

procedure DoTest;
var
  matches: integer;
  i, j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr, DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  TParallel.For(1, maxItems,
    procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches, found);
    end);
  Writeln('Parallel matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');
end;

begin
  while True do
    DoTest;
end.
OTL
program OTLTest;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,
  Winapi.Messages,
  System.Diagnostics,
  OtlParallel;

const
  maxItems = 5000;
  DataSize = 100000;

procedure ProcessThreadMessages;
var
  msg: TMsg;
begin
  while PeekMessage(Msg, 0, 0, 0, PM_REMOVE) and (Msg.Message <> WM_QUIT) do begin
    TranslateMessage(Msg);
    DispatchMessage(Msg);
  end;
end;

procedure DoTest;
var
  matches: integer;
  i, j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr, DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  Parallel.For(1, maxItems).Execute(
    procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches, found);
    end);
  Writeln('Parallel matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');

  ProcessThreadMessages;

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');
end;

begin
  while True do
    DoTest;
end.
Now the output.
System.Threading output
Parallel matches: 19230817 in 374ms
Serial matches: 19230817 in 2423ms
Parallel matches: 19230698 in 374ms
Serial matches: 19230698 in 2409ms
Parallel matches: 19230556 in 368ms
Serial matches: 19230556 in 2433ms
Parallel matches: 19230635 in 2412ms
Serial matches: 19230635 in 2430ms
Parallel matches: 19230843 in 2441ms
Serial matches: 19230843 in 2413ms
Parallel matches: 19230905 in 2493ms
Serial matches: 19230905 in 2423ms
Parallel matches: 19231032 in 2430ms
Serial matches: 19231032 in 2443ms
Parallel matches: 19230669 in 2440ms
Serial matches: 19230669 in 2473ms
Parallel matches: 19230811 in 2404ms
Serial matches: 19230811 in 2432ms
....
Parallel matches: 19230667 in 422ms
Serial matches: 19230667 in 2475ms
Parallel matches: 19230663 in 335ms
Serial matches: 19230663 in 2438ms
Parallel matches: 19230889 in 395ms
Serial matches: 19230889 in 2461ms
Parallel matches: 19230874 in 391ms
Serial matches: 19230874 in 2441ms
Parallel matches: 19230617 in 385ms
Serial matches: 19230617 in 2524ms
Parallel matches: 19231021 in 368ms
Serial matches: 19231021 in 2455ms
Parallel matches: 19230904 in 357ms
Serial matches: 19230904 in 2537ms
Parallel matches: 19230568 in 373ms
Serial matches: 19230568 in 2456ms
Parallel matches: 19230758 in 333ms
Serial matches: 19230758 in 2710ms
Parallel matches: 19230580 in 371ms
Serial matches: 19230580 in 2532ms
Parallel matches: 19230534 in 336ms
Serial matches: 19230534 in 2436ms
Parallel matches: 19230879 in 368ms
Serial matches: 19230879 in 2419ms
Parallel matches: 19230651 in 409ms
Serial matches: 19230651 in 2598ms
Parallel matches: 19230461 in 357ms
....
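To put these timings in perspective, speedup is simply the serial time divided by the parallel time. The sketch below applies that to two pairs of timings copied from the output above: a fast pass (374 ms parallel, 2423 ms serial) and a slow pass (2412 ms parallel, 2430 ms serial). The helper name Speedup is made up for this illustration. The fast pass works out to roughly 6.5x and the slow pass to roughly 1.0x, in other words no gain at all.

```delphi
program SpeedupSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Speedup = serial elapsed time / parallel elapsed time.
function Speedup(SerialMs, ParallelMs: Int64): Double;
begin
  Result := SerialMs / ParallelMs;
end;

begin
  // Timings taken from the System.Threading output listed above.
  Writeln(Format('fast pass: %.2fx', [Speedup(2423, 374)]));
  Writeln(Format('slow pass: %.2fx', [Speedup(2430, 2412)]));
end.
```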