How can I use multithreading in Perl to check whether web pages exist?
I'm writing a Perl script that takes a list of URLs and checks whether they exist. (Note that I only care whether they exist; I don't care what their contents are.) Here is the important part of the program:
use LWP::Simple qw($ua head);

if (head($url)) {
    $numberAlive++;
}
else {
    $numberDead++;
}
Right now the program works fine, but I'd like it to run faster. Since I assume the slow part is contacting the server for each URL, I'm looking for a way to send requests for the other URLs in the list while waiting for the first response. How can I do that? As far as I can tell, the head routine has no callback that fires once the server responds.

Begin with the familiar front matter:
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # for // (defined-or)
use IO::Handle;
use IO::Select;
use LWP::Simple;
use POSIX qw/ :sys_wait_h /;
use Socket;
Global constants control program execution:
my $DEBUG = 0;
my $EXIT_COMMAND = "<EXIT>";
my $NJOBS = 10;
To create a worker, we start by creating a socketpair. The parent process will use one end and each worker (child) the other. We disable buffering at both ends and add the parent end to our IO::Select instance. We also note each child's process ID so we can wait for all workers to finish:
sub create_worker {
    my($sel, $kidpid) = @_;

    socketpair my $parent, my $kid, AF_UNIX, SOCK_STREAM, PF_UNSPEC
        or die "$0: socketpair: $!";
    $_->autoflush(1) for $parent, $kid;

    my $pid = fork // die "$0: fork: $!";
    if ($pid) {
        ++$kidpid->{$pid};
        close $kid or die "$0: close: $!";
        $sel->add($parent);
    }
    else {
        close $parent or die "$0: close: $!";
        check_sites $kid;
        die "NOTREACHED";
    }
}
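Each child runs check_sites, which is not shown in the excerpt above. A plausible sketch, reconstructed from how the parent uses it (one URL per line in, one "URL: YES" or "URL: NO" line out, exiting when it reads the exit command), might look like:

```perl
# Assumed implementation of the worker loop; the protocol details are
# inferred from dispatch_jobs and the sample output, not from the original.
sub check_sites {
    my($s) = @_;
    warn "$0 [$$]: waiting for URLs" if $DEBUG;
    while (<$s>) {
        chomp;
        exit 0 if $_ eq $EXIT_COMMAND;   # parent told us we're done
        # Answer over the same socket: one result line per URL.
        print $s "$_: ", (head($_) ? "YES" : "NO"), "\n";
    }
    die "NOTREACHED";
}
```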
To dispatch URLs, the parent grabs as many writable sockets as are available and hands out the same number of URLs from the job queue. Any workers left standing after the job queue empties receive the exit command.

Note that print will fail if the underlying worker has already exited. The parent must ignore SIGPIPE to prevent immediate termination:
sub dispatch_jobs {
    my($sel, $jobs) = @_;
    foreach my $s ($sel->can_write) {
        my $url = @$jobs ? shift @$jobs : $EXIT_COMMAND;
        warn "$0 [$$]: sending '$url' to fd ", fileno $s if $DEBUG;
        print $s $url, "\n" or $sel->remove($s);
    }
}
By the time control reaches read_results, the workers have been created and given work. Now the parent uses can_read to wait for results to arrive from one or more workers. A defined result is an answer from the current worker; an undefined result means the child has exited and closed its end of the socket:
sub read_results {
    my($sel, $results) = @_;
    warn "$0 [$$]: waiting for readers" if $DEBUG;
    foreach my $s ($sel->can_read) {
        warn "$0: [$$]: reading from fd ", fileno $s if $DEBUG;
        if (defined(my $result = <$s>)) {
            chomp $result;
            push @$results, $result;
            warn "$0 [$$]: got '$result' from fd ", fileno $s if $DEBUG;
        }
        else {
            warn "$0 [$$]: eof from fd ", fileno $s if $DEBUG;
            $sel->remove($s);
        }
    }
}
The parent also needs to reap workers as they exit; run_pool then drives the subs above to dispatch all URLs and return all results:
sub reap_workers {
    my($kidpid) = @_;
    while ((my $pid = waitpid -1, WNOHANG) > 0) {
        warn "$0: [$$]: reaped $pid" if $DEBUG;
        delete $kidpid->{$pid};
    }
}
sub run_pool {
    my($n, @jobs) = @_;

    my $sel = IO::Select->new;
    my %kidpid;
    my @results;

    create_worker $sel, \%kidpid for 1 .. $n;

    local $SIG{PIPE} = "IGNORE";  # writes to dead workers will fail

    while (@jobs || keys %kidpid || $sel->handles) {
        dispatch_jobs $sel, \@jobs;
        read_results  $sel, \@results;
        reap_workers  \%kidpid;
    }

    warn "$0 [$$]: returning @results" if $DEBUG;
    @results;
}
With a sample main program
my @jobs = qw(
    bogus
    http://stackoverflow.com/
    http://www.google.com/
    http://www.yahoo.com/
);

my @results = run_pool $NJOBS, @jobs;
print $_, "\n" for @results;
the output is
bogus: NO
http://www.google.com/: YES
http://stackoverflow.com/: YES
http://www.yahoo.com/: YES
The above is a worker-based parallelization (using your choice of threads or processes).
Another option is HTTP::Async:
#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Request;
use HTTP::Async;

my $numberAlive = 0;
my $numberDead  = 0;
my @urls = ('http://www.perl.com', 'http://www.example.xyzzy/foo.html');

my $async = HTTP::Async->new;

# you might want to wrap this in a loop to deal with @urls in batches
foreach my $url (@urls) {
    $async->add( HTTP::Request->new( HEAD => $url ) );
}

while ( my $response = $async->wait_for_next_response ) {
    if ($response->code == 200) { $numberAlive++; }
    else                        { $numberDead++;  }
}

print "$numberAlive Alive, $numberDead Dead\n";
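As the comment in the snippet suggests, for a long URL list you may want to feed the queue in batches instead of adding every request up front. One way to sketch that (the batch size, the placeholder URL list, and the use of total_count to track outstanding requests are assumptions, not part of the original answer):

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Async;

my @urls  = map { "http://example.com/page$_" } 1 .. 100;  # placeholder list
my $async = HTTP::Async->new;
my $BATCH = 20;  # assumed cap on outstanding requests; tune to taste

while (@urls or $async->total_count) {
    # Top the queue up so at most $BATCH requests are outstanding.
    while (@urls and $async->total_count < $BATCH) {
        $async->add( HTTP::Request->new( HEAD => shift @urls ) );
    }
    if ( my $response = $async->wait_for_next_response ) {
        print $response->request->uri, ": ",
              ($response->code == 200 ? "YES" : "NO"), "\n";
    }
}
```

This keeps memory bounded while still overlapping many HEAD requests.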
I strongly suggest using an existing module rather than handling the process pool yourself. Copied from my own post:
use strict;
use warnings;
use feature qw( say );

use threads;  # or: use forks;

use LWP::Simple        qw( head );
use Thread::Queue::Any qw( );

use constant NUM_WORKERS => 10;  # Or whatever.

# The original snippet used @urls without declaring it; fill in your list here.
my @urls = ('http://www.perl.com', 'http://www.example.xyzzy/foo.html');

my $req_q  = Thread::Queue::Any->new();
my $resp_q = Thread::Queue::Any->new();

my @workers;
for (1 .. NUM_WORKERS) {
    push @workers, async {
        while (my $url = $req_q->dequeue()) {
            my $is_alive = head($url) ? 1 : 0;
            $resp_q->enqueue($is_alive);
        }
    };
}

$req_q->enqueue($_) for @urls;

my ($alive, $dead);
for (1 .. @urls) {
    my $is_alive = $resp_q->dequeue();
    ++( $is_alive ? $alive : $dead );
}

$req_q->enqueue(undef) for @workers;
$_->join for @workers;

say $alive;
say $dead;
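As written, the counts come out right, but you can't tell which URL produced which answer, since results return in completion order. If you need that pairing, one option (a sketch, not from the original answer) is to enqueue the URL alongside the flag; Thread::Queue::Any can carry arbitrary structures. The worker body and collector loop from the snippet above would change to:

```perl
# Worker body: send back [url, status] pairs instead of bare flags.
push @workers, async {
    while (my $url = $req_q->dequeue()) {
        $resp_q->enqueue([ $url, head($url) ? 1 : 0 ]);
    }
};

# Collector: unpack each pair as it arrives.
for (1 .. @urls) {
    my ($url, $is_alive) = @{ $resp_q->dequeue() };
    say "$url: ", ($is_alive ? "YES" : "NO");
}
```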