Perl WWW::Mechanize: comparing the Content-Length response header of different URLs

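I have a problem I hope you can help me with.

I have two text files that contain the following:

FILE1.txt

FILE2.txt

The output, which I have produced correctly, looks like this:

The code that does the above:

My problem is how to compare each new URL with the correct original URL, i.e. how to avoid accidentally comparing the content length of a new URL with the content length of the wrong original URL. Should I use a hash? How would I do that?

Thank you very much for your help.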

You should use a hash for this. I would change your input code to build a more complex data structure, because that makes the task easier.

open my $animalUrls, '<', 'FILE1.txt' or die "Can't open: $!";
open my $directory, '<', 'FILE2.txt' or die "Can't open: $!";

my @directory = <$directory>;   #each line of the file into an array
close $directory or die "Can't close: $!";
my $newURLs;

while ( my $baseURL = <$animalUrls> ) {
  chomp $baseURL;

  SUBDIR: foreach my $subdir (@directory) {
    chomp $subdir;
    next SUBDIR if $subdir eq "";
    # put each new url into arrayref
    push( @{ $newURLs->{$baseURL} }, $baseURL . $subdir );
  }
}

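You could even do without a second set of foreach loops and put the comparison code right where the data structure is built.

If you are not familiar with references, it is worth reading up on them. What we do here is create a hash with one key per base URL and store under each key an array of all the child URLs generated from it. If you print the final $newURLs with Data::Dumper, it looks like this: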

$VAR1 = {
  'http://www.dog.com/' => [
    'http://www.dog.com/1',
    'http://www.dog.com/2',
   ],
  'http://www.cat.com/' => [
    'http://www.cat.com/1',
    'http://www.cat.com/2',
   ],
};
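
As a way to try this out without the two input files, here is a minimal sketch of the same hash-of-arrays construction. It assumes the file contents are already in memory: the lists `@base_urls` and `@subdirs` and the helper `build_url_map` are hypothetical names introduced for illustration, not part of the original code.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical in-memory stand-ins for the lines of FILE1.txt and FILE2.txt.
my @base_urls = ( 'http://www.dog.com/', 'http://www.cat.com/' );
my @subdirs   = ( '1', '', '2' );

# Build a hash of array references: base URL => [ child URLs ].
sub build_url_map {
  my ( $bases, $subdirs ) = @_;
  my %map;
  for my $base ( @{$bases} ) {
    for my $subdir ( @{$subdirs} ) {
      next if $subdir eq '';    # skip blank lines
      push @{ $map{$base} }, $base . $subdir;
    }
  }
  return \%map;
}

my $newURLs = build_url_map( \@base_urls, \@subdirs );

$Data::Dumper::Sortkeys = 1;    # stable key order for readable output
print Dumper($newURLs);
```

Because every child URL is stored under the key of the base URL it was built from, a later comparison can never pair a child with the wrong base.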

Edit: I updated the code. I used these files to test it:

URLs:

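This code seems to do what you need. It stores all the URLs in @urls and prints the content length as each one is fetched. I don't know what you need the length data for afterwards, but I have stored the length of each response in the hash %lengths so that the values stay associated with their URLs.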

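use 5.010;
use strict;
use warnings;

use IO::Handle;        # provides STDOUT->autoflush on older perls
use LWP::UserAgent;

STDOUT->autoflush;

my @urls;

# Build the full URL list: each base URL followed by base + every path
open my $base_fh, '<', 'FILE1.txt' or die "Can't open FILE1.txt: $!";
while (my $base = <$base_fh>) {
  chomp $base;
  push @urls, $base;
  # re-read the path file for every base URL
  open my $path_fh, '<', 'FILE2.txt' or die "Can't open FILE2.txt: $!";
  while (my $path = <$path_fh>) {
    chomp $path;
    push @urls, $base.$path;
  }
}

my $ua = LWP::UserAgent->new;

my %lengths;

for my $url (@urls) {
  my $resp = $ua->get($url);
  # Content-Length is undef if the server did not send the header
  my $length = $resp->header('Content-Length');
  $lengths{$url} = $length;

  printf "%s  --  %s\n", $url, $length // 'undef';
}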

Comments:

I'm glad you took our advice about open. Good job.

@simbabque - yes, thanks, it was silly to ignore it :) Any thoughts on this one?

By the way, I think WWW::Mechanize might be a bit too much here. If you only want the content length, I would just use LWP::UserAgent. But since Mechanize inherits from LWP::UserAgent, it doesn't really make much difference. If you are more comfortable with Mechanize, stick with it.

@simbabque - yes, I see what you mean. The only reason is that this code will be part of a larger program in which I will use a lot of other Mechanize code. Thanks a lot :)

This is exactly what I was looking for :) I know you said it was untested, but I can't get it to work and can't figure out why. If you get a chance to test it I would be grateful, thanks. I think it may have something to do with the "$mech->get($url);" part, because I am getting "400 URL must be absolute" and "Use of uninitialized value $content_length in numeric ne (!=)".

@perl-user I updated the code. You were right, there was an error in it. I used the wrong key for the hash and also added empty keys. You can tell just by looking at Dumper($newURLs).

@perl-user SUBDIR: is a label. It lets you call next or last on an outer loop from inside an inner loop.

@perl-user I would add labels to both loops, like the SUBDIR: one I used in the first code block.
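
To illustrate the comment about loop labels, here is a small sketch with both loops labelled. The function `pairs_until_skip` and the marker value 'skip/' are made up for the example; only the `next LABEL` mechanism itself comes from the discussion.

```perl
use strict;
use warnings;

# Collect base . subdir strings, using labels to choose which loop `next` acts on.
sub pairs_until_skip {
  my ( $bases, $subdirs ) = @_;
  my @pairs;
  BASE: for my $base ( @{$bases} ) {
    SUBDIR: for my $subdir ( @{$subdirs} ) {
      next SUBDIR if $subdir eq '';       # skip only this subdir
      next BASE   if $base eq 'skip/';    # abandon the whole base
      push @pairs, $base . $subdir;
    }
  }
  return @pairs;
}

# Prints a/1, a/2, b/1 and b/2, one per line; nothing is produced for 'skip/'.
print "$_\n" for pairs_until_skip( [ 'a/', 'skip/', 'b/' ], [ '1', '', '2' ] );
```

Without a label, `next` always applies to the innermost enclosing loop, so the label is only strictly needed for the `next BASE` case. Labelling both loops simply makes the intent easier to read.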

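# Compare each child URL only against its own base URL (the hash key).
# Assumes $mech is a WWW::Mechanize object, e.g. WWW::Mechanize->new( autocheck => 0 ),
# and perl 5.10 or later for the // operator.
foreach my $url ( keys %{$newURLs} ) {
  # first get the base URL and save its content length
  $mech->get($url);
  my $content_length = $mech->response->header('Content-Length');

  # now iterate all the 'child' URLs
  foreach my $child_url ( @{ $newURLs->{$url} } ) {
    # get the content
    $mech->get($child_url);
    my $child_length = $mech->response->header('Content-Length');

    # compare as strings; a missing header becomes 'undef', which avoids
    # the "uninitialized value" warning that != gives on undef
    if ( ( $child_length // 'undef' ) ne ( $content_length // 'undef' ) ) {
      print "$child_url: different content length: "
        . ( $content_length // 'undef' ) . ' vs '
        . ( $child_length // 'undef' ) . "!\n";
    }
  }
}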
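
The comparison logic can be tried without any network access by replacing the HTTP request with a code reference that returns a length for a URL. This is only a sketch: `find_mismatches`, the `%fake` table and the lengths in it are invented for illustration and are not real measurements.

```perl
use 5.010;
use strict;
use warnings;

# Return [ child_url, base_length, child_length ] for every child URL
# whose length differs from the length of its own base URL.
sub find_mismatches {
  my ( $url_map, $length_of ) = @_;
  my @mismatches;
  for my $base ( sort keys %{$url_map} ) {
    my $base_len = $length_of->($base) // 'undef';
    for my $child ( @{ $url_map->{$base} } ) {
      my $child_len = $length_of->($child) // 'undef';
      push @mismatches, [ $child, $base_len, $child_len ]
        if $child_len ne $base_len;
    }
  }
  return @mismatches;
}

# Made-up lengths standing in for real Content-Length headers.
my %fake = (
  'http://www.dog.com/'  => 100,
  'http://www.dog.com/1' => 100,
  'http://www.dog.com/2' => 250,
);
my $url_map = {
  'http://www.dog.com/' => [ 'http://www.dog.com/1', 'http://www.dog.com/2' ],
};

for my $m ( find_mismatches( $url_map, sub { $fake{ $_[0] } } ) ) {
  printf "%s: different content length: %s vs %s\n", @{$m};
}
```

In a real program the code reference would wrap the HTTP call instead of the `%fake` lookup, so the comparison itself stays the same whether Mechanize or LWP::UserAgent does the fetching.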

Test input, base URLs:

http://www.stackoverflow.com/
http://www.superuser.com/

Test input, subdirectories:

faq
questions
/

Output of the LWP::UserAgent program:

http://www.dog.com/  --  undef
http://www.dog.com/1  --  56244
http://www.dog.com/2  --  56244
http://www.dog.com/Barry  --  56249
http://www.cat.com/  --  156
http://www.cat.com/1  --  11088
http://www.cat.com/2  --  11088
http://www.cat.com/Barry  --  11088
http://www.antelope.com/  --  undef
http://www.antelope.com/1  --  undef
http://www.antelope.com/2  --  undef
http://www.antelope.com/Barry  --  undef
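
If the lengths are kept in a flat hash such as %lengths, each URL can still be matched to its own base by checking which base URL it starts with. The sketch below assumes that approach; `differing_from_base` is a hypothetical helper, and the values are copied from the dog.com and antelope.com lines of the output above.

```perl
use 5.010;
use strict;
use warnings;

# Report every URL in a flat url => length hash whose length differs
# from the length recorded for the base URL it starts with.
sub differing_from_base {
  my ( $lengths, $bases ) = @_;
  my @report;
  for my $base ( @{$bases} ) {
    my $base_len = $lengths->{$base} // 'undef';
    for my $url ( sort keys %{$lengths} ) {
      next if $url eq $base;
      next if index( $url, $base ) != 0;    # not under this base
      my $len = $lengths->{$url} // 'undef';
      push @report, "$url: $len (base: $base_len)" if $len ne $base_len;
    }
  }
  return @report;
}

my %lengths = (
  'http://www.dog.com/'           => undef,
  'http://www.dog.com/1'          => 56244,
  'http://www.dog.com/2'          => 56244,
  'http://www.dog.com/Barry'      => 56249,
  'http://www.antelope.com/'      => undef,
  'http://www.antelope.com/1'     => undef,
  'http://www.antelope.com/2'     => undef,
  'http://www.antelope.com/Barry' => undef,
);

print "$_\n"
  for differing_from_base( \%lengths,
    [ 'http://www.dog.com/', 'http://www.antelope.com/' ] );
```

With this data the three dog.com child URLs are reported, because their base URL returned no Content-Length header, while nothing is reported for antelope.com, where every length is missing. Whether a missing header should count as "equal" is a design choice left to the caller.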