Can I pipe a web page's source code from curl to perl?

perl curl

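I am parsing the source code of many websites, one of them a huge site with thousands of pages. Now I want to search that content with perl: I want to count the occurrences of a keyword.

To parse the web pages I use curl and pipe the output to grep -c, which does not work, so I want to use perl instead. Can perl be used on its own to crawl a page?

For example:

Explanation: In the text file above I have usable and unusable URLs. With grep (matching "parsed") I pick out the usable URLs. With awk I select the second column, which contains the bare usable URL. So far so good. Now to the question: with curl I fetch the source code (appending some parameters) and pipe the whole source code of each page to perl in order to count the occurrences of myKeywordToSearchFor. I would love to do this in perl alone, if that is possible.

Thanks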

Try something more like:

   perl -e 'while(<>){my @words = split; for my $word (@words){if($word =~ /myKeyword/){++$c}}} print "$c\n"'

This version uses only Perl (untested):

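use strict;
use warnings;

use File::Fetch;

my $count = 0;    # start at 0 so a run with no matches prints 0 without a warning
open my $SPIDER, '<', 'RawJSpiderOutput.txt' or die $!;
while (<$SPIDER>) {
    chomp;
    if (/parsed/) {                    # keep only the usable ("parsed") lines
        my $url = (split)[1];          # second whitespace-separated column is the URL
        $url .= '?myPara=en';
        my $ff = File::Fetch->new(uri => $url);
        $ff->fetch or die $ff->error;  # downloads into the current directory
        my $fetched = $ff->output_file;
        open my $FETCHED, '<', $fetched or die $!;
        while (<$FETCHED>) {
            $count++ if /myKeyword/;   # counts lines that contain the keyword
        }
        close $FETCHED;
        unlink $fetched;               # remove the temporary download
    }
}
print "$count\n";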

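Or the same code using perl's command-line options: perl -lane '$c += grep /myKeyword/, @F; END { print $c }' which is almost as short as the original shell script.

I would make a couple of suggestions. (a) LWP is more standard for fetching, and it does not leave a temporary file that has to be unlinked. If you expect 10 MB or more of data, that may be where File::Fetch has the advantage. (b) With Coro/AnyEvent::HTTP, among other options, the whole process could be sped up; these are my current favourites. Put the URLs into a queue, create a set of worker threads, have each one take an item off the queue, fetch it, and scan it for the keyword. That makes better use of your bandwidth.

@Tanktalus: The only reason I used File::Fetch is that it is a core module.

@choroba You probably meant unlink $fetched. You also most likely want to split on ' ', i.e. (split ' ')[1], which strips extra whitespace instead of turning it into empty elements in the array.

LWP::Simple is part of the core since Perl v5.12.3.

Zaid: LWP::Simple is not part of the Perl core in any version. Try running corelist LWP::Simple.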
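One of the comments above suggests fetching without a temporary file. A minimal sketch of that idea follows. It swaps in HTTP::Tiny (a core module since Perl 5.14) rather than LWP, on the assumption that staying with core modules matters here; LWP::UserAgent could be used the same way. The helper names usable_urls, count_in_urls and http_fetch are hypothetical, the 10-second timeout is an arbitrary choice, and failed pages are skipped with a warning instead of aborting the run. The ?myPara=en parameter is taken from the code above.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Keep the "parsed" lines of the spider log and return their second column.
sub usable_urls {
    my ($fh) = @_;
    my @urls;
    while (my $line = <$fh>) {
        next unless $line =~ /parsed/;
        my $url = (split ' ', $line)[1];
        push @urls, $url if defined $url;
    }
    return @urls;
}

# Fetch each URL into memory and add up every occurrence of $keyword.
# $fetch is a code ref (hypothetical hook) so the HTTP client can be swapped.
sub count_in_urls {
    my ($urls, $keyword, $fetch) = @_;
    my $total = 0;
    for my $url (@$urls) {
        my $body = $fetch->("$url?myPara=en");
        next unless defined $body;                  # skip pages that failed
        my $hits = () = $body =~ /\Q$keyword\E/g;   # count all matches
        $total += $hits;
    }
    return $total;
}

# Default fetcher: HTTP::Tiny->get returns a hash ref with
# success, status, reason and content keys.
my $http = HTTP::Tiny->new(timeout => 10);
sub http_fetch {
    my ($url) = @_;
    my $res = $http->get($url);
    warn "$url: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{success} ? $res->{content} : undef;
}

# Only touch the file system and network when a log file is given.
if (@ARGV) {
    open my $spider, '<', $ARGV[0] or die "$ARGV[0]: $!";
    my @urls = usable_urls($spider);
    print count_in_urls(\@urls, 'myKeyword', \&http_fetch), "\n";
}
```

It would be run as perl count.pl RawJSpiderOutput.txt (count.pl being a made-up script name). Passing the fetcher as a code reference keeps the counting logic independent of the HTTP client, so a different client or a concurrent fetcher can be dropped in later.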

   # The one-liner above, spelled out:
   while (<>)                    # as long as we're getting input (into "$_")
   { my @words = split ' ';      # split $_ (implicit) on whitespace, so we examine each word
     for my $word (@words)       #  (and don't miss two keywords on one line)
     { if ($word =~ /myKeyword/) # whenever it's found,
       { ++$c } } }              # increment the counter (auto-vivified)
   print "$c\n"                  # and after end of file is reached, print the counter

   use strict;
   my $count = 0;
   while (my $line = <STDIN>) # except that <> is actually more magical than this
   { my @words = split ' ' => $line;
     for my $word (@words)
     { if ($word =~ /myKeyword/) # count only the words that contain the keyword
       { ++$count; } } }
   print "$count\n";
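The snippets on this page count at different granularities: per line, per whitespace-separated word, or per occurrence, and these can give different totals for the same page. The sketch below puts the three side by side on a made-up sample string; the function names count_lines, count_words and count_hits are hypothetical, and the keyword is matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Lines that contain the keyword at least once.
sub count_lines {
    my ($text, $kw) = @_;
    return scalar grep { /\Q$kw\E/ } split /\n/, $text;
}

# Whitespace-separated words that contain the keyword at least once.
sub count_words {
    my ($text, $kw) = @_;
    return scalar grep { /\Q$kw\E/ } split ' ', $text;
}

# Every non-overlapping occurrence of the keyword.
sub count_hits {
    my ($text, $kw) = @_;
    my $n = () = $text =~ /\Q$kw\E/g;
    return $n;
}

my $sample = "myKeyword and myKeyword\nmyKeywordmyKeyword\nnothing here\n";
printf "lines=%d words=%d hits=%d\n",
    count_lines($sample, 'myKeyword'),
    count_words($sample, 'myKeyword'),
    count_hits($sample, 'myKeyword');
# prints: lines=2 words=3 hits=4
```

If the goal is the number of occurrences, as in the question, the last variant is the one that matches it; in HTML source a keyword often appears several times without whitespace in between.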
