Extracting JavaScript with Web::Scraper

Tags: javascript, html, perl, parsing

I'm having trouble extracting JavaScript with Web::Scraper. Here is my test script:

#!/usr/bin/perl
use Modern::Perl;
use Web::Scraper;
use Data::Dumper;

my $contents = do { local $/; <DATA> };
my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
my $res = $scraper->scrape($contents);

say Dumper $res;

exit;

__DATA__
<html><head><title>hello</title></head>
<body>
  <script type="text/javascript">
    var dummy = {}
  </script>
</body>
</html>

As far as I can tell, the script tags are being found, but the content between them is not being saved.

I found a solution after digging deeper into XPath.

Change the scraper line from:

my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
to:

This outputs the JavaScript code:

$VAR1 = {
          'scripts' => [
                         {
                           'script' => '
    var dummy = {}
  '
                         }
                       ]
        };
I'm not convinced the line is succinct, but it works.

Try RAW:

#!/usr/bin/perl --
use strict;
use warnings;
use Web::Scraper;
use Data::Dump;
my $contents = q{
<html><head><title>hello</title></head>
<body>
  <script type="text/javascript">
    var dummy = {}
  </script>
</body>
</html>};
#~ my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
my $scraper = scraper { process "//script", "scripts[]" => 'RAW'; };
my $res = $scraper->scrape($contents);
dd $res;
__END__
{ scripts => ["\n    var dummy = {}"] }

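Thanks for the suggestion. I tried the change and it did extract the JS code. The only problem is that the data in the var contains quotes, which makes it harder to parse.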
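
One possible cause of the quoting problem, offered as an assumption because the comment does not show the actual output: RAW may return the script body with quotes encoded as HTML entities such as &quot;. If so, decode_entities from HTML::Entities may help. The sample string below is hypothetical and only stands in for what RAW might return.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical RAW result with entity-encoded quotes (assumed, not from the thread).
my $raw = "\n    var dummy = { name: &quot;value&quot; }";

# In scalar context decode_entities returns a decoded copy and leaves $raw alone.
my $js = decode_entities($raw);

# Trim the leading and trailing whitespace left over from the HTML layout.
$js =~ s/^\s+|\s+$//g;

print "$js\n";    # prints: var dummy = { name: "value" }
```

If the quotes in your data are literal rather than entity-encoded, this step changes nothing and the problem lies elsewhere.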