Extracting JavaScript with Web::Scraper
I'm having trouble extracting JavaScript with Web::Scraper. Here is my test script:
#!/usr/bin/perl
use Modern::Perl;
use Web::Scraper;
use Data::Dumper;
my $contents = do { local $/; <DATA> };
my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
my $res = $scraper->scrape($contents);
say Dumper $res;
exit;
__DATA__
<html><head><title>hello</title></head>
<body>
<script type="text/javascript">
var dummy = {}
</script>
</body>
</html>
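It looks to me like I'm finding the script tags but not capturing the content between them.

I found a solution after digging deeper into XPath. Change the scraper line from: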
my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
to:
which outputs the JavaScript code:
$VAR1 = {
'scripts' => [
{
'script' => '
var dummy = {}
'
}
]
};
I'm not convinced this line is concise, but it works.

Try 'RAW' instead of 'TEXT':
#!/usr/bin/perl --
use strict;
use warnings;
use Web::Scraper;
use Data::Dump;
my $contents = q{
<html><head><title>hello</title></head>
<body>
<script type="text/javascript">
var dummy = {}
</script>
</body>
</html>};
#~ my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
my $scraper = scraper { process "//script", "scripts[]" => 'RAW'; };
my $res = $scraper->scrape($contents);
dd $res;
__END__
{ scripts => ["\n var dummy = {}"] }
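Thanks for the suggestion. I tried the change, and it does extract the JS code. The only problem is that the data in the var comes back wrapped in quotes, which makes it harder to parse.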
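On the quoting concern: the single quotes in the dump are most likely just how Data::Dumper displays a string value, not characters in the scraped text itself. If the goal is to pull individual assignments out of the script text, a minimal sketch in core Perl might look like the following. The helper name `extract_js_vars` and the sample string are hypothetical, and the regex assumes simple one-line `var NAME = VALUE` statements; anything more complex would call for a real JavaScript parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Web::Scraper): collect one-line
# `var NAME = VALUE` assignments from a chunk of JavaScript source.
sub extract_js_vars {
    my ($js) = @_;
    my %vars;
    # /m lets ^ and $ match at each line; \h matches horizontal whitespace only.
    while ($js =~ /^\h*var\h+(\w+)\h*=\h*(.*?)\h*;?\h*$/mg) {
        $vars{$1} = $2;
    }
    return \%vars;
}

# Assumed sample: the script text as returned by the scraper.
my $vars = extract_js_vars("\n var dummy = {}\n");
print "$_ => $vars->{$_}\n" for sort keys %$vars;   # prints: dummy => {}
```

In the test script, this would be applied to each element of `$res->{scripts}` once the scraper returns the script text.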