Perl WWW::Mechanize does not handle apostrophes or dashes
I have been working on extracting information from Metacritic, but I have run into a problem: I cannot extract clean text that contains apostrophes or dashes. The following code illustrates the problem:
use WWW::Mechanize;
$reviewspage = 'http://www.metacritic.com/movie/a-band-called-death/critic-reviews';
$Review = 'In the end Death triumphs, but its allure and obsession remain a mystery.';
$l = WWW::Mechanize->new();
$l->get($reviewspage);
$k = $l->content;
@Review = $k =~ m{$Review.*?<div class="review_body">(.*?)</div>}s;
print "@Review\n";
This happens even though the markup on the site is:
<div class="review_body">
Too much of the doc takes our taste for granted; Alice Cooper, Henry Rollins and others won’t persuade you that Death could have been huge, nor does a clichéd last-act reunion show. But the film’s alternating inquiry — into family love, slow compromise and, yes, death — resonates strongly.
</div>
I have written similar scripts before, all of them using WWW::Mechanize, but none of them mangled characters like this. Metacritic uses the UTF-8 character set:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
This is clearly a Unicode problem. Following related advice, I was able to get this version of your code working:
use v5.12 ;
use utf8 ;
use open qw( :encoding(UTF-8) :std ) ;
use WWW::Mechanize ;
my $reviewspage = 'http://www.metacritic.com/movie/a-band-called-death/critic-reviews' ;
my $Review = 'In the end Death triumphs, but its allure and obsession remain a mystery.' ;
my $l = WWW::Mechanize->new() ;
$l->get($reviewspage) ;
my $k = $l->content ;
my @Review = $k =~ m{$Review.*?<div class="review_body">(.*?)</div>}s ;
print "@Review\n" ;
Alternatively, set the output layer on STDOUT with binmode and check explicitly whether the match succeeded:
use strict;
use warnings;
use utf8;
use WWW::Mechanize;
binmode STDOUT, ':utf8'; # output should be in UTF-8
my $url = 'http://www.metacritic.com/movie/a-band-called-death/critic-reviews';
my $Review = 'In the end Death triumphs, but its allure and obsession remain a mystery.';
my $lwp = WWW::Mechanize->new();
$lwp->get($url);
my $data = $lwp->content;
if ($data =~ m{$Review.*?<div class="review_body">(.*?)</div>}s) {
print "$1\n";
} else {
warn "Review not found";
}
Output (newlines added):
Too much of the doc takes our taste for granted; Alice Cooper, Henry Rollins
and others won’t persuade you that Death could have been huge, nor does a
clichéd last-act reunion show. But the film’s alternating inquiry — into
family love, slow compromise and, yes, death — resonates strongly.
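Both working versions rely on the same idea: the page text is held as decoded characters inside Perl and has to be encoded back to UTF-8 bytes when it is printed. The sketch below shows that round trip offline with the core Encode module. The sample string is only an illustration taken from the review text, and the variable names are hypothetical. It assumes, as the WWW::Mechanize documentation describes, that content normally returns decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: "won’t" as raw UTF-8 bytes, the way it travels over
# the wire. The apostrophe U+2019 is the three bytes E2 80 99.
my $bytes = "won\xE2\x80\x99t";

# Decoding turns those three bytes into a single character.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 7, characters: 5"

# Encoding on the way out restores the original bytes. An :encoding(UTF-8)
# layer on STDOUT performs this step on every print.
my $encoded = encode('UTF-8', $chars);
print "round trip ok\n" if $encoded eq $bytes;

# With the layer in place the decoded string can be printed directly.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";
```

The two scripts above differ only in how they install that output layer: one uses the open pragma with :std, the other calls binmode on STDOUT.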