How do I get data from a web page and save it with Perl?

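I want to write a program that:

  • connects to a blog service and lists the names of recently updated blogs
  • saves the blog names in a text file

  • How do I connect to a web page and read data from it? And how do I save that data?

    You need to load a library to connect to the other server, and open a file to write (print) the data to: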

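    use strict;
    use warnings;
    use LWP::Simple qw(get);

    my $url     = 'http://example.com/';   # placeholder: the page to fetch
    my $content = get($url);
    die "Could not fetch $url\n" unless defined $content;

    # '>>' appends to data.txt on every run
    open(my $fh, '>>', 'data.txt') or die "Cannot open data.txt: $!";
    print {$fh} $content;
    close($fh) or die "Cannot close data.txt: $!";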
    
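    A formatted e-book version of the Perl manual is also available.

    You can use WWW::Mechanize to access web page content, and even to log in and navigate across multiple pages: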

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $url  = 'http://example.com/';   # placeholder: the page to start from
    my $mech = WWW::Mechanize->new();

    $mech->get( $url );

    # follow links by position, by link text, or by URL
    $mech->follow_link( n => 3 );
    $mech->follow_link( text_regex => qr/download this/i );
    $mech->follow_link( url => 'http://host.com/index.html' );

    # log in by filling out and submitting the third form on the page
    $mech->submit_form(
        form_number => 3,
        fields      => {
            username    => 'mungo',
            password    => 'lost-and-alone',
        }
    );

    # submit a named form through a specific button
    $mech->submit_form(
        form_name => 'search',
        fields    => { query  => 'pot of gold', },
        button    => 'Search Now'
    );

    # get all textarea controls whose names begin with "customer"
    my @customer_textareas = $mech->find_all_inputs(
        type       => 'textarea',
        name_regex => qr/^customer/,
    );

    # get all text or textarea controls called "customer"
    my @customer_text_inputs = $mech->find_all_inputs(
        type_regex => qr/^(text|textarea)$/,
        name       => 'customer',
    );
    

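    Perl has several web toolkits, each suited to slightly different tasks. Consider LWP::UserAgent + HTML::Tree, Web::Query, and Mojo. I prefer the last of these.

    Once we have the page, we can use CSS selectors to extract the data we are interested in. Here, I take a look at the newest questions: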

    use strict;    # safety net
    use warnings;  # safety net
    use feature 'say'; # a better "print"
    
    use Mojo::UserAgent;
    
    # fetch the stackoverflow perl page
    
    my $ua = Mojo::UserAgent->new;
    my $perl_page = $ua->get('http://stackoverflow.com/questions/tagged/perl')->res->dom;
    
    # extract all questions:
    
    my $questions = $perl_page->at('#questions');
    for my $question ($questions->find('h3 > a')->each) {
      say $question->all_text;
      say "  <", $question->attr('href'), ">";
    }
    
    Output:

    Perl script, parse text file between words
      </questions/20432447/perl-script-parse-text-file-between-words>
    Having issues with Spreadsheet::WriteExcel that makes me run the script twice to get desired file
      </questions/20432157/having-issues-with-spreadsheetwriteexcel-that-makes-me-run-the-script-twice-to>
    Calculate distance between a single atom and other atoms in a pdb file; print issue
      </questions/20431884/calculate-distance-between-a-single-atom-and-other-atoms-in-a-pdb-file-print-is>
    Exit status of child spawned in a pipe
      </questions/20431810/exit-status-of-child-spawned-in-a-pipe>
    How get data from a web page and save it with perl?
      </questions/20431443/how-get-data-from-a-web-page-and-save-it-with-perl>
    GatoIcon.py automatically generated <?> from images via perl?
      </questions/20431389/gatoicon-py-automatically-generated-from-images-via-perl>
    How and when can I use PPMs that weren't built in in ActivePerl 5.18?
      </questions/20430599/how-and-when-can-i-use-ppms-that-werent-built-in-in-activeperl-5-18>
    Translating perl to python - What does this line do (class variable confusion)
      </questions/20429516/translating-perl-to-python-what-does-this-line-do-class-variable-confusion>
    Fix files “corrupted” by Perl
      </questions/20427916/fix-files-corrupted-by-perl>
    how to add slash separator in perl
      </questions/20427499/how-to-add-slash-separator-in-perl>
    negative look ahead on whole number but preceded by a character(perl)
      </questions/20426507/negative-look-ahead-on-whole-number-but-preceded-by-a-characterperl>
    Use variable expansion in heredoc while piping data to gnuplot
      </questions/20426379/use-variable-expansion-in-heredoc-while-piping-data-to-gnuplot>
    How do I create multiple database connections in Catalyst with DBIC
      </questions/20425107/how-do-i-create-multiple-database-connections-in-catalyst-with-dbic>
    Moose's attribute vs simple sub?
      </questions/20424929/mooses-attribute-vs-simple-sub>
    How to use unicode in perl CGI param
      </questions/20424488/how-to-use-unicode-in-perl-cgi-param>
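
To tie this back to the original question, the extracted titles could be written to a text file instead of printed. The following is only a sketch: it parses an inline HTML string with Mojo::DOM rather than fetching a live page, and the sample markup and the file name titles.txt are assumptions made for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Stand-in for the fetched page; this markup is an assumption for illustration.
my $html = <<'HTML';
<div id="questions">
  <h3><a href="/questions/1/first">First question</a></h3>
  <h3><a href="/questions/2/second">Second question</a></h3>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Collect the link text of every "h3 > a" inside #questions.
my @titles = $dom->at('#questions')
                 ->find('h3 > a')
                 ->map(sub { $_->all_text })
                 ->each;

# Write one title per line; '>' replaces any earlier contents of the file.
my $file = 'titles.txt';   # assumed output file name
open(my $out, '>:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
print {$out} "$_\n" for @titles;
close($out) or die "Cannot close $file: $!";
```

With a real page, the DOM would come from the user agent's response (as in the script above) and the selectors would need to match that site's actual markup.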
    
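    Thanks for the answer, but I don't want to save the whole web page. I only want to save the blog names. How should I do that?

    You would normally use a regular expression to parse the name out of $content:

        $content =~ m/([a-zA-Z\/][^>]+)/si;

    or parse out whatever other information you want or need to process.

    1) Always use strict; use warnings; 2) use […]; 3) '>>' means append; use […] in this case; 4) see the canonical […]

    What about cookies? Are they taken care of automatically when logging in (using $mech->submit_form())?

    @PeterMortensen Yes,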
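
The regular-expression approach suggested in the comments might look like the sketch below, which keeps only the blog names and writes them one per line. The markup (each name inside an a element with class "blog"), the sample names, and the file name blog_names.txt are all hypothetical; a real blog service will use different HTML, and a regex is fragile compared with a proper HTML parser.

```perl
use strict;
use warnings;

# Hypothetical markup: assume each blog name sits in <a class="blog">...</a>.
# Real pages will differ, so adjust the pattern to the actual HTML.
my $content = <<'HTML';
<ul>
  <li><a class="blog" href="/b/1">Perl Weekly Notes</a></li>
  <li><a class="blog" href="/b/2">Camel Tracks</a></li>
  <li><a class="other" href="/about">About</a></li>
</ul>
HTML

# Extract blog names from an HTML string with a global regex match.
sub extract_blog_names {
    my ($html) = @_;
    my @names;
    while ($html =~ m{<a\s+class="blog"[^>]*>\s*([^<]+?)\s*</a>}gi) {
        push @names, $1;
    }
    return @names;
}

my @names = extract_blog_names($content);

# '>' truncates the file so each run stores only the current names.
my $file = 'blog_names.txt';   # assumed output file name
open(my $out, '>', $file) or die "Cannot open $file: $!";
print {$out} "$_\n" for @names;
close($out) or die "Cannot close $file: $!";
```

In a real program, $content would be the string returned by get($url), and opening the file with '>>' instead would accumulate names across runs.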