Regex screen scraper script won't write to output file

regex shell perl

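I can't get the Perl script below to write to the file output.html. It doesn't need to be a CGI script yet, but that is the eventual intent.

Can anyone tell me why it isn't writing any text to output.html?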

#!/usr/bin/perl

#-----------------------------------------------------------------------
# This script should work as a CGI script, if I get it correctly.
# Most CGI scripts for Perl begin with the same line and must be
# stored in your servers cgi-bin directory. (I think this is set by
# your web server.
#
# This scripts will scrape news sites for stories about topics input
# by the users.
#
# Lara Landis
# Sinister Porpoise Computing
# 1/4/2018
# Personal Perl Project
#-----------------------------------------------------------------------

@global_sites = ();

print( "Starting program.\n" );

if ( !( -e "sitedata.txt" ) ) {
    enter_site_info( @global_sites );
}

if ( !( -e "scrpdata.txt" ) ) {

    print( "scrpdata.txt does not exist. Creating file now.\n" );
    print( "Enter the search words you wish to search for below. Press Ctrl-D to finish.\n" );

    open( SCRAPEFILE, ">scrpdata.txt" );
    while ( $line = <STDIN> ) {
        chop( $line );
        print SCRAPEFILE ( "$line\n" );
    }
    close( SCRAPEFILE );
}

print( "Finished getting site data..." );
scrape_sites( @global_sites );

#----------------------------------------------------------------------
# This routine gets information from the user after the file has been
# created. It also has some basic checking to make sure that the lines
# fed to it are legimate domains.  This is not an exhaustive list of
# all domains in existence.
#----------------------------------------------------------------------
sub enter_site_info {
    my ( @sisites ) = @_;

    $x = 1;

    open( DATAFILE, ">sitedata.txt" ) || die( "Could not open datafile.\n" );
    print( "Enter websites below. Press Crtl-D to finish.\n" );

    while ( $x <= @sisites ) {

        $sisites[$x] = <STDIN>;

        print( "$sisites[$x] added.\n" );
        print DATAFILE ( "$sisites[$x]\n" );

        $x++;
    }

    close( DATAFILE );

    return @sisites;
}

#----------------------------------------------------------------------
# If the file exists, just get the information from it.  Read info in
# from the sites. Remember to create a global array for the sites
# data.
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------
# Get the text to find in the sites that are being scraped. This requires
# nested loops. It starts by going through the loops for the text to be
# scraped, and then it goes through each of the websites listend in the
# sitedata.txt file.
#-----------------------------------------------------------------------
sub scrape_sites {
    my ( @ss_info ) = @_;

    @gsi_info = ();
    @toscrape = ();
    $y        = 1;

    #---------------------------
    # Working code to be altered
    #---------------------------
    print( "Getting site info..." );

    $x = 1;

    open( DATAFILE, "sitedata.txt" ) || die( "Can't open sitedata.txt.txt\n" );
    while ( $gsi_info[$x] = <DATAFILE> ) {

        chop( $gsi_info[$x] );
        print( "$gsi_info[$x]\n" );

        $x++;
    }

    close( DATAFILE );

    open( SCRAPEFILE, "scrpdata.txt" ) || die( "Can't open scrpdata.txt\n" );
    print( "Getting scrape data.\n" );

    $y = 1;

    while ( $toscrape[$y] = <SCRAPEFILE> ) {
        chop( $toscrape[$y] );
        $y++;
    }

    close( SCRAPEFILE );

    print( "Now opening the output file.\n" );

    $z = 1;

    open( OUTPUT, ">output.html" );
    print( "Now scraping sites.\n" );

    while ( $z <= @gsi_info ) {    #This loop contains SITES

        system( "rm -f index.html.*" );
        system( "wget $gsi_info[$z]" );

        $z1 = 1;

        print( "Searching site $gsi_info[$z] for $toscrape[$z1]\n" );
        open( TEMPFILE, "$gsi_info[$z]" );

        $comptext = <TEMPFILE>;

        while ( $comptext =~ /$toscrape[z1]/ig ) {    # This loop fetches data from the search terms

            print( "Now scraping $gsi_info[$z] for $toscrape[$z1]\n" );
            print OUTPUT ( "$toscrape[$z1]\n" );

            $z1++;
        }

        close( TEMPFILE );

        $z++;
    }

    close( OUTPUT );

    return ( @gsi_info );
}

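While you check some of them, you don't check all of your open or system calls. If one of them fails, the program keeps running and no error message tells you why.

You could add a check to every one of these calls, but that is easy to forget. Instead, use a pragma such as autodie to perform the checks for you.

You will also want use strict to make sure no variable names are mistyped, and use warnings to be warned about small mistakes.

Also, @global_sites is empty, so enter_site_info() does nothing: its loop condition, $x <= @sisites, is false from the start. And scrape_sites() does nothing with its argument, @ss_info.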
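
As a rough illustration of the advice above, here is a minimal sketch of file handling with strict, warnings, and automatic error checking. It assumes the core autodie pragma is the kind of tool meant; the helper names read_lines and write_lines are hypothetical and do not appear in the original script.

```perl
use strict;
use warnings;
use autodie;    # open and close now die with a descriptive message on failure

# Hypothetical helper: read one entry per line from a file.
sub read_lines {
    my ($path) = @_;
    open( my $fh, '<', $path );    # no "|| die" needed; autodie checks it
    chomp( my @lines = <$fh> );    # strip the trailing newline from each line
    close($fh);
    return @lines;
}

# Hypothetical helper: write one entry per line to a file.
sub write_lines {
    my ( $path, @lines ) = @_;
    open( my $fh, '>', $path );
    print {$fh} "$_\n" for @lines;
    close($fh);
    return;
}
```

Note that a plain "use autodie;" does not cover system. One way to check those calls by hand is to test the return value, for example: system( 'wget', $url ) == 0 or die "wget failed: $?\n";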
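Your assumption about the current working directory is often wrong. You seem to assume that the current working directory is the directory containing the script, but that is never guaranteed; for CGI scripts it is usually /.

"sitedata.txt"

should be

use FindBin qw( $RealBin );

"$RealBin/sitedata.txt"

There could also be a permission error. When open fails, include the cause of the error ($!) in your error message so that you know what went wrong!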
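
To make the two points above concrete, here is a small sketch that builds paths relative to the script's directory and reports $! when open fails. The helper names data_path and open_for_read are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;
use FindBin qw( $RealBin );    # directory that contains the running script

# Hypothetical helper: build a path next to the script instead of
# relying on the current working directory.
sub data_path {
    my ($name) = @_;
    return "$RealBin/$name";
}

# Hypothetical helper: open a file for reading and say why it failed.
sub open_for_read {
    my ($path) = @_;
    open( my $fh, '<', $path )
        or die "Can't open $path: $!\n";    # $! holds the system error text
    return $fh;
}
```

A call such as open_for_read( data_path('sitedata.txt') ) would then fail with a message like "No such file or directory" or "Permission denied" rather than silently continuing.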
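All of this was very helpful. Thanks. I found the problem: I was opening the wrong file. Adding error checking to the open call is what let me find the mistake. It should have been

open( TEMPFILE, "index.html" ) || die( "Can't open index.html\n" );

I have incorporated as many of the suggestions as I remember into the code. I still need to implement the directory suggestion, but that should not be hard.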
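
For readers following the fix, here is one possible sketch of the search step once the right file is opened. It is not the asker's final code: it assumes the whole page should be read (the original $comptext = <TEMPFILE> reads only the first line) and that each search term is a literal string. The helper names slurp and terms_found are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!\n";
    local $/;    # undefine the record separator so <$fh> returns everything
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Hypothetical helper: return the search terms that occur in the page.
sub terms_found {
    my ( $html, @terms ) = @_;
    # \Q...\E treats the term as literal text; /i ignores case.
    return grep { $html =~ /\Q$_\E/i } @terms;
}
```

With these, the inner loop could become something like: print OUTPUT "$_\n" for terms_found( slurp('index.html'), @toscrape );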
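Where do you actually write to output.html? It isn't anywhere in the .pl.

More importantly, lists are passed by value here (the sub copies @_ into a my array), so enter_site_info cannot change the value of its argument. You aren't scraping anything. If it weren't for the subroutine calls without & and the my declarations, I would assume this was Perl 4 code from around 1992. Start with use strict; use warnings; and fix all the problems.

You should avoid chop; chomp is much safer. The first index of a Perl array is zero. You wrote far too much code before stopping to test it. You should write only a few lines at a time before running the program to make sure it works. You certainly shouldn't be close to finishing the program before testing it, unless it is a trivial piece of code only a line or two long. You should write code so that comments aren't needed; only particularly tricky techniques need comments.

@Mike: Try scrolling the code.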
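
The points about chop, array indexes, and argument passing can be seen in a short sketch. The site names and the add_site helper are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# chop removes the last character unconditionally; chomp removes only a
# trailing newline, so it is safe on a final line that has no newline.
my $chopped = "example.com";
chop($chopped);     # the final "m" is lost
my $chomped = "example.com";
chomp($chomped);    # unchanged, because there was no newline to remove

# Arrays start at index 0; push appends without a hand-maintained counter.
my @sites;
push @sites, $_ for qw( a.example b.example );

# A sub that copies @_ into a lexical array cannot change the caller's
# array; return the new list and assign it instead.
sub add_site {
    my (@list) = @_;
    push @list, 'c.example';
    return @list;
}
my @more = add_site(@sites);    # @sites still has two elements
```

Starting the counters at 1, as the script in the question does, leaves element 0 undefined, which makes the element count one larger than the number of lines actually read.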