How can I read lines from the end of a file in Perl?

I am writing a Perl script that reads a CSV file and does some calculations. The CSV file has only two columns, like this:

One Two
1.00 44.000
3.00 55.000

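This CSV file is very large; it can be anywhere from 10 MB to 2 GB.

The CSV file I am currently working with is 700 MB. I tried opening it in Notepad and Excel, but it seems no software can open it.

I want to read the last 1000 lines of the CSV file and look at the values. How can I do that? I cannot open the file in Notepad or any other program.

If I write a Perl script, it has to process the complete file to get to the end and only then read the last 1000 lines.

Is there a better way? I am new to Perl, and any suggestions would be appreciated.

I have searched the net and found some scripts, but I don't know whether they will work on Windows.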

On *nix, you can use the tail command:

tail -1000 yourfile | perl ...

This feeds only the last 1000 lines to the perl program.

On Windows, there are packages that include a tail utility.

I believe you can use the Tie::File module. It presents the lines of the file as an array, so you can get the size of the array and process elements arraySize-1000 through arraySize-1.
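The Tie::File idea above can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the function name `last_lines_tied` is hypothetical, and it assumes the file is opened read-only. Tie::File does not load the whole file into memory, but asking for the array size still makes it scan the file once to locate the line boundaries, so it may be slow on a multi-gigabyte file.

```perl
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

# Hypothetical helper: return the last $wanted lines of $filename via Tie::File.
# Lines come back without their trailing newline (Tie::File chomps by default).
sub last_lines_tied {
    my ($filename, $wanted) = @_;

    tie my @rows, 'Tie::File', $filename, mode => O_RDONLY
        or die "Can't tie $filename: $!\n";

    my $total = @rows;                                  # forces a scan for line offsets
    my $first = $total > $wanted ? $total - $wanted : 0;
    my @tail  = @rows[$first .. $total - 1];            # arraySize-1000 .. arraySize-1

    untie @rows;
    return @tail;
}
```

Calling `last_lines_tied('filename.csv', 1000)` would return at most 1000 lines, or the whole file if it is shorter.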

Another option is to count the number of lines in the file, then loop through the file once and start reading values at line numberOfLines-1000.

$count = `wc -l < $file`;
die "wc failed: $?" if $?;
chomp($count);

This will give you the number of lines (on most systems).

If you know the number of lines in the file, you can do:

perl -ne "print if ($. > N);" filename.csv

where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with:

perl -e "while (<>) {} print $.;" filename.csv
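The count-then-print approach described above can also be written as a single script that does not depend on `wc`. The following is a minimal sketch under stated assumptions: the function name `last_lines_two_pass` is hypothetical, and the file is read twice from start to finish, so it is simple rather than fast.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $wanted lines of $filename in two passes.
sub last_lines_two_pass {
    my ($filename, $wanted) = @_;

    open my $fh, '<', $filename or die "Can't read $filename: $!\n";

    # Pass 1: count the lines.
    my $total = 0;
    while (defined(my $line = <$fh>)) {
        $total++;
    }

    # Pass 2: rewind and keep only the lines numbered above N = total - wanted.
    seek($fh, 0, 0) or die "Can't rewind $filename: $!\n";
    my $skip = $total - $wanted;
    my $seen = 0;
    my @tail;
    while (defined(my $line = <$fh>)) {
        push @tail, $line if ++$seen > $skip;
    }

    close $fh;
    return @tail;
}
```

If the file has fewer lines than requested, `$skip` is negative and every line is returned.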

perl -n -e "shift @d if (@d >= 1000); push(@d, $_); END { print @d }"
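The one-liner above keeps a sliding window of the most recent 1000 lines while reading the input once. Written out as a function it might look like the sketch below; the name `last_lines_window` is hypothetical, and the whole file is still read from the beginning, so only memory use is bounded, not run time.

```perl
use strict;
use warnings;

# Hypothetical helper: single pass over $filename, keeping at most $wanted lines.
sub last_lines_window {
    my ($filename, $wanted) = @_;

    open my $fh, '<', $filename or die "Can't read $filename: $!\n";

    my @window;
    while (defined(my $line = <$fh>)) {
        push @window, $line;
        shift @window if @window > $wanted;   # drop the oldest line once the window is full
    }

    close $fh;
    return @window;
}
```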


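Although the fact that UNIX systems can simply run tail -n 1000 should convince you to just install tail, a Perl-only solution is not that unreasonable.
One approach is to seek back from the end of the file and read lines from there. If that does not yield enough lines, seek further back from the end of the file and try again.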
sub last_x_lines {
    my ($filename, $lineswanted) = @_;
    my ($line, $filesize, $seekpos, $numread, @lines);

    open my $fh, '<', $filename or die "Can't read $filename: $!\n";

    $filesize = -s $filename;
    $seekpos = 50 * $lineswanted;                 # First guess: about 50 bytes per line
    $seekpos = $filesize if $seekpos > $filesize; # Never seek to before the start of the file
    $numread = 0;

    while ($numread < $lineswanted) {
        @lines = ();
        $numread = 0;
        seek($fh, $filesize - $seekpos, 0) or die "Can't seek in $filename: $!\n";
        <$fh> if $seekpos < $filesize; # Discard probably fragmentary line
        while (defined($line = <$fh>)) {
            push @lines, $line;
            shift @lines if ++$numread > $lineswanted;
        }
        if ($numread < $lineswanted) {
            # We didn't get enough lines. Double the amount of space to read from next time.
            if ($seekpos >= $filesize) {
                die "There aren't even $lineswanted lines in $filename - I got $numread\n";
            }
            $seekpos *= 2;
            $seekpos = $filesize if $seekpos >= $filesize;
        }
    }
    close $fh;
    return @lines;
}

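Also, a better title would be "Reading lines from the end of a large file in Perl".
There is a CPAN module that lets you read a file in reverse order. As long as you do not depend on the order, that makes it easy to get the last N lines. If you do, and the data you need is small enough (which it should be in your case), you can read the last 1000 lines into an array and then reverse it.
Without relying on tail, which is what I would probably do: if you have more memory than $FILESIZE [2GB?], then I would just be lazy and do: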
my @lines = <>;
my @lastKlines = @lines[-1000 .. -1];   # range, not a two-element list; assumes at least 1000 lines
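The slice above yields undefined elements when the input has fewer than 1000 lines. A slightly more defensive version of the same "slurp everything, then slice" idea is sketched below; the function name `last_k_slurped` is hypothetical, and like the original it assumes the whole file fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: read all of $filename into memory, return the last $k lines.
sub last_k_slurped {
    my ($filename, $k) = @_;

    open my $fh, '<', $filename or die "Can't read $filename: $!\n";
    my @lines = <$fh>;      # list context: reads every line at once
    close $fh;

    # Clamp the start index so short files return all of their lines.
    my $start = @lines > $k ? @lines - $k : 0;
    return @lines[$start .. $#lines];
}
```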

That said, the other answers involving tail or seek() are pretty much the way to go on this.
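This is only tangentially related to your main question, but when you want to check whether a module works on your platform, check its test results. The links at the top of the module's page will take you to the test matrix.
Looking at the matrix, you will see that this module does indeed have problems on Windows on every Perl version tested.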
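You should definitely use File::Tail, or better yet another module. It is not a script but a module (a programming library). It probably works on Windows. As someone said, you can check this on CPAN Testers, or often just by reading the module documentation or trying it out.
You selected the tail utility as your preferred answer, but on Windows that is likely to be more of a headache than File::Tail.
I wrote a quick backward file search in pure Perl using the following code: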
#!/usr/bin/perl
use warnings;
use strict;
my ($file, $num_of_lines) = @ARGV;

my $count = 0;
my $filesize = -s $file; # filesize used to detect reaching the start of the file while reading backward
my $offset = -2; # start at the character just before the file's final "\n"

open my $fh, '<', $file or die "Can't read $file: $!\n";

# stop after $num_of_lines lines, or once the start of the file has been read
while ($count < $num_of_lines && abs($offset) <= $filesize) {
    my $line = "";
    # we need to check for the start of the file ourselves when seeking in mode "2",
    # as it keeps returning data even after the start of the file has been passed
    while (abs($offset) <= $filesize) {
        seek $fh, $offset, 2;   # negative $offset with whence 2 seeks backward from the end
        $offset -= 1;           # move back the counter
        my $char = getc $fh;
        last if $char eq "\n";  # a newline means the whole line has been collected
        $line = $char . $line;  # otherwise prepend the character to the current line
    }

    # got the next line (lines are printed last to first)
    print $line, "\n";
    $count++;
}

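A module is one way to go. However, sometimes you may be writing a piece of code that you want to run on a variety of machines, and those machines may be missing the more obscure CPAN modules. In that case, why not just run tail from within Perl and dump the output to a temporary file?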
#!/usr/bin/perl

# Run the external tail command; the shell redirects its output to tempfile.txt
`tail --lines=1000 /path/myfile.txt > tempfile.txt`;

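You then have something that does not depend on a CPAN module, in case installing one might be a problem.
Seconding that advice. You can write your own seek/read code, but that is pointless when it has already been done for you in a widely used, well-tested CPAN module.
I wrote an implementation for people who want an example. If I find the answer is not helpful or responsive, I will delete it.
Agreed, this is a pretty valid solution, and useful information too. +1.
Well, use tail, as if you didn't know. You asked about Perl, and this works in Perl. If for any reason it should be considered an inappropriate answer, I would greatly appreciate your comments.
I posted this because I did not like the "$seekpos *= 2;" approach in the earlier solution. It is invoked like this: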
$ get-x-lines-from-end.pl ./myhugefile.log 200