How can I walk through two files simultaneously in Perl?

perl file-io

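I have two text files that contain columnar data of the variety position-value, sorted by position.

Here is an example of the first file (file A):

Here is an example of the second file (file B):

Instead of reading one of the two files into a hash table, which is prohibitive due to memory constraints, I would like to walk through both files simultaneously, in a stepwise fashion.

This means I want to stream through the lines of either A or B and compare position values.

If the two positions are equal, I perform a calculation on the values associated with that position.

Otherwise, if the positions are not equal, I move through the lines of file A or file B until the positions are equal (at which point I perform my calculation again) or I reach EOF of both files.

Is there a way to do this in Perl?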

For looping through the files, you can use the core Tie::File module. It represents a regular text file as an array.
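As a rough illustration of that suggestion, here is a minimal sketch that ties each file to an array with Tie::File and walks both arrays with two indices. The sub name common_positions, the tab-separated layout, and the numeric positions are assumptions made for the example, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: walk two position-sorted, tab-separated files
# through Tie::File arrays and return [position, value_a, value_b]
# for every position that appears in both files.
sub common_positions {
    my ($path_a, $path_b) = @_;

    tie my @lines_a, 'Tie::File', $path_a, mode => O_RDONLY
        or die "Cannot tie $path_a: $!";
    tie my @lines_b, 'Tie::File', $path_b, mode => O_RDONLY
        or die "Cannot tie $path_b: $!";

    my ($i, $j) = (0, 0);
    my @matches;

    # Fetching a record past the end of the file yields undef, which ends the loop.
    while (defined(my $rec_a = $lines_a[$i]) and defined(my $rec_b = $lines_b[$j])) {
        my ($pos_a, $val_a) = split /\t/, $rec_a;
        my ($pos_b, $val_b) = split /\t/, $rec_b;

        if    ($pos_a < $pos_b) { $i++ }    # A is behind: advance A
        elsif ($pos_b < $pos_a) { $j++ }    # B is behind: advance B
        else {                              # same position in both files
            push @matches, [ $pos_a, $val_a, $val_b ];
            $i++;
            $j++;
        }
    }

    untie @lines_a;
    untie @lines_b;
    return @matches;
}

# Usage: perl script.pl fileA fileB
if (@ARGV == 2) {
    print join("\t", @$_), "\n" for common_positions(@ARGV);
}
```

Tie::File does not load the whole file, but it does keep bookkeeping for the records it has passed, so for very large inputs reading from plain filehandles may still be the lighter option.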

If the files are sorted, step through them based on which one has the lower position.

Pseudocode:

read Apos, Aval from A # initial values
read Bpos, Bval from B 
until eof(A) or eof(B)
  if Apos == Bpos then
    compare()
    read Apos, Aval from A # advance both files to get a new position
    read Bpos, Bval from B
  fi
  if Apos < Bpos then read Apos, Aval from A
  if Bpos < Apos then read Bpos, Bval from B
end

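You can also isolate the lines that share a common position first, and then process those at your leisure.

This looks like a problem one would run into with, for example, database table data made up of keys and values. Here's an implementation of the pseudocode provided by rjp.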
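#!/usr/bin/perl

use strict;
use warnings;

# Read the next line from a filehandle and return [position, value],
# or nothing at EOF.
sub read_file_line {
  my $fh = shift;

  # defined() keeps a final line of "0" from being mistaken for EOF
  if ($fh and defined(my $line = <$fh>)) {
    chomp $line;
    return [ split(/\t/, $line) ];
  }
  return;
}

sub compute {
   # do something with the 2 values
}

open(my $f1, '<', 'file1') or die "Cannot open file1: $!";
open(my $f2, '<', 'file2') or die "Cannot open file2: $!";

my $pair1 = read_file_line($f1);
my $pair2 = read_file_line($f2);

# Advance whichever file has the lower position; stop when either file hits EOF
while ($pair1 and $pair2) {
  if ($pair1->[0] < $pair2->[0]) {
    $pair1 = read_file_line($f1);
  } elsif ($pair2->[0] < $pair1->[0]) {
    $pair2 = read_file_line($f2);
  } else {
    compute($pair1->[1], $pair2->[1]);
    $pair1 = read_file_line($f1);
    $pair2 = read_file_line($f2);
  }
}

close($f1);
close($f2);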

Hope this helps.
Here's a quick solution. If the data in the two files is nearly equal (for example, the same number of lines), you don't actually need to store it in a hash table. But I thought it would help you in case the data is scrambled.

Code:

#!/usr/bin/perl

use strict;
use warnings;

open(my $f1, '<', 'data1') or die "Cannot open data1: $!";
open(my $f2, '<', 'data2') or die "Cannot open data2: $!";

# position => value for every line read so far
# (these hashes grow with the input, so memory use is not constant)
my (%data1, %data2);

# read one line from each file per iteration; stops at the end of the shorter file
while (defined(my $line1 = <$f1>) and defined(my $line2 = <$f2>)) {
     chomp($line1);
     chomp($line2);
     # split each line into position and value
     my ($pos1, $val1) = split(/\t/, $line1);
     my ($pos2, $val2) = split(/\t/, $line2);
     # store data into hashes
     $data1{$pos1} = $val1;
     $data2{$pos2} = $val2;
     # compare column 2, guarding against positions not yet seen in the other file
     if (defined $data1{$pos2} and defined $data2{$pos1}
         and $data1{$pos2} == $data2{$pos1}) {
           # compute something
           my $new_val = $data1{$pos2} + $data2{$pos1};
           print "$pos1\t$new_val\n";
     } else {
           print "$pos1\t$data1{$pos1}\n";
     }
}

close($f1);
close($f2);

Comments:

Let's assume there's a use autodie in there to check those bare opens for errors. ;)

This is a great start, thank you! One complication is that the while ($pair1 and $pair2) test makes the loop end as soon as one of the files reaches EOF. My question (as framed) makes this a non-issue, but I do need to handle the other two non-equal cases. So I modified read_file_line to return either the next line or the current line, and I keep a pair of booleans to check whether the pair of lines has changed. Instead of testing for EOF, I test whether both lines are unchanged after running read_file_line. If so, I can safely exit the while loop.

How many lines are in each file? What is the memory limit? What have you tried so far? Is there anything more subtle going on than just opening both files, reading lines from each, and so on?

Creating a multi-GB hash table, or reading one of the two files into an in-memory array, is not feasible. I want to stream both files, using their sorted property to step through one or the other based on the current position.

Welcome to StackOverflow! When answering questions, please try to briefly explain wha...

Thanks for your comment. I'm new here, so I thought the comments in the code were enough.
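
One of the comments on this page points out that a while ($pair1 and $pair2) loop stops as soon as either file reaches EOF, which leaves the "position only in A" and "position only in B" cases unhandled once one file runs out. The following is a minimal sketch of one way to keep going until both files are exhausted. It is not the commenter's own modification: the names next_pair and walk_both and the callback interface are hypothetical, and the input is assumed to be well-formed, tab-separated, and sorted by numeric position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one [position, value] pair from a filehandle; returns undef at EOF.
sub next_pair {
    my ($fh) = @_;
    defined(my $line = <$fh>) or return;
    chomp $line;
    return [ split /\t/, $line ];
}

# Hypothetical driver: walks both handles until BOTH are at EOF and calls
# $on_event->($kind, $position, $value_a, $value_b), where $kind is
# 'both', 'only_a' or 'only_b'.
sub walk_both {
    my ($fh_a, $fh_b, $on_event) = @_;

    my $pair_a = next_pair($fh_a);
    my $pair_b = next_pair($fh_b);

    while ($pair_a or $pair_b) {
        if (!$pair_b or ($pair_a and $pair_a->[0] < $pair_b->[0])) {
            # B is exhausted, or A is behind: this position exists only in A
            $on_event->('only_a', $pair_a->[0], $pair_a->[1], undef);
            $pair_a = next_pair($fh_a);
        } elsif (!$pair_a or $pair_b->[0] < $pair_a->[0]) {
            # A is exhausted, or B is behind: this position exists only in B
            $on_event->('only_b', $pair_b->[0], undef, $pair_b->[1]);
            $pair_b = next_pair($fh_b);
        } else {
            # Same position in both files
            $on_event->('both', $pair_a->[0], $pair_a->[1], $pair_b->[1]);
            $pair_a = next_pair($fh_a);
            $pair_b = next_pair($fh_b);
        }
    }
}

# Usage: perl script.pl fileA fileB
if (@ARGV == 2) {
    open(my $fh_a, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    open(my $fh_b, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
    walk_both($fh_a, $fh_b, sub {
        my ($kind, $pos) = @_;
        print "$pos\t$kind\n";
    });
    close($fh_a);
    close($fh_b);
}
```

Only one line per file is held in memory at any time, so the memory footprint stays constant regardless of file size.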