为什么我的Perl脚本在Windows上使用大文件生成损坏的输出?
我是Perl新手,遇到了一个非常奇怪的问题 Perl程序在Windows XP上运行。它首先执行SQL,然后通过5个子例程循环结果并输出到5个文件。这5个文件将加载到数据库中,因此它使用为什么我的Perl脚本在Windows上使用大文件生成损坏的输出?,perl,delimiter,linefeed,Perl,Delimiter,Linefeed,我是Perl新手,遇到了一个非常奇怪的问题 Perl程序在Windows XP上运行。它首先执行SQL,然后通过5个子例程循环结果并输出到5个文件。这5个文件将加载到数据库中,因此它使用作为分隔符。每个子程序都有如下内容 打印输出文件$array[field1]。|'$数组[field2]。|'$数组[field3]。“\n” 奇怪的是,有时程序输出正常。有时,输出损坏,例如,某个点后缺少换行符,或者数组中的值不正确 我想知道这是否与记忆有关。输出文件大小从500MB到9GB不等。程序一次读取一
作为分隔符。每个子程序都有如下内容
打印输出文件$array[field1]。|'$数组[field2]。|'$数组[field3]。“\n”代码>
奇怪的是,有时程序输出正常。有时,输出损坏,例如,某个点后缺少换行符,或者数组中的值不正确
我想知道这是否与记忆有关。输出文件大小从500MB到9GB不等。程序一次读取一条SQL记录的输出,一次写入一条记录
下面是完整的Perl脚本
#!/usr/bin/perl
use DBI;
use DBD::Oracle;
# Constants:
use constant field0 => 0;
use constant field1 => 1;
use constant field2 => 2;
use constant field3 => 3;
use constant field4 => 4;
use constant field5 => 5;
use constant field6 => 6;
use constant field7 => 7;
use constant field8 => 8;
use constant field9 => 9;
use constant field10 => 10;
use constant field11 => 11;
use constant field12 => 12;
use constant field13 => 13;
use constant field14 => 14;
use constant field15 => 15;
use constant field16 => 16;
use constant field17 => 17;
use constant field18 => 18;
use constant field19 => 19;
use constant field20 => 20;
use constant field21 => 21;
use constant field22 => 22;
use constant field23 => 23;
use constant field24 => 24;
use constant field25 => 25;
use constant field26 => 26;
use constant field27 => 27;
use constant field28 => 28;
use constant field29 => 29;
use constant field30 => 30;
use constant field31 => 31;
use constant field32 => 32;
use constant field33 => 33;
use constant field34 => 34;
use constant field35 => 35;
use constant field36 => 36;
use constant field37 => 37;
use constant field38 => 38;
use constant field39 => 39;
use constant field40 => 40;
use constant field41 => 41;
# Capture Directory Path from Environment Variable:
my $DIRECTORY = $ENV{DATADIR};
# Process Counters:
my %fileCntr = (
ccr1 => 0,
ccr2 => 0,
ccr3 => 0,
ccr4 => 0,
ccr5 => 0
);
# Process Control Hashes:
my %xref = ();
# Process Control Variables:
my $diag = 0;
my $proc = 0;
my $ndcc = 0;
my $previous = "";
# Claims Extract array:
my @arr = ();
my $hdr = "";
# Accept/Parse DSS Connection String:
$ENV{PSWD} =~ /(.+)\/(.+)\@(.+)/;
my $USER = $1;
my $PASS = $2;
my $CONN = 'DBI:Oracle:' . $3;
# ALTER Date format:
my $ATL = qq(ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD');
# Database Connection:
my $dbh = DBI->connect( $CONN, $USER, $PASS, { RaiseError => 1, AutoCommit => 0 } );
$dbh->do($ATL); # Execute ALTER session.
my $SQL = qq(
SELECT ... here is a big sql query
);
# Open OUTPUT file for CCR processing:
open OUT1, ">$DIRECTORY/ccr1.dat" or die "Unable to open OUT1 file: $!\n";
open OUT2, ">$DIRECTORY/ccr2.dat" or die "Unable to open OUT2 file: $!\n";
open OUT3, ">$DIRECTORY/ccr3.dat" or die "Unable to open OUT3 file: $!\n";
open OUT4, ">$DIRECTORY/ccr4.dat" or die "Unable to open OUT4 file: $!\n";
open OUT5, ">$DIRECTORY/ccr5.dat" or die "Unable to open OUT5 file: $!\n";
# Redirect STDOUT to log file:
open STDOUT, ">$DIRECTORY/ccr.log" or die "Unable to open LOG file: $!\n";
# Prepare $SQL for execution:
my $sth = $dbh->prepare($SQL);
$sth->execute();
# Produce out files:
{
local $, = "|";
local $\ = "\n";
while (@arr = $sth->fetchrow_array)
{
# Direct Write of CCR1&2 records:
&BuildCCR12();
# Write and Wipe CCR3 HASH Table:
&WriteCCR3() unless ($arr[field0] == $previous);
&BuildCCR3();
# Loop processing for CCR4:
&BuildCCR4();
# Loop processing for CCR5:
&BuildCCR5();
}
}
# Print Record Counts for OUTPUT files:
foreach my $key (keys %fileCntr) { print "$key: " . $fileCntr{$key} . "\n"; }
# Terminate DB connection:
$sth->finish();
$dbh->disconnect();
# Close all output files:
close(OUT1); close(OUT2); close(OUT3);
close(OUT4); close(OUT5);
{
# Reassign Output End-of-record across subroutine block:
local $\ = "\n";
sub BuildCCR12
{
# Write CCR1 Table:
print OUT1 $arr[field6] . '|' . $arr[field7] . '|' . $arr[field5] . '|' .
$arr[field0] . '|' . $arr[field8] . '|' . $arr[field9] . '|' .
$arr[field10] . '|' . $arr[field11] . '|' . $arr[field12] . '|' .
$arr[field13] . '|' . $arr[field2] . '|' . $arr[field3] . '|' .
$arr[field40] . '|' . $arr[field16];
$fileCntr{ccr1}++;
# Write CCR2 Table:
unless ($arr[field17] eq '###########') {
print OUT2 ++$ndcc . "|" . $arr[field0] . "|" .
$arr[field6] . '|' . $arr[field7] . '|' .
$arr[field17] . '|' . $arr[field19] . '|' . $arr[field18] . '|' .
$arr[field2] . '|' . $arr[field3] . '|' . $arr[field39];
$fileCntr{ccr2}++;
}
}
sub WriteCCR3
{
unless ($previous == "")
{
# Produce ccr3 from DISTINCT combo listing:
foreach $key (keys %xref) { print OUT3 $xref{$key}; $fileCntr{ccr3}++; }
%xref = ();
}
}
sub BuildCCR3
{
# Spin off relationship:
for (my $i = field8; $i <= field13; $i++)
{
unless ($arr[$i] == -1)
{
$xref{$arr[field0] . "|" . $arr[$i]} = $arr[field0] . "|" . $arr[$i];
}
}
$previous = $arr[field0];
}
sub BuildCCR4
{
# Spin off relationship:
for (my $i = field26; $i <= field37; $i++)
{
my $sak = $arr[field0] . $arr[field6] . $arr[field7] . $arr[$i];
unless (($arr[$i] eq '#######') or ($arr[$i] eq '######')) {
print OUT4 ++$diag . '|' . $arr[field0] . '|' .
$arr[field6] . '|' .
$arr[field7] . '|' . $arr[$i];
$fileCntr{ccr4}++;
}
}
}
sub BuildCCR5
{
# Spin off field0/Procedure relationship:
for (my $i = field20; $i <= field23; $i++)
{
my $sak = $arr[field0] . $arr[field6] . $arr[field7] . $arr[$i];
unless ($arr[$i] eq '######' or $arr[$i] eq '####') {
print OUT5 ++$proc . '|' . $arr[field0] . '|' . $arr[field6] . '|' .
$arr[field7] . '|' . $arr[$i];
$fileCntr{ccr5}++;
}
}
}
}
另一件事是,这个程序将运行近26个小时,当通过sql循环时,数据是否有可能被弄乱?但它仍然无法解释为什么换行器突然停止工作。我试图减少混乱。首先,您定义的常量会造成大量混乱,而不是帮助提高可读性。如果你有
use constant LICENSE_NO => 42;
我可以理解,但是如果常数只是对应于整数数组索引,那么我就看不出有什么意义
我还将所有打印放在一个单独的子例程中,并在print
和close
语句中添加了错误检查
我并不声称这是解决您问题的方法,但我将从这里开始实际调试。这里可能有一些拼写错误,所以要小心
#!/usr/bin/perl
use warnings; use strict;
use DBI;
use File::Spec::Functions qw( catfile );
my @proc = qw(ccr1 ccr2 ccr3 ccr4 ccr5);
# Capture Directory Path from Environment Variable:
my $DIRECTORY = $ENV{DATADIR};
# Process Counters:
my %fileCntr = map { $_ => 0 } @proc;
# Process Control Hashes:
my %xref = ();
# Process Control Variables:
my $diag = 0;
my $proc = 0;
my $ndcc = 0;
my $previous = "";
# Claims Extract array:
my @arr = ();
my $hdr = "";
# Accept/Parse DSS Connection String:
my ($USER, $PASS, $CONN) = ($ENV{PSWD} =~ m{^(.+)/(.+)\@(.+)});
# ALTER Date format:
my $ATL = qq(ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD');
# Database Connection:
my $dbh = DBI->connect(
"DBI::Oracle:$CONN", $USER, $PASS,
{ RaiseError => 1, AutoCommit => 0 },
);
$dbh->do($ATL); # Execute ALTER session.
my $SQL = qq(
SELECT ... here is a big sql query
);
my %outh;
for my $proc ( @proc ) {
my $fn = catfile $DIRECTORY, "$proc.dat";
open $outh{ $proc }, '>', $fn
or die "Cannot open '$fn' for writing: $!";
}
# Redirect STDOUT to log file:
open STDOUT, '>', catfile($DIRECTORY, 'ccr.log')
or die "Unable to open LOG file: $!";
# Prepare $SQL for execution:
my $sth = $dbh->prepare($SQL);
$sth->execute();
# Produce out files:
while (my @arr = $sth->fetchrow_array) {
# Direct Write of CCR1&2 records:
BuildCCR12(\@arr);
# Write and Wipe CCR3 HASH Table:
WriteCCR3(\@arr) unless ($arr[0] == $previous);
BuildCCR3(\@arr);
# Loop processing for CCR4:
BuildCCR4(\@arr);
# Loop processing for CCR5:
BuildCCR5(\@arr);
}
# Print Record Counts for OUTPUT files:
foreach my $key (keys %fileCntr) {
printf "%s: %s\n", $key, $fileCntr{$key};
}
# Terminate DB connection:
$sth->finish();
$dbh->disconnect();
for my $proc (keys %outh) {
close $outh{ $proc } or die "Cannot close filehandle for '$proc': $!";
}
sub print_to {
my ($dest, $data) = @_;
my $fh = $outh{$dest};
print $fh join('|', @$data), "\n"
or die "Error writing to '$dest' file: $!";
$fileCntr{$dest}++;
return;
}
sub BuildCCR12 {
my ($arr) = @_;
print_to(ccr1 =>
[@{$arr}[6, 7, 5, 0, 8, 9, 10, 13, 2, 3, 40, 16]]);
if ($arr->[17] ne '###########') {
print_to(ccr2 =>
[++$ndcc, @{ $arr }[0, 6, 7, 17, 19, 18, 2, 3, 39]]);
}
return;
}
sub WriteCCR3 {
my ($arr) = @_;
unless ($previous) {
# Produce ccr3 from DISTINCT combo listing:
print_to(ccr3 => [ keys %xref ]);
%xref = ();
}
return;
}
sub BuildCCR3 {
my ($arr) = @_;
# Spin off relationship:
for my $i (8 .. 13) {
unless ($arr->[$i] == -1) {
my $k = join '|', @{ $arr }[0, $i];
$xref{ $k } = $k;
}
}
$previous = $arr->[0];
return;
}
sub BuildCCR4 {
my ($arr) = @_;
# Spin off relationship:
for my $i (26 .. 37) {
my $sak = join '|', @{ $arr }[0, 6, 7, $i];
my $v = $arr->[$i];
unless ( $v =~ /^#{6,7}\z/ ) {
print_to(ccr4 => [++$diag, @{ $arr }[0, 6, 7, $v]]);
}
}
return;
}
sub BuildCCR5 {
my ($arr) = @_;
# Spin off field0/Procedure relationship:
for my $i (20 .. 23) {
my $v = $arr[$i];
my $sak = join('', @{ $arr }[0, 6, 7], $v);
unless ($v eq '######' or $v eq '####') {
print_to(ccr5 => [++$proc, @{ $arr }[0, 6, 7], $v]);
}
}
return;
}
我试图减少混乱。首先,您定义的常量会造成大量混乱,而不是帮助提高可读性。如果你有
use constant LICENSE_NO => 42;
我可以理解,但是如果常数只是对应于整数数组索引,那么我就看不出有什么意义
我还将所有打印放在一个单独的子例程中,并在print
和close
语句中添加了错误检查
我并不声称这是解决您问题的方法,但我将从这里开始实际调试。这里可能有一些拼写错误,所以要小心
#!/usr/bin/perl
use warnings; use strict;
use DBI;
use File::Spec::Functions qw( catfile );
my @proc = qw(ccr1 ccr2 ccr3 ccr4 ccr5);
# Capture Directory Path from Environment Variable:
my $DIRECTORY = $ENV{DATADIR};
# Process Counters:
my %fileCntr = map { $_ => 0 } @proc;
# Process Control Hashes:
my %xref = ();
# Process Control Variables:
my $diag = 0;
my $proc = 0;
my $ndcc = 0;
my $previous = "";
# Claims Extract array:
my @arr = ();
my $hdr = "";
# Accept/Parse DSS Connection String:
my ($USER, $PASS, $CONN) = ($ENV{PSWD} =~ m{^(.+)/(.+)\@(.+)});
# ALTER Date format:
my $ATL = qq(ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD');
# Database Connection:
my $dbh = DBI->connect(
"DBI::Oracle:$CONN", $USER, $PASS,
{ RaiseError => 1, AutoCommit => 0 },
);
$dbh->do($ATL); # Execute ALTER session.
my $SQL = qq(
SELECT ... here is a big sql query
);
my %outh;
for my $proc ( @proc ) {
my $fn = catfile $DIRECTORY, "$proc.dat";
open $outh{ $proc }, '>', $fn
or die "Cannot open '$fn' for writing: $!";
}
# Redirect STDOUT to log file:
open STDOUT, '>', catfile($DIRECTORY, 'ccr.log')
or die "Unable to open LOG file: $!";
# Prepare $SQL for execution:
my $sth = $dbh->prepare($SQL);
$sth->execute();
# Produce out files:
while (my @arr = $sth->fetchrow_array) {
# Direct Write of CCR1&2 records:
BuildCCR12(\@arr);
# Write and Wipe CCR3 HASH Table:
WriteCCR3(\@arr) unless ($arr[0] == $previous);
BuildCCR3(\@arr);
# Loop processing for CCR4:
BuildCCR4(\@arr);
# Loop processing for CCR5:
BuildCCR5(\@arr);
}
# Print Record Counts for OUTPUT files:
foreach my $key (keys %fileCntr) {
printf "%s: %s\n", $key, $fileCntr{$key};
}
# Terminate DB connection:
$sth->finish();
$dbh->disconnect();
for my $proc (keys %outh) {
close $outh{ $proc } or die "Cannot close filehandle for '$proc': $!";
}
sub print_to {
my ($dest, $data) = @_;
my $fh = $outh{$dest};
print $fh join('|', @$data), "\n"
or die "Error writing to '$dest' file: $!";
$fileCntr{$dest}++;
return;
}
sub BuildCCR12 {
my ($arr) = @_;
print_to(ccr1 =>
[@{$arr}[6, 7, 5, 0, 8, 9, 10, 13, 2, 3, 40, 16]]);
if ($arr->[17] ne '###########') {
print_to(ccr2 =>
[++$ndcc, @{ $arr }[0, 6, 7, 17, 19, 18, 2, 3, 39]]);
}
return;
}
sub WriteCCR3 {
my ($arr) = @_;
unless ($previous) {
# Produce ccr3 from DISTINCT combo listing:
print_to(ccr3 => [ keys %xref ]);
%xref = ();
}
return;
}
sub BuildCCR3 {
my ($arr) = @_;
# Spin off relationship:
for my $i (8 .. 13) {
unless ($arr->[$i] == -1) {
my $k = join '|', @{ $arr }[0, $i];
$xref{ $k } = $k;
}
}
$previous = $arr->[0];
return;
}
sub BuildCCR4 {
my ($arr) = @_;
# Spin off relationship:
for my $i (26 .. 37) {
my $sak = join '|', @{ $arr }[0, 6, 7, $i];
my $v = $arr->[$i];
unless ( $v =~ /^#{6,7}\z/ ) {
print_to(ccr4 => [++$diag, @{ $arr }[0, 6, 7, $v]]);
}
}
return;
}
sub BuildCCR5 {
my ($arr) = @_;
# Spin off field0/Procedure relationship:
for my $i (20 .. 23) {
my $v = $arr[$i];
my $sak = join('', @{ $arr }[0, 6, 7], $v);
unless ($v eq '######' or $v eq '####') {
print_to(ccr5 => [++$proc, @{ $arr }[0, 6, 7], $v]);
}
}
return;
}
你能复制/粘贴更多的代码吗?具体地说,从打开()开始和关闭()结束的位置,为什么是field1
,而不是$field1
?field1
是否定义为某个常量?否则,$array[field1]
中的field1
将被视为“裸字”,并解析为$array[0]
。是。字段1在开头被定义为一个常量,我在这里没有显示它。有趣的是,程序运行了一段时间,现在突然开始出现输出问题。程序或环境没有变化。我唯一能想到的可能是文件太大了。你的错误描述很模糊,你显示的代码是通用的,没有显示任何可以解决问题的东西。向我们展示SUB、应该的输出以及已损坏的输出。print语句周围的代码也非常重要。您使用的是什么版本的perl(perl-v
查看)。它是否支持大文件(即大于2gigs)?您可以通过运行perl-wle“useconfig;print$Config{uselagefiles}”
进行检查。如果有,应该说“定义”。最后,您使用的是什么文件系统:NTFS或FAT32?您可以复制/粘贴更多的代码吗?具体地说,从打开()开始和关闭()结束的位置,为什么是field1
,而不是$field1
?field1
是否定义为某个常量?否则,$array[field1]
中的field1
将被视为“裸字”,并解析为$array[0]
。是。字段1在开头被定义为一个常量,我在这里没有显示它。有趣的是,程序运行了一段时间,现在突然开始出现输出问题。程序或环境没有变化。我唯一能想到的可能是文件太大了。你的错误描述很模糊,你显示的代码是通用的,没有显示任何可以解决问题的东西。向我们展示SUB、应该的输出以及已损坏的输出。print语句周围的代码也非常重要。您使用的是什么版本的perl(perl-v
查看)。它是否支持大文件(即大于2gigs)?您可以通过运行perl-wle“useconfig;print$Config{uselagefiles}”
进行检查。如果有,应该说“定义”。最后,您使用的是什么文件系统:NTFS还是FAT32?谢谢您的回复。对$数组[field1]和$previous是数字。我没有在这里展示。如何启用严格和警告?正如@mob所问的,这应该是$array[$field1]
<代码>使用严格代码>和使用警告代码>将在当前词法范围内启用它们。要全局启用警告,请添加#!perl-w
作为脚本的第一行。field1声明为常量,即使用常量field1=>0;我将尝试警告。ThanksI启用了警告和严格,它确实返回了一些错误/警告。例如,在writecr3中,“foreach$key”缺少“my”。它还抱怨$previous==“”,因此我将$previous初始化从“”更改为0。在我进行了上述所有更改之后,就没有问题了。我高度怀疑是书面文件中缺少的“my”导致了问题。我真的很感谢大家的帮助。谢谢回复。对$数组[field1]和$previous是数字。我没有在这里展示。如何启用严格和警告?正如@mob所问的,这应该是$array[$field1]
<代码>使用严格代码>和使用警告代码>将在当前词法范围内启用它们。要全局启用警告,请添加#!perl-w
as t