Perl: remove or extract rows of a CSV file based on unique/duplicate Ids


perl csv


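Depending on how you look at it, I need to either remove rows whose Id is unique or, if an Id has duplicates, extract those rows so that all duplicates are kept. I do not know enough Perl to get this done. I found similar topics and tried working from their examples, but without much success. In a previous question, someone showed me a solution using the List::MoreUtils module so that I could merge values sharing a common Id. That is not what I need this time: here, a row should be removed if its Id is unique. I know I could probably use List::MoreUtils for this as well, but I would prefer not to. Below is my dummy data. I copied the sample data from the other question because the data itself is not important; it just shows what I am after. Order does not matter.

Before:

After:

You can see that the grape and banana rows, with ids 50010 and 50030, have been removed because each of those ids has only one entry.

This is my script. I am trying to pick rows out of the hash depending on whether their Id is unique, and to write them out with the Text::CSV_XS module. Can someone show me how to do this?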

#!/usr/bin/perl -w
use strict;
use warnings;
use Text::CSV_XS;

my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";

open (my $infile, '<:encoding(iso-8859-1)', $inputfile) or die "Sourcefile in use / not found :$!\n";
open (my $outfile, '>:encoding(UTF-8)', $outputfile) or die "Outputfile in use :$!\n";

my $csv_in = Text::CSV_XS->new({binary => 1,sep_char => ";",auto_diag => 1,always_quote => 1,eol => $/}); 
my $csv_out = Text::CSV_XS->new({binary => 1,sep_char => "|",auto_diag => 1,always_quote => 1,eol => $/});

my $header = $csv_in->getline($infile);
$csv_out->print($outfile, $header);

my %data;

while (my $elements = $csv_in->getline($infile)){
    my @columns = @{ $elements };       
    my $id = $columns[2];
    push @{ $data{$id} }, \@columns;
}

for my $id ( sort keys %data ){                 # Sort not important
    if @{ $data{$id} } > 1                      # Here I have no idea anymore..
        $csv_out->print($outfile, \@columns);   #
}

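Rather than loading a hash with the entire dataset, I think I would read the file twice and keep only a per-Id count in the hash. This will certainly take longer, but as the file grows, holding all of that data in memory may become a drawback.

That said, I have not used Text::CSV_XS here; this is only the concept I have in mind.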

my %count;

open (my $infile, '<:encoding(iso-8859-1)', $inputfile) or die;
open (my $outfile, '>:encoding(UTF-8)', $outputfile) or die;

while (<$infile>) {
  next if $. == 1;
  my ($id) = (split /;/, $_, 4)[2];
  $count{$id}++;
}

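seek $infile, 0, 0;
$. = 0;    # seek does not reset the line counter, so reset it for the header test

while (<$infile>) {
  my @fields = split /;/;
  # Print the header (line 1) and every row whose id occurs more than once
  print $outfile join '|', @fields
    if $. == 1 or ($count{$fields[2]} // 0) > 1;
}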

close $infile;
close $outfile;

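This question looks very familiar.

@Sobrique Agreed, it is almost the same. I tried to adapt that solution, but that one merges fields when the id is the same; this one has to remove rows when the id is unique.

Thanks for your answer, but I have to use the Text::CSV_XS module. It is a big file, and the data contains the separator character. Do you have any suggestions on how to do this with the module?

I think the rest of your code is fine... I was just being lazy and described conceptually how I would do it. You can keep your existing Text::CSV_XS code exactly the way you are using it. I have amended my response accordingly.

Thanks for the edit, but now it reports "Use of uninitialized value in hash element" at line 27, as if it has nothing to print. What am I missing?

Hi Jan. I am guessing without having seen your data, but my first guess is that your input file, $infile, contains a line that is empty or has fewer than three columns, which leaves $$elements[2] undefined. Can you print to standard output while debugging, to see how many rows it gets through before hitting this error? Alternatively, dump every variable used on that line to see what each one contains when the error is raised, i.e. print $elements, $csv_out and $outfile. I think it is safe to say that %count has data.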
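The diagnosis in the last comment can be turned into a small check. The following is only a sketch under assumptions: `count_ids` is a hypothetical helper that is not part of the answer's script, the rows are made-up stand-ins for what `$csv_in->getline` would return, and the id is assumed to live in the third column (index 2). Rows without a usable id are recorded rather than used as a hash key, which is what triggers the warning.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer's script): count how often each
# id occurs in column $id_col, and record the 1-based positions of rows that
# have no usable value in that column instead of using undef as a hash key.
sub count_ids {
    my ($rows, $id_col) = @_;
    my (%count, @bad_rows);
    my $row_no = 0;
    for my $row (@$rows) {
        $row_no++;
        my $id = $row->[$id_col];
        if (!defined $id || $id eq '') {
            push @bad_rows, $row_no;    # empty line or too few columns
            next;
        }
        $count{$id}++;
    }
    return (\%count, \@bad_rows);
}

# Made-up rows: the second stands in for an empty line, the third is too short.
my @rows = ( [ 'x', 'y', '50010' ], [ '' ], [ 'x', 'y' ], [ 'x', 'y', '50010' ] );
my ($count, $bad) = count_ids(\@rows, 2);
warn "no id in row $_\n" for @$bad;
print "$_ => $count->{$_}\n" for sort keys %$count;
```

Reporting the row positions, rather than silently skipping, should make it easier to find the offending lines in a large input file.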

#!/usr/bin/perl -w

use strict;
use warnings;
use Text::CSV_XS;

my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";

open (my $infile, '<:encoding(iso-8859-1)', $inputfile) or die;
open (my $outfile, '>:encoding(UTF-8)', $outputfile) or die;

my $csv_in = Text::CSV_XS->new({binary => 1,sep_char => ";",
    auto_diag => 1,always_quote => 1,eol => $/}); 
my $csv_out = Text::CSV_XS->new({binary => 1,sep_char => "|",
    auto_diag => 1,always_quote => 1,eol => $/});

my ($count, %count) = (1);

while (my $elements = $csv_in->getline($infile)){
  $count{$$elements[2]}++;
}

seek $infile, 0, 0;

while (my $elements = $csv_in->getline($infile)){
  $csv_out->print($outfile, $elements)
    if $count{$$elements[2]} > 1 or $count++ == 1;
}

close $infile;
close $outfile;
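If the file does fit in memory, the single-pass idea from the question can also be completed. What follows is a minimal sketch, not the answer's method: `rows_with_duplicate_id` is a hypothetical helper, and the sample rows are invented for illustration. It works on rows that have already been parsed (for instance the array references returned by `$csv_in->getline`) and keeps the input order.

```perl
use strict;
use warnings;

# Hypothetical helper: given parsed rows (array refs) and the index of the id
# column, return only the rows whose id occurs more than once, in input order.
sub rows_with_duplicate_id {
    my ($rows, $id_col) = @_;
    my %count;
    for my $row (@$rows) {
        my $id = $row->[$id_col];
        next unless defined $id;    # ignore rows that are too short
        $count{$id}++;
    }
    return [
        grep { defined $_->[$id_col] && $count{ $_->[$id_col] } > 1 } @$rows
    ];
}

# Invented sample rows; the id is assumed to be in the third column.
my @rows = (
    [ 'a', 'x', '1' ],
    [ 'b', 'y', '2' ],
    [ 'c', 'z', '1' ],
);
my $kept = rows_with_duplicate_id(\@rows, 2);
print join(';', @$_), "\n" for @$kept;
```

In a full script, each kept row could then be passed to `$csv_out->print($outfile, $row)`. The trade-off against the two-pass version is memory: every row is held until the counts are complete.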