Comparing CSV files in Perl

perl csv

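I have 10-15 CSV files, each consisting of id, index, and fragment columns.

I just want to compare the fragment column of each file against the other files and get the unique entries. The output should also print the Id (columns: fragment, Id_file1, file1 (1 or 0 depending on whether the fragment is present), Id_file2, file2 (1 or 0 depending on whether the fragment is present), and so on).

I have some code, but it only works for files containing a single column. With this code the output file contains only the fragment column and no 1 or 0, meaning the remaining columns are empty.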


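I would do the following:

Put the files to be parsed into an array (either directly at file scope or by reading the command-line arguments), since you can't copy the code for every file.

my @files = ("file1", "file2", "file3");


Loop over this list, open each file, and add its fragments to a hash keyed by the fragment string, whose values are lists of structures pointing to the file and its index.
At the end the hash looks like this: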
%hash = (
  "abc"  => [ {fileIdx => 0, id => 11, line => 1, ind => "A"} ] ,
  "pqr"  => [ {fileIdx => 0, id => 12, line => 2, ind => "B"}, 
              {fileIdx => 1, id => 15, line => 2, ind => "G"}]
)

After that, all you have to do is iterate over the hash and, for each key, iterate over its list of structures.
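The two steps above can be sketched roughly as follows. This is only an illustration: it assumes comma-separated lines in the order id, index, fragment with one header row, and it reads from in-memory strings (the hypothetical %sample) instead of real files so that it runs as-is. The helper names load_fragments and presence_row are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real files
# (assumed layout: id,index,fragment with a header row).
my %sample = (
    'file1' => "id,index,fragment\n11,A,abc\n12,B,pqr\n",
    'file2' => "id,index,fragment\n14,F,xyz\n15,G,pqr\n",
);
my @files = sort keys %sample;

# Build: fragment => [ { fileIdx, id, line, ind }, ... ]
sub load_fragments {
    my ( $files, $data ) = @_;
    my %hash;
    for my $idx ( 0 .. $#$files ) {
        # open the string as an in-memory file
        open my $fh, '<', \$data->{ $files->[$idx] }
            or die "Cannot read $files->[$idx]: $!";
        my $line = 0;
        while ( my $row = <$fh> ) {
            chomp $row;
            next if $line++ == 0;    # skip the header row
            my ( $id, $ind, $frag ) = split /,/, $row;
            push @{ $hash{$frag} },
                { fileIdx => $idx, id => $id, line => $line - 1, ind => $ind };
        }
        close $fh;
    }
    return \%hash;
}

# Turn one fragment's record list into a tab-separated row:
# fragment, ID_file1, file1, ID_file2, file2, ...
sub presence_row {
    my ( $frag, $records, $n_files ) = @_;
    my @ids = ('') x $n_files;
    my @in  = (0) x $n_files;
    for my $rec (@$records) {
        $ids[ $rec->{fileIdx} ] = $rec->{id};
        $in[ $rec->{fileIdx} ]  = 1;
    }
    return join "\t", $frag, map { ( $ids[$_], $in[$_] ) } 0 .. $n_files - 1;
}

my $hash = load_fragments( \@files, \%sample );
print presence_row( $_, $hash->{$_}, scalar @files ), "\n" for sort keys %$hash;
```

Keeping a list of records per fragment also preserves repeated occurrences within one file, at the cost of a slightly longer output step.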
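To solve this you need to change your data structure, because you need to store information about the file, the fragment, and the fragment's ID. Since the ID changes from file to file, you need to store the ID that corresponds to each particular file.
The previous script used a simple method to track which files contain which fragments. This script needs to be a little more complex, because we are extracting more data from the files and outputting it in a different way: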
use strict;
use warnings;

# put our files in an array
my @files = ('1.csv', '2.csv', '3.csv');

my %hash;
# since we're doing the same parsing to each file,
# let's save ourselves some typing and run the same code
# on each file (the lines are assumed to be tab-separated)
for my $f (@files) {
    open my $fh, "<", $f or die "Could not open $f: $!";
    while (my $val = <$fh>) {
        # skip the first line
        next if $. == 1;
        chomp $val;
        # split the line by the tabs
        my ($id, $ix, $frag) = split(/\t/, $val);
        # store the data in a hash of hashes of hashes
        # keys are the fragment, then the file name
        # I've stored the index and the id, but obviously
        # you can alter this if you have files of a different format
        # and/or want to save different data.
        $hash{$frag}{$f} = { ix => $ix, id => $id };
    }
}
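
The split on tabs above assumes tab-separated input. If the files are real comma-separated CSV (possibly with quoted fields), a parser such as Text::CSV_XS, which the question already loads, may be safer. A minimal sketch, assuming the column order id, index, fragment and a header row; the sub name load_csv and the in-memory sample are hypothetical:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Read one CSV source into $hash->{fragment}{name} = { ix, id }.
# $fh is any readable handle; $name is the label used as the second-level key.
sub load_csv {
    my ( $hash, $name, $fh ) = @_;
    my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );
    $csv->getline($fh);    # discard the header row
    while ( my $row = $csv->getline($fh) ) {
        my ( $id, $ix, $frag ) = @$row;
        $hash->{$frag}{$name} = { ix => $ix, id => $id };
    }
    return $hash;
}

# Hypothetical sample with a quoted fragment that contains a comma.
my $data = qq{id,index,fragment\n11,A,abc\n12,B,"p,qr"\n};
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my %hash;
load_csv( \%hash, '1.csv', $fh );
close $fh;

print "$_ => $hash{$_}{'1.csv'}{id}\n" for sort keys %hash;
```

For tab-separated files the same parser can be used by passing sep_char => "\t" to the constructor.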

OK, back to the script:
# set up the output file
open my $out, ">", "final.csv" or die "Cannot open final.csv for writing: $!\n";
# print out a header row
# map applies the code within the braces to every element of @files,
# so in this case we get the pair "ID_<array element>", "<array element>"
# for every file in our list
# join glues the items following it together using the string "\t"
print {$out} join("\t", "Fragment", map { ("ID_$_", $_) } @files), "\n";

# now, output our data
# $frag is the fragment
for my $frag ( sort keys %hash ) {
    my @row = ($frag);
    # check which files it appears in
    for my $f (@files) {
        if ( exists $hash{$frag}{$f} ) {
            # if it exists in that file, add the ID and '1'
            push @row, $hash{$frag}{$f}{id}, 1;
        }
        else {
            # leave the ID column empty, and put 0 in the file column
            push @row, "", 0;
        }
    }
    # joining once avoids a trailing tab at the end of each row
    print {$out} join("\t", @row), "\n";
}
close $out or die "Cannot close final.csv: $!\n";

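It works fine, but I can't get the Fragment header; the header starts with ID_file1, file1, ID_file2, file2. Also, the ID_file columns are empty (the IDs are not printed). It only prints the fragment and the 1 or 0. That is, the output looks like ID_file1 (prints the fragment column), file1 (empty), ID_file2 (1 or 0), file2 (empty), with no header over the 1 or 0.

I edited the code because I forgot to add the "Fragment" column header -- please make sure you have the latest version. If information seems to be missing, first check whether the data you want is actually in %hash -- use Data::Dumper or Data::Dump to print the data structure and check that you are getting the IDs. If you read my answer, you will see that the code with $x and $y is only there as an example, not something to paste blindly into the script. Please read the comments I wrote and you should be able to resolve any remaining issues.

Sorry for the late reply. Got it. Thank you very much.

You read in FILE_1, FILE_2 and FILE_3, but you show the contents of only file 1 and file 2. Your counts show the id, and only the counts for file 1 and file 2. What is the real situation?

Actually, this script was written for 2 files and I forgot to edit it.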
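
Checking the structure as suggested in the comments might look like this. It is a small sketch using the core Data::Dumper module on a hand-built %hash in the shape the answer's loading loop produces; the fragment values and IDs are made up.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-built stand-in for the hash the loading loop produces:
# fragment => file name => { ix, id }
my %hash = (
    abc => { '1.csv' => { ix => 'A', id => 11 } },
    pqr => {
        '1.csv' => { ix => 'B', id => 12 },
        '2.csv' => { ix => 'G', id => 15 },
    },
);

$Data::Dumper::Sortkeys = 1;    # stable key order makes the dump easier to read
$Data::Dumper::Indent   = 1;    # compact indentation
my $dump = Dumper( \%hash );
print $dump;

# Spot-check one fragment; "exists" avoids creating a missing file entry.
for my $file ( '1.csv', '2.csv' ) {
    my $rec = exists $hash{abc}{$file} ? $hash{abc}{$file} : undef;
    printf "abc in %s: %s\n", $file, $rec ? "id $rec->{id}" : 'absent';
}
```

If the dump shows the fragment keys but no id values, the problem is in the parsing step rather than in the output loop.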
# Script from the question. It treats every line as a single value,
# so it only works for files with one column.
use strict;
use warnings;
use feature qw(say);
use autodie;         # open/close die on failure, so no "or die" is needed
use Text::CSV_XS;    # loaded, but never actually used below

use constant {
    FILE_1  => "1.csv",
    FILE_2  => "2.csv",
    FILE_3  => "3.csv",
};

my %hash;
#
# Load the hash with the values from file #1 (adds 1)
#
open my $file1_fh, "<", FILE_1;
while ( my $value = <$file1_fh> ) {
    chomp $value;
    $hash{$value}++;
}
close $file1_fh;
#
# Add file #2 to the hash (adds 10)
#
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
    chomp $value;
    $hash{$value} += 10;   # if the key already exists, the value will now be 11
                           # if it did not exist, the value will be 10
}
close $file2_fh;
#
# Add file #3 to the hash (adds 100)
#
open my $file3_fh, "<", FILE_3;
while ( my $value = <$file3_fh> ) {
    chomp $value;
    $hash{$value} += 100;
}
close $file3_fh;

# The total encodes membership as decimal digits: ones = file 1, tens = file 2,
# hundreds = file 3 (assuming a value repeats fewer than ten times per file).
for my $k ( sort keys %hash ) {
    my $in1 = $hash{$k} % 10              ? 1 : 0;
    my $in2 = int( $hash{$k} / 10 ) % 10  ? 1 : 0;
    my $in3 = int( $hash{$k} / 100 )      ? 1 : 0;
    say join "\t", $k, $in1, $in2, $in3;    # columns: fragment, 1, 2, 3
}

# Only the keys are written to the file, which is why final.csv ends up
# with the fragment column and no 1/0 columns.
open my $out, ">", "final.csv";
print {$out} "fragment\t1\t2\t3\n";
print {$out} map { "$_\n" } sort keys %hash;
close $out;

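# Example lookups into the fragment/file %hash built by the answer's loading loop.
# $x and $y are placeholders for fragment strings; this is illustration only,
# not code to paste into the script. say needs "use feature qw(say);".

# get the ID of the fragment $x in 2.csv
say $hash{$x}{"2.csv"}{id};

# check if fragment $y exists in 3.csv, and print the index if so
if ( exists $hash{$y} && exists $hash{$y}{"3.csv"} ) {
   say $hash{$y}{"3.csv"}{ix};
}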
