Comparing CSV files in Perl (Perl, Csv) - Fatal编程技术网



I have 10-15 CSV files, each consisting of id, index and fragment columns.

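I want to compare the fragment column of each file with those of the other files, and the result should list the unique entries. The output should also print the IDs, with the columns: fragment, Id_file1, file1 (1 or 0 depending on whether the fragment is present), Id_file2, file2 (1 or 0 depending on whether it is present), and so on.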

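I have some code, but it only works for files that contain a single column. With this code the output file contains only the fragment column and no 1s or 0s, which means the remaining columns are empty.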

Code:

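    use warnings;
    use feature qw(say);
    use autodie;
    use Text::CSV_XS;
    
    use constant {
        FILE_1  => "1.csv",
        FILE_2  => "2.csv",
        FILE_3  => "3.csv",
    };
    
    my %hash;
    #
    # Load the Hash with value from File #1
    #
    open my $file1_fh, "<", FILE_1;
    while ( my $value = <$file1_fh> ) {
        chomp $value;
        $hash{$value}++;
    }
    close $file1_fh;
    #
    # Add File #2 to the Hash
    #
    open my $file2_fh, "<", FILE_2;
    while ( my $value = <$file2_fh> ) {
        chomp $value;
        $hash{$value} += 10;   # if the key already exists, the value will now be 11
                               # if it did not exist, the value will be 10
    }
    close $file2_fh;
    
    open my $file3_fh, "<", FILE_3;
    while ( my $value = <$file3_fh> ) {
        chomp $value;
        $hash{$value} += 100;
    }
    close $file3_fh;
    
    for my $k ( sort keys %hash ) {
        if ($hash{$k} == 1) { # only in file 1
            say "$k\t0\t0\t1";
        }
        elsif ($hash{$k} == 10) { # only in file 2
            say "$k\t0\t1\t0";
        }
        elsif ($hash{$k} == 100) { # only in file 2
            say "$k\t1\t0\t0";
        }
        else { # in both file 1 and file 2
            say "$k\t1\t1\t1";
        }
    }
    
    open (OUT, ">final.csv") or die "Cannot open OUT for writing \n";
    $, = " \n";
    print OUT "fragment\t1\t2\t3 \n";
    print OUT (sort keys %hash);
    close OUT;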
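
In the question's code, file 1 adds 1, file 2 adds 10 and file 3 adds 100 to a fragment's counter. The final else branch then reports every remaining total (11, 101, 110, ...) as "in all files". Below is a minimal sketch that reads each decimal digit back as a per-file flag instead. It assumes a fragment occurs at most 9 times in any one file. The helper presence_flags and the sample counts are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the question's "+1 / +10 / +100" counter into
# one presence flag per file (file 1, file 2, file 3, in that order).
# Assumes a fragment occurs at most 9 times in any one file; otherwise a
# decimal digit overflows into the next file's digit.
my @weights = (1, 10, 100);

sub presence_flags {
    my ($sum) = @_;
    # digit i of $sum holds the number of occurrences in file i
    return map { int($sum / $_) % 10 ? 1 : 0 } @weights;
}

# made-up counter values: abc only in file 1, pqr in files 1 and 2, etc.
my %hash = (abc => 1, pqr => 11, uvw => 101, xyz => 110);
for my $k (sort keys %hash) {
    print join("\t", $k, presence_flags($hash{$k})), "\n";
}
```

The rows printed are abc 1 0 0, pqr 1 1 0, uvw 1 0 1 and xyz 0 1 1, with the flags in file order. This still gives only presence flags. Carrying the IDs along needs a richer structure, which is what the answers do.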

I would do the following:

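  • Put the files you want to parse into an array (either directly at file scope or by reading them from the command-line arguments), rather than duplicating the code for each file.
    my @files = ("file1", "file2", "file3");
  • Loop over this list, open each file, and add its fragments to a hash keyed by the fragment string, whose value is a list of structures recording the file and the fragment's index in it.
  • The hash then ends up looking like this: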

    %hash = (
      "abc"  => [ {fileIdx => 0, id => 11, line => 1, ind => "A"} ] ,
      "pqr"  => [ {fileIdx => 0, id => 12, line => 2, ind => "B"}, 
                  {fileIdx => 1, id => 15, line => 2, ind => "G"}]
    );
    
  • After this, all you have to do is iterate over the hash, and for each key iterate over its list of structures.
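
The steps above can be sketched as follows. This is a minimal version with several assumptions. The input is tab-separated, with a header row and the columns id, index and fragment. The sample data is made up and read from in-memory strings instead of real files. The sketch keeps the field names from the example hash (fileIdx, id, line, ind). Here, line is simply the physical line number reported by $.

```perl
use strict;
use warnings;

# Made-up sample inputs standing in for the real files (header row first).
my @inputs = (
    "id\tindex\tfragment\n11\tA\tabc\n12\tB\tpqr\n",
    "id\tindex\tfragment\n15\tG\tpqr\n",
);

my %hash;
for my $fileIdx (0 .. $#inputs) {
    # read from an in-memory string; with real files, pass the file name instead
    open my $fh, '<', \$inputs[$fileIdx] or die "Cannot open input $fileIdx: $!";
    while (my $line = <$fh>) {
        next if $. == 1;    # skip the header row
        chomp $line;
        my ($id, $ind, $frag) = split /\t/, $line;
        # one record per occurrence: which file, which line, which id/index
        push @{ $hash{$frag} },
            { fileIdx => $fileIdx, id => $id, line => $., ind => $ind };
    }
    close $fh;    # also resets $. for the next input
}

# iterate over the hash, then over each key's list of records
for my $frag (sort keys %hash) {
    my @where = map { "file$_->{fileIdx}:id=$_->{id}" } @{ $hash{$frag} };
    print join("\t", $frag, @where), "\n";
}
```

Because each fragment maps to a list, this layout also keeps repeated occurrences of a fragment within one file. A hash keyed by file name would overwrite them.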

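To solve this, you need to change your data structure, because you have to store information about the file, the fragment, and the fragment's ID. Since the ID varies from file to file, you need to store the ID that belongs to each particular file.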

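The previous script used a simple approach to keep track of which files contain which fragments. This script needs to be a little more complex, because we extract more data from the files and output it differently: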

    use strict;
    use warnings;
    
    # put our files in an array
    my @files = ('1.csv', '2.csv', '3.csv');
    
    my %hash;
    #
    # Load the hash with the values from every file
    #
    # since we're doing the same parsing to each file,
    # let's save ourselves some typing and run the same code
    # on each file
    for my $f (@files) {
        open my $fh, "<", $f or die "Could not open $f: $!";
        while (my $val = <$fh>) {
            # skip the first (header) line
            next if $. == 1;
            chomp $val;
            # split the line on tabs; if your files are comma-separated,
            # change this pattern to match
            my ($id, $ix, $frag) = split(/\t/, $val);
            # store the data in a hash of hashes of hashes:
            # keys are the fragment, then the file name
            # I've stored the index and the id, but obviously
            # you can alter this if you have files of a different format
            # and/or want to save different data.
            $hash{$frag}{$f} = { ix => $ix, id => $id };
        }
        # closing the handle also resets $. for the next file
        close $fh;
    }
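
The loading loop above splits on tabs, but the question describes the inputs as CSV files. If the files are really comma-separated, split(/\t/) returns the whole line as a single field and leaves $ix and $frag undefined. The following is a minimal sketch that takes the separator as a parameter and also tolerates Windows (CRLF) line endings. The sub load_fragments and the sample data are hypothetical. The sketch assumes no quoted fields contain the separator. For full CSV quoting rules, a real parser such as Text::CSV_XS (which the question already loads) is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical variant of the answer's loading loop with the field separator
# as a parameter. Assumes simple files: a header row, then id, index,
# fragment, and no quoted fields that contain the separator.
sub load_fragments {
    my ($store, $name, $fh, $sep) = @_;
    while (my $line = <$fh>) {
        next if $. == 1;                 # skip the header row
        $line =~ s/\r?\n\z//;            # strip LF or CRLF line endings
        next unless length $line;        # ignore blank lines
        my ($id, $ix, $frag) = split /\Q$sep\E/, $line;
        next unless defined $frag;       # short line or wrong separator
        $store->{$frag}{$name} = { ix => $ix, id => $id };
    }
}

# made-up comma-separated sample with Windows line endings, standing in for 1.csv
my $sample = "id,index,fragment\r\n11,A,abc\r\n12,B,pqr\r\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my %hash;
load_fragments(\%hash, '1.csv', $fh, ',');
close $fh;

print $_, ' => ', $hash{$_}{'1.csv'}{id}, "\n" for sort keys %hash;
```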
    
OK, back to the script:

    # set up the output file (three-argument open, with the error in $!)
    open my $out, ">", "final.csv" or die "Cannot open final.csv for writing: $!";
    # print out a header row
    # map applies the code within the braces to every element of @files,
    # so in this case, we're printing out "ID_<array element>\t<array element>"
    # for every file in our list
    # join joins together the items following it using the string "\t"
    print { $out } join("\t", "Fragment", map { "ID_$_\t$_" } @files) . "\n";
    
    # now, output our data
    # $frag is the fragment
    for my $frag ( sort keys %hash ) {
        my @row = ($frag);
        # check which files it appears in
        foreach my $f (@files) {
            if ( $hash{$frag}{$f} ) {
                # it exists in that file: add the ID and '1'
                push @row, $hash{$frag}{$f}{id}, 1;
            }
            else {
                # add nothing in the ID column, and 0 in the file column
                push @row, "", 0;
            }
        }
        # joining the fields avoids a trailing tab at the end of the line
        print { $out } join("\t", @row) . "\n";
    }
    close $out;
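
You can see what a single output row looks like by pulling the per-fragment logic of the output loop into a small sub. This sketch assumes %hash already has the answer's layout: fragment, then file name, then id and ix. The sub build_row and the sample contents are hypothetical.

```perl
use strict;
use warnings;

# Made-up %hash in the answer's layout: fragment -> file name -> { id, ix }
my @files = ('1.csv', '2.csv', '3.csv');
my %hash = (
    abc => { '1.csv' => { id => 11, ix => 'A' } },
    pqr => { '1.csv' => { id => 12, ix => 'B' },
             '2.csv' => { id => 15, ix => 'G' } },
);

# Hypothetical helper: returns the fields of one output row for $frag
sub build_row {
    my ($frag, $data, $files) = @_;
    my @row = ($frag);
    for my $f (@$files) {
        if (my $rec = $data->{$frag}{$f}) {
            push @row, $rec->{id}, 1;    # present: its ID and a 1
        } else {
            push @row, '', 0;            # absent: empty ID and a 0
        }
    }
    return @row;
}

print join("\t", 'Fragment', map { "ID_$_\t$_" } @files), "\n";
print join("\t", build_row($_, \%hash, \@files)), "\n" for sort keys %hash;
```

For pqr, which appears only in 1.csv and 2.csv, build_row returns pqr, 12, 1, 15, 1, an empty ID and 0. The output therefore has seven columns, matching the seven header fields.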
    
    
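It works, but I can't get the fragment header: the header starts with id_file1, file1, id_file2, file2. Also, the id_file columns are empty (the IDs are not printed). It only prints the fragments and the 1s and 0s. The output looks like id_file1 (holding the fragment column), file1 (empty), id_file2 (1 or 0), file2 (empty), and a last column of 1s and 0s with no header.

I edited the code because I forgot to add the "Fragment" column header, so please make sure you have the latest version. If information seems to be missing, first check whether the data you want actually exists in %hash. Use Data::Dump to print the data structure and check whether you are getting the IDs. If you read my answer, you will see that the code with $x and $y is only an example, not something to paste blindly into your script. Please read the comments I wrote and you should be able to solve any remaining problems.

Sorry, I only saw the message late. Got it. Thanks a lot.

You read the contents of FILE_1 to FILE_3, but only show the contents of FILE_1 and FILE_2. Your counting shows id, and only the counts for FILE_1 and FILE_2. What is the real situation?

Actually, this script was for 2 files; I forgot to edit it.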
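
Following the advice to check %hash before debugging the output, here is a minimal sketch that inspects it with Data::Dumper. That module ships with Perl. Data::Dump, named in the comment, is a separate CPAN module with a similar purpose. The single sample line is made up. In practice you would dump the %hash built by the loading loop.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample line standing in for one data row of 1.csv
my %hash;
my ($id, $ix, $frag) = split /\t/, "11\tA\tabc";
$hash{$frag}{'1.csv'} = { ix => $ix, id => $id };

local $Data::Dumper::Sortkeys = 1;   # stable key order in the dump
local $Data::Dumper::Indent   = 1;   # compact indentation
print Dumper(\%hash);
```

If the dump shows a single empty-string key, or id values that contain the whole line, the lines were not split into the expected fields.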
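
Example lookups against %hash, from the second answer ($x and $y stand for fragments you want to look up):

    use feature 'say';
    
    # get the ID of the fragment $x in 2.csv
    say $hash{$x}{"2.csv"}{id};
    
    # check if fragment $y exists in 3.csv, and print the index if so;
    # testing exists $hash{$y} first avoids autovivifying $hash{$y}
    if ( exists $hash{$y} && $hash{$y}{"3.csv"} ) {
       say $hash{$y}{"3.csv"}{ix};
    }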
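
The check on $hash{$y}{"3.csv"} needs some care. If $y is not already a key, a plain nested lookup creates an empty $hash{$y} as a side effect (autovivification). If that happens before the output loop, the bogus key shows up as an extra all-zero row. This minimal sketch demonstrates the effect and the exists guard. The keys are made up.

```perl
use strict;
use warnings;

# Made-up data in the answer's layout
my %hash = (abc => { '1.csv' => { id => 11, ix => 'A' } });

# A plain nested lookup on a missing fragment vivifies the outer key.
my $y = 'nope';
if ($hash{$y}{'3.csv'}) { print "found\n" }
print exists $hash{$y} ? "autovivified '$y'\n" : "no side effect\n";

# Checking each level with exists (short-circuiting) leaves %hash untouched.
my $z = 'also-missing';
if (exists $hash{$z} && exists $hash{$z}{'3.csv'}) { print "found\n" }
print exists $hash{$z} ? "autovivified '$z'\n" : "no side effect\n";
```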
    