Comparison of CSV files in Perl
I have 10-15 CSV files, each consisting of an id, an index and a fragment column. I only want to compare the fragment column of each file with those of the other files, and the result should list the unique entries. The output should also include the ids, with the columns: fragment, Id_file1, file1 (1 if the fragment is present, otherwise 0), Id_file2, file2 (1 or 0), and so on.

I found some code, but it only works for files that contain a single column. With this code the output file contains only the fragment column and no 1s or 0s; the remaining columns are empty.
I would do the following:
my @files = ("file1", "file2", "file3");
# fileIdx is the position of the file in @files
my %hash = (
    "abc" => [ {fileIdx => 0, id => 11, line => 1, ind => "A"} ],
    "pqr" => [ {fileIdx => 0, id => 12, line => 2, ind => "B"},
               {fileIdx => 1, id => 15, line => 2, ind => "G"} ],
);
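A minimal sketch of how that hash-of-arrays layout could be filled, assuming each file starts with a header line followed by tab-separated id, index and fragment columns. `load_hoa` is a hypothetical helper name, the sample rows are made up, and in-memory filehandles stand in for real files so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: fill the hash-of-arrays layout suggested above.
# Each fragment maps to a list of records, one per occurrence; fileIdx is
# the position of the file in @$files and line is the data row number
# (header excluded). Assumes a header line and tab-separated
# "id, index, fragment" columns.
sub load_hoa {
    my ($files, $handles) = @_;
    my %hash;
    for my $idx (0 .. $#{$files}) {
        my $fh     = $handles->[$idx];
        my $header = <$fh>;    # skip the header line
        while (my $line = <$fh>) {
            chomp $line;
            my ($id, $ind, $frag) = split /\t/, $line;
            push @{ $hash{$frag} },
                { fileIdx => $idx, id => $id, line => $. - 1, ind => $ind };
        }
    }
    return \%hash;
}

# Made-up sample data; in-memory handles stand in for real files.
my @files = ('file1', 'file2');
my $data1 = "id\tindex\tfragment\n11\tA\tabc\n12\tB\tpqr\n";
my $data2 = "id\tindex\tfragment\n14\tF\txyz\n15\tG\tpqr\n";
open my $fh1, '<', \$data1 or die $!;
open my $fh2, '<', \$data2 or die $!;
my $hash = load_hoa(\@files, [ $fh1, $fh2 ]);

# List every fragment with the files (and ids) it occurs in
for my $frag (sort keys %$hash) {
    my @where = map { sprintf '%s (id %s)', $files[ $_->{fileIdx} ], $_->{id} }
        @{ $hash->{$frag} };
    print "$frag: @where\n";
}
# prints:
# abc: file1 (id 11)
# pqr: file1 (id 12) file2 (id 15)
# xyz: file2 (id 14)
```

With this layout the `pqr` entry comes out exactly as in the structure above: one record from file index 0 and one from file index 1.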
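To solve this, you need to change your data structure, because you now need to store information about the files, the fragments and the fragment IDs. Since the ID of a fragment changes from file to file, you need to store the ID that belongs to each particular file.

The previous script used a simple method to keep track of which files contain which fragments. This script needs to be a bit more complex, because we pull more data out of each file and output it differently: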
use strict;
use warnings;
# put our files in an array
my @files = ('1.csv', '2.csv', '3.csv');
my %hash;
#
# Load the hash with the values from every file
#
# since we're doing the same parsing to each file,
# let's save ourselves some typing and run the same code
# on each file
for my $f (@files) {
    open my $fh, "<", $f or die "Could not open $f: $!";
    while (my $val = <$fh>) {
        # skip the first (header) line; $. starts again at 1 for each new handle
        next if $. == 1;
        chomp $val;
        # split the line on tabs
        # (if your files really are comma-separated, split on /,/ instead)
        my ($id, $ix, $frag) = split(/\t/, $val);
        # store the data in a hash of hashes of hashes:
        # the keys are the fragment, then the file name
        # I've stored the index and the id, but obviously
        # you can alter this if you have files of a different format
        # and/or want to save different data.
        $hash{$frag}{$f} = { ix => $ix, id => $id };
    }
    close $fh;
}
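The loop above splits each line on tabs. If the files are genuinely comma-separated (the question calls them CSV files, but no sample survives here), splitting on tabs leaves `$ix` and `$frag` undefined. A minimal core-Perl sketch with the separator passed in and Windows line endings stripped; `load_fragments` is a hypothetical helper and the sample rows are made up. Plain `split` does not understand quoted fields, so for real-world CSV a parser such as Text::CSV_XS (already loaded in the question's script) is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: same hash-of-hashes layout as the answer
# ($hash{$frag}{$file} = { ix => ..., id => ... }), but with the field
# separator passed in. Plain split does NOT handle quoted fields.
sub load_fragments {
    my ($sources, $sep) = @_;    # $sources: [ [ name, filehandle ], ... ]
    my %hash;
    for my $src (@$sources) {
        my ($name, $fh) = @$src;
        my $header = <$fh>;      # skip the header line
        while (my $line = <$fh>) {
            $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
            my ($id, $ix, $frag) = split /\Q$sep\E/, $line;
            next unless defined $frag;    # skip blank or malformed lines
            $hash{$frag}{$name} = { ix => $ix, id => $id };
        }
    }
    return \%hash;
}

# Made-up comma-separated sample data in in-memory files
my $csv1 = "id,index,fragment\n11,A,abc\n12,B,pqr\n";
my $csv2 = "id,index,fragment\r\n15,G,pqr\r\n";
open my $fh1, '<', \$csv1 or die $!;
open my $fh2, '<', \$csv2 or die $!;

my $hash = load_fragments([ [ '1.csv', $fh1 ], [ '2.csv', $fh2 ] ], ',');
print "pqr in 2.csv has id ", $hash->{pqr}{'2.csv'}{id}, "\n";
# prints: pqr in 2.csv has id 15
```

Stripping `\r` matters: without it the fragment key from the second file would be `"pqr\r"`, which never matches `"pqr"` from the first file.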
OK, back to the script:
# set up the output file (three-argument open with a lexical filehandle)
open my $out, '>', 'final.csv' or die "Cannot open final.csv for writing: $!\n";
# print out a header row
# map applies the code within the braces to every element of @files,
# so in this case, we're printing out "ID_<array element>\t<array element>"
# for every file in our list
# join joins together the items following it using the string "\t"
print { $out } join("\t", "Fragment", map { "ID_$_\t$_" } @files) . "\n";
# now, output our data
# $frag is the fragment
for my $frag ( sort keys %hash ) {
    my @cols = ($frag);
    # check which files it appears in
    foreach my $f (@files) {
        if ( $hash{$frag}{$f} ) {
            # if it exists in that file, output the ID and '1'
            push @cols, $hash{$frag}{$f}{id}, 1;
        }
        else {
            # leave the ID column empty, and put 0 in the file column
            push @cols, '', 0;
        }
    }
    # joining the columns avoids a trailing tab at the end of each row
    print { $out } join("\t", @cols) . "\n";
}
close $out;
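The question also asks for the "unique entries". If that means fragments that occur in only one file (an interpretation, not something the question spells out), the answer's `%hash` layout makes it a one-line test: the fragment's inner hash has exactly one key. A short sketch on a hand-built `%hash` with made-up values:

```perl
use strict;
use warnings;

# Hand-built example in the answer's layout: $hash{$frag}{$file} = {...}
my %hash = (
    abc => { '1.csv' => { ix => 'A', id => 11 } },
    pqr => { '1.csv' => { ix => 'B', id => 12 },
             '2.csv' => { ix => 'G', id => 15 } },
    xyz => { '3.csv' => { ix => 'F', id => 14 } },
);

# A fragment is unique when exactly one file contains it,
# i.e. its inner hash has exactly one key.
my %unique;    # file name => list of fragments found only in that file
for my $frag (sort keys %hash) {
    my @in = keys %{ $hash{$frag} };
    push @{ $unique{ $in[0] } }, $frag if @in == 1;
}

for my $file (sort keys %unique) {
    print "$file: @{ $unique{$file} }\n";
}
# prints:
# 1.csv: abc
# 3.csv: xyz
```

If "unique" simply means "each distinct fragment once", that is already what `sort keys %hash` produces in the script above.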
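It works, but I can't get the fragment header: the header starts with id_file1, file1, id_file2, file2. Also, the id_file columns are empty (the IDs aren't printed). It only prints the fragments and the 1s or 0s, so the output looks like id_file1 (holding the fragment column), file1 (empty), id_file2 (1 or 0), file2 (empty), with no header for the 1/0 values.

I edited the code because I forgot to add the "Fragment" column header; please make sure you have the latest version. If information seems to be missing, first check that the data you want is actually in %hash: use Data::Dumper or Data::Dump to print out the data structure and check that you are getting the IDs. If you read my answer, you'll see that the code with $x and $y is only an example, not something to paste blindly into your script. Please read the comments I wrote, and you should be able to solve any remaining problems.

Sorry, I received the message late. Got it. Thank you very much.

Your script reads three files, but you only show the contents of file 1 and file 2. Your output shows an id, and counts only for file 1 and file 2. What is the real situation?

Actually, this script was for 2 files; I forgot to edit it.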
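Following the Data::Dumper suggestion in the comment above, a minimal sketch of dumping `%hash` and flagging records whose id is missing or empty (those are the cells that come out blank in the `ID_` columns). The `%hash` contents are made up, and `missing_ids` is a hypothetical helper; `Sortkeys` and `Indent` are standard Data::Dumper settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Example %hash in the answer's layout; the values are made up
my %hash = (
    pqr => { '1.csv' => { ix => 'B', id => 12 },
             '2.csv' => { ix => 'G', id => 15 } },
    xyz => { '3.csv' => { ix => 'F', id => '' } },
);

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted, stable order
$Data::Dumper::Indent   = 1;    # use a compact fixed indentation
print Dumper(\%hash);

# Hypothetical helper: list "fragment in file" pairs whose id is
# undefined or empty; these show up as blank ID_ cells in final.csv
sub missing_ids {
    my ($h) = @_;
    my @missing;
    for my $frag (sort keys %$h) {
        for my $file (sort keys %{ $h->{$frag} }) {
            my $id = $h->{$frag}{$file}{id};
            push @missing, "$frag in $file" unless defined $id && length $id;
        }
    }
    return @missing;
}

print "no id for $_\n" for missing_ids(\%hash);
# the last line printed: no id for xyz in 3.csv
```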
use strict;
use warnings;
use feature qw(say);
use autodie;
use Text::CSV_XS;
use constant {
FILE_1 => "1.csv",
FILE_2 => "2.csv",
FILE_3 => "3.csv",
};
my %hash;
#
# Load the Hash with value from File #1
#
open my $file1_fh, "<", FILE_1;
while ( my $value = <$file1_fh> ) {
chomp $value;
$hash{$value}++;
}
close $file1_fh;
#
# Add File #2 to the Hash
#
open my $file2_fh, "<", FILE_2;
while ( my $value = <$file2_fh> ) {
chomp $value;
$hash{$value} += 10; # if the key already exists, the value will now be 11
# if it did not exist, the value will be 10
}
close $file2_fh;
open my $file3_fh, "<", FILE_3;
while ( my $value = <$file3_fh> ) {
chomp $value;
$hash{$value} += 100;
}
close $file3_fh;
for my $k ( sort keys %hash ) {
    if ($hash{$k} == 1) {         # only in file 1
        say "$k\t0\t0\t1";
    }
    elsif ($hash{$k} == 10) {     # only in file 2
        say "$k\t0\t1\t0";
    }
    elsif ($hash{$k} == 100) {    # only in file 3
        say "$k\t1\t0\t0";
    }
    else {                        # in more than one file
        say "$k\t1\t1\t1";
    }
}
open (OUT, ">final.csv") or die "Cannot open OUT for writing \n";
$, = " \n";
print OUT "fragment\t1\t2\t3 \n";
print OUT (sort keys %hash);
close OUT;
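In the script above each file adds a different power of ten per line (1, 10, 100), so each decimal digit of `$hash{$k}` records one file, as long as no line occurs ten or more times within a single file. The if/elsif chain only handles the single-file cases and prints `1 1 1` for every other combination; its `say` lines put file 1's flag in the last column although the header is `fragment 1 2 3`; and those lines go to STDOUT, while final.csv only receives the sorted keys. A sketch that decodes the digits directly, so every combination gets its own flags in header order. `flags_for` is a hypothetical helper, the counters are made up, and the rows are printed to STDOUT here; to fill final.csv, print to a handle opened on it instead.

```perl
use strict;
use warnings;

# Counters as the question's script builds them:
# +1 per line in file 1, +10 per line in file 2, +100 per line in file 3
my %hash = (abc => 1, def => 110, ghi => 101, jkl => 111);

# Hypothetical helper: turn one counter into presence flags for
# files 1, 2 and 3 (in that order) by reading its decimal digits.
# Only valid while no key occurs 10 or more times in a single file.
sub flags_for {
    my ($count) = @_;
    return map { int($count / $_) % 10 ? 1 : 0 } 1, 10, 100;
}

print join("\t", 'fragment', 1, 2, 3), "\n";
for my $k (sort keys %hash) {
    print join("\t", $k, flags_for($hash{$k})), "\n";
}
```

Even with correct flags this approach still uses whole lines as keys, which is why the answer switches to splitting each line and keying on the fragment column.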
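# $x and $y are placeholders: set them to the fragments you want to look up
# (say requires "use feature qw(say);" at the top of the script)
# get the ID of the fragment $x in 2.csv
say $hash{$x}{"2.csv"}{id};
# check if fragment $y exists in 3.csv, and print the index if so
if ( $hash{$y}{"3.csv"} ) {
    say $hash{$y}{"3.csv"}{ix};
}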
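One caveat with lookups like these: Perl autovivifies the intermediate levels of a nested lookup even when it is only read. If `$x` was never loaded, `$hash{$x}{"2.csv"}{id}` leaves `$hash{$x}` as `{ '2.csv' => {} }`, so the output loop later prints that fragment with an empty ID and a 1 in the 2.csv column; the `if` test likewise creates an empty `$hash{$y}`, which prints as a row of 0s. Checking with `exists` one level at a time avoids this. A minimal sketch; `lookup_id` is a hypothetical helper and the data is made up.

```perl
use strict;
use warnings;

my %hash = (
    pqr => { '2.csv' => { ix => 'G', id => 15 } },
);

# Hypothetical helper: return the id of $frag in $file, or undef,
# without creating any new keys in %$h along the way.
sub lookup_id {
    my ($h, $frag, $file) = @_;
    return unless exists $h->{$frag};
    return unless exists $h->{$frag}{$file};
    return $h->{$frag}{$file}{id};
}

my $id = lookup_id(\%hash, 'pqr', '2.csv');
print defined $id ? "pqr in 2.csv: $id\n" : "pqr not in 2.csv\n";
# prints: pqr in 2.csv: 15

my $missing = lookup_id(\%hash, 'nope', '2.csv');    # undef, %hash untouched
print "nope was not added to %hash\n" unless exists $hash{nope};
```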