Python: extracting data from a dictionary

python perl bash shell

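I have two tab-delimited files. File 1 contains identifiers, and file 2 contains the values associated with those identifiers (in other words, it is a very large dictionary).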

File 1

Ronny
Rubby
Suzie
Paul

File 1 has only one column.

File 2

Alistar Barm    Cathy   Paul    Ronny   Rubby   Suzie   Tom     Uma     Vai     Zai
12      13      14      12      11      11      12      23      30      0.34    0.65
1       4       56      23      12      8.9     5.1     1       4       25      3

File 2 contains n rows.

What I want is this: if an identifier from file 1 is present in file 2, then all the values associated with it should be written to another tab-delimited file.

Something like this:

Paul    Ronny   Rubby   Suzie
12      11      11      12
23      12      8.9     5.1

Thanks in advance.

Note

Your example output is not correct: it has "Ruby", but your file1 example has "Rubby", and Ruby != Rubby.

kent$  awk 'NR==FNR{t[$0]++;next}
{if(FNR==1){
        for(i=1;i<=NF;i++)
                if($i in t){
                        v[++n]=i;
                        printf "%s\t", $i;
                }
        print "";
        }else{
        for(k=1;k<=n;k++)
                printf "%s\t", $(v[k])
        print "";
}

}' file1 file2
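
The note at the top of this answer comes down to exact string comparison: a name in file1 selects a column only if the header of file2 contains the identical string. As a quick check before filtering, a minimal Python sketch could list the identifiers that have no matching column. The helper name `missing_identifiers` and the sample header (spelled with "Ruby", as in the note) are assumptions made for illustration.

```python
def missing_identifiers(identifiers, header_line):
    """Return the identifiers that have no exactly matching column name."""
    columns = set(header_line.rstrip("\n").split("\t"))
    return [name for name in identifiers if name not in columns]


# Hypothetical header line in which the column is spelled "Ruby".
header = "Alistar\tBarm\tCathy\tPaul\tRonny\tRuby\tSuzie\tTom\n"
print(missing_identifiers(["Ronny", "Rubby", "Suzie", "Paul"], header))
# prints ['Rubby']
```

An empty list would mean every identifier in file1 has a column in file2.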

$ awk 'FILENAME~1{a[$0];next}
> FNR==1{for(i=1;i<=NF;i++)if($i in a)b[i]}
> {for(j in b)printf("%s\t",$j);print ""}
> ' file{1,2}.txt
Paul    Ronny   Suzie
12      11      12
23      12      5.1

You can do this using only bash:

FIELDS=`head -1 f2.txt | tr "\t" "\n" | nl -ba | grep -f f1.txt | cut -f1 | tr -d " " | tr "\n" ","`; FIELDS=${FIELDS/%,/}
cut -f$FIELDS f2.txt 
Paul    Ronny   Ruby    Suzie
12  11  11  12
23  12  8.9 5.1

An example in Python that works on a stream (i.e., it does not need to load the complete file before it starts producing output):

Output:

$ python test.py 
Ronny   Ruby    Suzie   Paul
11      11      12      12
12      8.9     5.1     23
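
The code for this answer is missing from the text above. As a stand-in, and not the answerer's original, here is a minimal sketch of a streaming filter that would produce output of the same shape, with columns in file1 order. The function name `filter_columns` and the command-line handling are assumptions. Only the identifier list and the header are held in memory; data rows are processed one at a time.

```python
import sys


def filter_columns(wanted, rows):
    """Yield tab-joined output lines containing only the wanted columns.

    `wanted` is an iterable of identifier lines (file1).
    `rows` is an iterable of tab-separated lines whose first line is the
    header (file2). Columns are emitted in file1 order; identifiers that
    do not appear in the header are skipped.
    """
    names = [line.strip() for line in wanted if line.strip()]
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    # Map each column name to its 0-based position in the header.
    position = {name: i for i, name in enumerate(header.rstrip("\n").split("\t"))}
    present = [name for name in names if name in position]
    keep = [position[name] for name in present]
    yield "\t".join(present)
    # Each data row is handled as soon as it is read.
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[i] for i in keep)


# Assumed usage: python test.py file1 file2
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for out in filter_columns(f1, f2):
            print(out)
```

This sketch assumes every data row has at least as many fields as the header; a short row would raise an IndexError.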

Perl solution:

#!/usr/bin/perl
use warnings;
use strict;

open my $KEYS, '<', 'file1' or die $!;
my @keys = <$KEYS>;
close $KEYS;
chomp @keys;
my %is_key;
undef @is_key{@keys};

open my $TAB, '<', 'file2' or die $!;
$_ = <$TAB>;
my $i = 0;    # column index, starting at 0
my @columns;
for (split) {
    push @columns, $i if exists $is_key{$_};
    $i++;
}
do {{
    my @values = split;
    print join("\t", @values[@columns]), "\n";
}} while <$TAB>;

Something like this might work, depending on what you want:
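use strict;
use warnings;

# Assumption: the two file paths are passed as command-line arguments.
my ( $name_file_path, $data_file_path ) = @ARGV;

# Collect the identifiers from the name file, trimming surrounding whitespace.
my %names;
open ( my $nh, '<', $name_file_path ) or die "Could not open '$name_file_path'!";
while ( <$nh> ) { 
    m/^\s*(.*?\S)\s*$/ and $names{ $1 } = 1; 
}
close $nh;

open ( my $dh, '<', $data_file_path ) or die "Could not open '$data_file_path'!";

# Read the header and record the name and index of every wanted column.
my ( @name_list, @col_list );
chomp( my $header = <$dh> );
my @names = split /\t/, $header;
foreach my $coln ( 0..$#names ) {
    next unless exists $names{ $names[ $coln ] };
    push @name_list, $names[ $coln ];
    push @col_list, $coln;
}

# Print the selected header names, then the selected fields of each row.
print join( "\t", @name_list ), "\n";
while ( my $line = <$dh> ) {
    chomp $line;
    my @fields = split /\t/, $line;
    print join( "\t", @fields[ @col_list ] ), "\n";
}
close $dh;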

This might work for you:
 sed '1{s/\t/\n/gp};d' file2 |
 nl |
 grep -f file1 |
 cut -f1 |
 paste -sd, |
 sed 's/ //g;s,.*,cut -f& file2,' |
 sh

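Explanation:
Pivot the column names (one per line)
Number the column names
Match the column names against the input file
Remove the column names, retaining the column numbers
Pivot the column numbers into a single line delimited by commas
Build a cut command from the comma-separated list of column numbers
Run the cut command against the data file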
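
The first five steps amount to turning the header into the field list that `cut -f` expects. A minimal Python sketch of that computation follows; the function name `cut_field_list` is an assumption, and it uses exact name matching, whereas `grep -f` treats each line of file1 as a pattern that may also match part of a longer name.

```python
def cut_field_list(identifiers, header_line):
    """Return comma-separated 1-based column numbers for use with `cut -f`."""
    wanted = set(identifiers)
    # Pivot the column names: one entry per column.
    columns = header_line.rstrip("\n").split("\t")
    # Number the columns from 1 and keep the numbers whose name is wanted.
    matched = [str(n) for n, name in enumerate(columns, start=1) if name in wanted]
    # Join the surviving column numbers with commas.
    return ",".join(matched)


header = "Alistar\tBarm\tCathy\tPaul\tRonny\tRubby\tSuzie\tTom\tUma\tVai\tZai\n"
print(cut_field_list(["Ronny", "Rubby", "Suzie", "Paul"], header))
# prints 4,5,6,7
```

The resulting string can be passed to cut as `cut -f4,5,6,7 file2`, which is the command the pipeline builds in its last sed step.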
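What code have you written so far?
What do you mean by a very large dictionary?
@Duncan: I don't know how to match the column values with the row values and then extract the values in that column.
@M42: Dictionaries are always large :)
@Angelo: Yes, but how large: 1 GB, 100 GB, more?
@Angelo: No problem. But I won't change my answer for that.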