How to match 0/1-encoded values to keys provided in the same file and rewrite them as a line (rather than a list) in bash


bash unix sed


I have an input file with over 1,000,000 lines that looks like this:

G       A       0|0:2,0:2:3:0,3,32
G       A       0|1:2,0:2:3:0,3,32
G       C       1|1:0,1:1:3:32,3,0
C       G       1|1:0,1:1:3:32,3,0
A       G       1|0:0,1:1:3:39,3,0


For my purposes, everything after the first : in the third field is irrelevant (but I've left it in the field because it affects the code).

The first field defines the values coded as 0 in the third field, and the second field defines the values coded as 1. For example:

G A 0|0 = G|G

G A 1|0 = A|G

G A 1|1 = A|A

etc.
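To make the encoding concrete, here is a minimal sketch of decoding a single record in bash. The function name decode_line is hypothetical and is not part of the script in this question; it assumes the three fields are passed as separate arguments.

```bash
#!/bin/bash
# Hypothetical helper (not from the original post): decode one record
# into "allele|allele" using only parameter expansion.
decode_line() {
    local ref=$1 alt=$2 field=$3
    local code=${field%%:*}          # keep what precedes the first ':' (e.g. "1|0")
    local alleles=("$ref" "$alt")    # index 0 -> first field, index 1 -> second field
    # ${code%|*} is the digit before '|', ${code#*|} is the digit after it
    printf '%s|%s\n' "${alleles[${code%|*}]}" "${alleles[${code#*|}]}"
}

decode_line G A '1|0:2,0:2:3:0,3,32'   # prints A|G
```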

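I first need to decode the third field and then convert it from a vertical list to a horizontal list of values, with the values before the | on one line and the values after it on a second line.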

So the example at the top would look like this:

HAP0 GGCGG
HAP1 GACGA

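I've been working in bash, but any other suggestions are welcome. I have a script that does the job, but it is very slow and long-winded, and I'm sure there is a better way: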

echo "HAP0 " > output.txt
echo "HAP1 " >> output.txt

while IFS=$'\t' read -a array; do
        ref=${array[0]}
        alt=${array[1]}
        data=${array[2]}

        IFS=$':' read -a code <<< $data
        IFS=$'|' read -a hap <<< ${code[0]}

        if [[ "${hap[0]}" -eq 0 ]]; then
                sed -i "1s/$/${ref}/" output.txt
        elif [[ "${hap[0]}" -eq 1 ]]; then
                sed -i "1s/$/${alt}/" output.txt
        fi

        if [[ "${hap[1]}" -eq 0 ]]; then
                sed -i "2s/$/${ref}/" output.txt
        elif [[ "${hap[1]}" -eq 1 ]]; then
                sed -i "2s/$/${alt}/" output.txt
        fi
done < input.txt

Instead of running sed in a subshell, use parameter expansion:
#!/bin/bash
printf '%s ' HAP0 > tmp0
printf '%s ' HAP1 > tmp1


while read -a cols ; do
    indexes=${cols[2]}
    indexes=${indexes%%:*}
    idx0=${indexes%|*}
    idx1=${indexes#*|}
    printf '%s' ${cols[idx0]} >> tmp0
    printf '%s' ${cols[idx1]} >> tmp1
done < "$1"

cat tmp0
printf '\n'
cat tmp1
printf '\n'
rm tmp0 tmp1


The script creates two temporary files, one containing the first line and the other containing the second line.
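If the temporary files are not wanted, the same decoding could probably be done by accumulating the two lines in shell variables. The following is only a sketch under that assumption: the function name build_haps is hypothetical, it reads the records from standard input, and it has not been timed on a file of the size mentioned in the question.

```bash
#!/bin/bash
# Sketch (hypothetical helper): same parameter-expansion decoding,
# but appending to two variables instead of two temporary files.
build_haps() {
    local hap0='' hap1='' indexes
    local -a cols
    while read -r -a cols; do
        [[ ${#cols[@]} -ge 3 ]] || continue   # skip blank or short lines
        indexes=${cols[2]%%:*}                # e.g. "0|1"
        hap0+=${cols[${indexes%|*}]}          # digit before '|' picks field 0 or 1
        hap1+=${cols[${indexes#*|}]}          # digit after '|' picks field 0 or 1
    done
    printf 'HAP0 %s\nHAP1 %s\n' "$hap0" "$hap1"
}

# Usage: build_haps < input.txt
```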
Alternatively, use Perl for a faster solution:
#!/usr/bin/perl
use warnings;
use strict;

my @haps;
while (<>) {
    my @cols = split /[\s|:]+/, $_, 5;
    $haps[$_] .= $cols[ $cols[ $_ + 2 ] ] for 0, 1;
}
print "HAP$_ $haps[$_]\n" for 0, 1;

What is output.txt for? Please show the final desired result.