Perl: Cartesian product


I have a large tab-delimited, two-column text file that looks like this:

...
"001R_FRG3G"    "81941549; 47060116; 49237298"
"002L_FRG3G"    "49237299; 47060117; 81941548"
"002R_IIV3" "106073503; 123808694; 109287880"
...
As you can see, the second column contains non-atomic values. That is why I would like to "normalize" this file so that it looks like this:

...
"001R_FRG3G"    "81941549"
"001R_FRG3G"    "47060116"
"001R_FRG3G"    "49237298"
"002L_FRG3G"    "49237299"
"002L_FRG3G"    "47060117"
"002L_FRG3G"    "81941548"
"002R_IIV3" "106073503"
"002R_IIV3" "123808694"
"002R_IIV3" "109287880"
...
Does anyone know how to do this efficiently?

This might work for you (GNU awk):

awk '{for (i=2; i<=NF; i++) {gsub(/[";]/, "", $i); printf "%s%s\"%s\"", $1, OFS, $i; printf "%s", "\n"}}' inputfile
Or, this one is not awk, but it solves the problem elegantly (GNU sed; note that -i edits the file in place):

sed -i ':a;s/\(\(.*\s"\).*\);\s*/\1"\n\2/;ta' file
"001R_FRG3G"    "81941549"
"001R_FRG3G"    "47060116"
"001R_FRG3G"    "49237298"
"002L_FRG3G"    "49237299"
"002L_FRG3G"    "47060117"
"002L_FRG3G"    "81941548"
"002R_IIV3" "106073503"
"002R_IIV3" "123808694"
"002R_IIV3" "109287880"
Perl:


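In my case this was the fastest of all the solutions offered, and one of (only) two that actually worked…
@mnowotka: The desired output shown in your question includes quotes around each data item. My script strips the quotes and adds them back to produce the output as requested, as stated in my answer. In what way exactly does it not work for you?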
perl -lne '
s/[";]//g;
($a, @b) = split;
print qq("$a" "$_") for @b;
' FILE
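
The awk and Perl one-liners above split on any whitespace and print a single space between the two columns. If the output should stay tab-delimited like the input, a variant along the following lines might help. It is only a sketch: the function name normalize is made up for illustration, and it assumes the file really is tab-separated and that values contain no embedded quotes or tabs.

```shell
# normalize (hypothetical name): read lines of the form "key"<TAB>"v1; v2; ..."
# on stdin and print one "key"<TAB>"value" line per value.
normalize() {
  awk -F'\t' -v OFS='\t' '
    NF >= 2 {
      vals = $2
      gsub(/"/, "", vals)              # drop the quotes around the value list
      n = split(vals, parts, /; */)    # split on a semicolon plus optional spaces
      for (i = 1; i <= n; i++)
        if (parts[i] != "")            # skip empty items, e.g. after a trailing semicolon
          print $1, "\"" parts[i] "\""
    }
  '
}

# Example with a small made-up input; for a real file: normalize < inputfile
printf '"001R_FRG3G"\t"81941549; 47060116"\n"002R_IIV3"\t"106073503"\n' | normalize
```

Because the first column is never modified, its original quotes are passed through unchanged, and only the second column is re-quoted item by item.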