awk: transpose selected lines into columns


I have created a text file in this format:

[Term]
id: HP:0000006
name: Autosomal dominant inheritance
alt_id: HP:0001415
alt_id: HP:0001447
alt_id: HP:0001448
alt_id: HP:0001451
alt_id: HP:0001455
alt_id: HP:0001456
alt_id: HP:0001463
def: "A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in heterozygotes. In the context of medical genetics, an autosomal dominant disorder is caused when a single copy of the mutant allele is present. Males and females are affected equally, and can both transmit the disorder with a risk of 50% for each child of inheriting the mutant allele." [HPO:curators]
synonym: "Autosomal dominant" EXACT []
synonym: "Autosomal dominant form" RELATED [HPO:skoehler]
synonym: "Autosomal dominant type" RELATED [HPO:skoehler]
xref: SNOMEDCT_US:263681008
xref: UMLS:C0443147
is_a: HP:0000005 ! Mode of inheritance

[Term]
id: HP:0000007
name: Autosomal recessive inheritance
alt_id: HP:0001416
alt_id: HP:0001526
def: "A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in homozygotes. In the context of medical genetics, autosomal recessive disorders manifest in homozygotes (with two copies of the mutant allele) or compound heterozygotes (whereby each copy of a gene has a distinct mutant allele)." [HPO:curators]
synonym: "Autosomal recessive" EXACT []
synonym: "Autosomal recessive form" RELATED [HPO:skoehler]
synonym: "Autosomal recessive predisposition" RELATED []
xref: SNOMEDCT_US:258211005
xref: UMLS:C0441748
xref: UMLS:C4020899
is_a: HP:0000005 ! Mode of inheritance
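
From each group that begins with [Term], I want to select two lines (the first starts with 'name:', the second starts with 'def:' and has its text enclosed in double quotes) and transpose them into columns, to produce the following table: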

column 1                         column 2
name                           | definition
Autosomal dominant inheritance | A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in heterozygotes. In the context of medical genetics, an autosomal dominant disorder is caused when a single copy of the mutant allele is present. Males and females are affected equally, and can both transmit the disorder with a risk of 50% for each child of inheriting the mutant allele.

Here is my attempt:

gawk 'BEGIN{RS="[Term]"}{match($0, /^name:/, a) match($0, /^def:/, b) print a[1] , b[1]}' rows.txt > columns.txt

Solution:

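awk 'BEGIN{ 
         printf "%-35s | definition\n","name"    # header row
     }
     /^name:/{ sub(/^name: /, ""); name = $0 }   # remember the name of the current term
     /^def:/{ 
         gsub(/^def: "|"[^"]+$/, "");            # strip the def: prefix, the quotes and the trailing [source]
         printf "%-35s | %s\n", name, $0 
     }' file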

Output:

name                                | definition
Autosomal dominant inheritance      | A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in heterozygotes. In the context of medical genetics, an autosomal dominant disorder is caused when a single copy of the mutant allele is present. Males and females are affected equally, and can both transmit the disorder with a risk of 50% for each child of inheriting the mutant allele.
Autosomal recessive inheritance     | A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in homozygotes. In the context of medical genetics, autosomal recessive disorders manifest in homozygotes (with two copies of the mutant allele) or compound heterozygotes (whereby each copy of a gene has a distinct mutant allele).
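
The attempt in the question tried to treat each [Term] stanza as a single record. Note that in gawk a multi-character RS is a regular expression, so RS="[Term]" is a bracket expression matching any one of the characters T, e, r, m rather than the literal text. A minimal sketch of the record-per-stanza idea, assuming the stanzas are separated by blank lines as in the sample, is to use awk's paragraph mode (RS="") with one line per field. The function name name_def_table is only an illustrative choice, and the separator here is a tab followed by a bar rather than fixed-width padding:

```shell
# name_def_table is a hypothetical helper name; it reads stanzas from the
# given file(s), or from standard input when no file is given.
name_def_table() {
    awk '
    BEGIN {
        RS = ""      # paragraph mode: records are separated by blank lines
        FS = "\n"    # each line of a stanza becomes one field
        print "name\t| definition"
    }
    {
        name = def = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^name: /) {
                name = substr($i, 7)       # drop the 6-character name: prefix
            } else if ($i ~ /^def: "/) {
                def = substr($i, 7)        # drop the def: prefix and the opening quote
                sub(/"[^"]*$/, "", def)    # drop the last quote and the trailing [source]
            }
        }
        if (name != "") print name "\t| " def   # skip stanzas without a name
    }' "$@"
}
```

It would be called as name_def_table rows.txt > columns.txt. Stanzas that have no name line are skipped, and any text that is not separated from a stanza by a blank line is read as part of that stanza.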

A more general script, which can easily be extended to print other values in the same tabular format:

$ cat tst.awk
BEGIN { OFS="\t| " }
{
    tag = val = $0
    sub(/:.*$/,"",tag)
    sub(/^[^:]+: *"?/,"",val)
    gsub(/".*$/,"",val)
    f[tag] = val
}
tag == "is_a" { print f["name"], f["id"], f["is_a"], f["def"] }

$ awk -f tst.awk file
Autosomal dominant inheritance  | HP:0000006    | HP:0000005 ! Mode of inheritance  | A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in heterozygotes. In the context of medical genetics, an autosomal dominant disorder is caused when a single copy of the mutant allele is present. Males and females are affected equally, and can both transmit the disorder with a risk of 50% for each child of inheriting the mutant allele.
Autosomal recessive inheritance | HP:0000007    | HP:0000005 ! Mode of inheritance  | A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in homozygotes. In the context of medical genetics, autosomal recessive disorders manifest in homozygotes (with two copies of the mutant allele) or compound heterozygotes (whereby each copy of a gene has a distinct mutant allele).
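
One thing to keep in mind with tst.awk: it prints a row whenever it reads an is_a line, so it relies on every term having exactly one is_a line that comes after name, id and def, and values stored in f[] carry over from one term to the next. If that cannot be guaranteed for your file, a possible variation (a sketch only, with the hypothetical names terms_to_table and emit) prints a row at the start of the next [Term] and at end of input, and clears the array in between:

```shell
# terms_to_table and emit are hypothetical names chosen for this sketch.
terms_to_table() {
    awk '
    function emit() {
        # print the collected term, if any, then clear the array
        if ("name" in f) print f["name"], f["id"], f["is_a"], f["def"]
        split("", f)
    }
    BEGIN { OFS = "\t| " }
    /^\[Term\]/ { emit(); next }           # a new stanza starts: output the previous one
    /^[^:]+: / {
        tag = val = $0
        sub(/:.*$/, "", tag)               # tag: everything before the first colon
        sub(/^[^:]+: *"?/, "", val)        # value: drop the tag and an opening quote
        sub(/".*$/, "", val)               # drop the first remaining quote and what follows
        if (!(tag in f)) f[tag] = val      # keep the first value of a repeated tag
    }
    END { emit() }' "$@"
}
```

Keeping only the first value of a repeated tag is a design choice made here for simplicity; a term with several is_a lines yields one row, and a term with none yields a row with an empty is_a column.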

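I was thinking of something like gawk 'BEGIN{RS="[Term]"}{match($0, /^name:/, a) match($0, /^def:/, b) print a[1], b[1]}' file_rows.txt > file_columns.txt. Thanks. If I want to replace the 35 spaces with a tab, how do I do that?
Use print "name\t|definition" for the header.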