In Unix, merge multiple lines into one line per record, where records are separated by a blank line
I have a text file in which each record begins with its number and name and ends with a blank line. I would like to put each record on a single line as comma-separated values. I tried the following code; links to the code file and the text file are attached below.
To run the code, execute:
gawk -f sark.awk biosample.txt
Then run:
sed 's/,,/\n/g' <biosample.txt > out.txt
The output should contain the value of each header picked from each record, with one record per line (records separated by newlines).
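For the basic reshaping described above, where each blank-line-separated block becomes one comma-separated line, awk's paragraph mode (setting RS to the empty string) is a common starting point. This is a minimal sketch, not the asker's sark.awk: the function name join_records and the sample lines are hypothetical, it assumes no value contains a comma, and it strips carriage returns first in case the file uses CRLF line endings, because a line holding only "\r" does not count as blank.

```shell
#!/bin/sh
# Join each blank-line-separated record into one comma-separated line.
# Assumption: no value contains a comma (no CSV quoting is done).
join_records() {
  # Drop CRs first: in paragraph mode only truly empty lines
  # separate records, and a lone "\r" line is not empty.
  tr -d '\r' |
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = "," }  # paragraph mode, one field per line
       { $1 = $1; print }                       # assigning $1 rebuilds $0 with OFS
  '
}

# Hypothetical sample with CRLF endings; prints:
# 1: rec one,Identifiers: id1
# 2: rec two,Identifiers: id2
printf '1: rec one\r\nIdentifiers: id1\r\n\r\n2: rec two\r\nIdentifiers: id2\r\n' | join_records
```

This keeps every "key: value" line verbatim; placing specific headers into fixed columns still needs per-key handling.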
Thanks!
Here is a simple implementation using awk:
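BEGIN { # header row (printed before ORS is cleared, so it ends with a newline)
        print "record name,Identifiers,Organism,strain,isolate,serovar,"\
              "isolation source,collected by,collection date,"\
              "geographic location,host,host disease,Accession,ID,"\
              "potential_contaminant,sample type,Description"
        RS="\r\n"   # input assumed to have CRLF line endings; a multi-character RS is not POSIX, gawk supports it
        ORS=""      # separators are printed explicitly below
      }
# record, Identifiers and Organism lines: strip the prefix, keep the rest
sub(/^[0-9]+: /,"")     { r[1] = $0; next }
sub(/^Identifiers: /,""){ r[2] = $0; next }
sub(/^Organism: /,"")   { r[3] = $0; next }
# indented attribute lines look like /key=value; a[2] holds the value
/^ /                    { split($0, a, "=") }
/^ *\/strain=/          { r[4] = a[2] }
/^ *\/isolate=/         { r[5] = a[2] }
/^ *\/serovar=/         { r[6] = a[2] }
/^ *\/isolation source=/{ r[7] = a[2] }
/^ *\/collected by=/    { r[8] = a[2] }
/^ *\/collection date=/ { r[9] = a[2] }
/^ *\/geographic locati/{ r[10] = a[2] }
/^ *\/host=/            { r[11] = a[2] }
/^ *\/host disease=/    { r[12] = a[2] }
/^Accession:/           { r[13] = $2; r[14] = $4 }   # Accession and ID columns
/^ *\/potential_contami/{ r[15] = a[2] }
/^ *\/sample type=/     { r[16] = a[2] }
/^Description:/         { getline; r[17] = $0 }      # the description is on the next line
# a blank line ends a record: print r[1]..r[16] with commas, then r[17] and a newline
/^$/                    { if (r[1]) { for (i = 1; i < 17; ++i) print r[i]","
                                      print r[i]"\n"
                                      delete r
                                    }
                        }
# print the last record if the file does not end with a blank line
END                     { if (r[1]) { for (i = 1; i < 17; ++i) print r[i]","
                                      print r[i]"\n"
                                    }
                        }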
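The same collect-then-print idea can be written with values keyed by attribute name instead of numbered slots, plus an END rule so the last record is printed even when the file lacks a trailing blank line. This is a hedged sketch on hypothetical input, not a drop-in replacement: the function name records_to_csv, the cols variable and the three-column subset are illustrative assumptions, and values are still not CSV-quoted.

```shell
#!/bin/sh
# Collect "/key=value" attribute lines per record into an array keyed by
# attribute name, then print one CSV row per record.
# Hypothetical: records_to_csv and the cols list are illustrative only;
# values are not CSV-quoted.
records_to_csv() {
  tr -d '\r' |
  awk -v cols='strain,isolate,host' '
    function emit_row(   i, line) {     # print the current record, if any
      if (name == "") return
      line = name
      for (i = 1; i <= n; i++) line = line "," v[col[i]]
      print line
      name = ""
      delete v
    }
    BEGIN { n = split(cols, col, ","); print "record name," cols }
    /^[0-9]+: / { sub(/^[0-9]+: /, ""); name = $0; next }
    /^ *\// {                           # attribute line such as    /host=...
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1)
      sub(/^ *\//, "", key)             # strip leading blanks and the slash
      v[key] = substr($0, eq + 1)       # value after the first "=" only
      next
    }
    /^$/ { emit_row() }                 # a blank line ends a record
    END  { emit_row() }                 # emit the last record too
  '
}

# Hypothetical sample: the second record has no trailing blank line.
printf '1: SAMN01\n    /strain="A1"\n    /host="Homo sapiens"\n\n2: SAMN02\n    /isolate="B2"\n' | records_to_csv
```

Splitting at the first "=" with index() and substr() keeps values that themselves contain "=", which split($0, a, "=") would cut off at a[2].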
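Could you post an excerpt of the input file and show what the output file should look like?
Please edit your question and show the code and a small input file directly in the question, formatted as code blocks. "Inconsistent/messy/confusing" is not an adequate problem description. Show the actual output and the expected output, and if necessary explain why the actual output is wrong.
@user3506020 - The indentation of sark.awk is absurd. And how can the gawk input and the sed input, which should rather be the gawk output, both be the same file, biosample.txt?
I have attached my files and code in the post; they can be downloaded.
Thanks a lot, this works perfectly.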
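On the point about the gawk and sed inputs: the second command should consume the first command's output, not reread biosample.txt. A minimal sketch of the piped form follows; first_stage is a hypothetical stand-in for gawk -f sark.awk biosample.txt that emits records separated by ",", and the "\n" in the sed replacement relies on GNU sed.

```shell
#!/bin/sh
# Hypothetical stand-in for "gawk -f sark.awk biosample.txt":
# it prints two records separated by ",," on one line.
first_stage() {
  printf 'rec1,a,b,,rec2,c,d\n'
}

# Turn every ",," into a newline. The "\n" in the replacement is a
# GNU sed feature; some other seds (e.g. BSD sed) do not support it.
split_records() {
  sed 's/,,/\n/g'
}

# Stage two reads stage one's output through the pipe instead of
# rereading the original input file; prints rec1,a,b and rec2,c,d
# on separate lines.
first_stage | split_records
```

With the question's files the same shape would be gawk -f sark.awk biosample.txt | sed 's/,,/\n/g' > out.txt, assuming sark.awk writes its records to standard output.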