Bash: fill blank fields in certain columns with the word found above them (UNIX)
Tags: bash, unix, awk
Here is a short excerpt of input.tsv:
rs928302 YES TMPRSS3 rf G V 53 NM_001256317.1 NP_001243246.1
rf G V 53 NM_024022.2 NP_076927.1
rf G V 53 NM_032405.1 NP_115781.1
rs1046210 YES BACE2 rf C D 364 NM_012105.4 NP_036237.2
rf C D 364 NM_138992.2 NP_620477.1
rf C D 269 XM_017028314.1 XP_016883803.1
rs1064579 YES IFNGR2 rf T V 272 NM_001329128.1 NP_001316057.1
rf T V 253 NM_005534.3 NP_005525.2
rf T V 272 XM_005260969.2 XP_005261026.1
rf T V 278 XM_011529553.1 XP_011527855.1
rf T V 255 XM_011529554.2 XP_011527856.1
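I want to fill each blank field with the word that appears above it, for the first, second and third columns, all the way to the end of the file. When a different word appears, the blank fields below it should be filled with that new word, and so on. So the output should be: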
rs928302 YES TMPRSS3 rf G V 53 NM_001256317.1 NP_001243246.1
rs928302 YES TMPRSS3 rf G V 53 NM_024022.2 NP_076927.1
rs928302 YES TMPRSS3 rf G V 53 NM_032405.1 NP_115781.1
rs1046210 YES BACE2 rf C D 364 NM_012105.4 NP_036237.2
rs1046210 YES BACE2 rf C D 364 NM_138992.2 NP_620477.1
rs1046210 YES BACE2 rf C D 269 XM_017028314.1 XP_016883803.1
rs1064579 YES IFNGR2 rf T V 272 NM_001329128.1 NP_001316057.1
rs1064579 YES IFNGR2 rf T V 253 NM_005534.3 NP_005525.2
rs1064579 YES IFNGR2 rf T V 272 XM_005260969.2 XP_005261026.1
rs1064579 YES IFNGR2 rf T V 278 XM_011529553.1 XP_011527855.1
rs1064579 YES IFNGR2 rf T V 255 XM_011529554.2 XP_011527856.1
How can I do this in a Unix environment? Thanks in advance.
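awk '
BEGIN { FS=OFS="\t" }            # tab-separated input and output
{
    for (i=1; i<=3; i++) {
        if ($i == "") {
            $i = p[i]            # blank field: reuse the last value seen in this column
        }
        else {
            p[i] = $i            # non-blank field: remember it for the lines below
        }
    }
    print
}
' file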
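To try the command without the real file, the following sketch wraps the same loop in a shell function and feeds it three made-up tab-separated rows. The function name `fill_down` and the sample rows are hypothetical, chosen only for illustration. It assumes the blank fields are truly empty (consecutive tabs), not padded with spaces.

```shell
# fill_down: hypothetical helper name; reads TSV on stdin, writes TSV on stdout.
fill_down() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        for (i = 1; i <= 3; i++) {
            if ($i == "") {
                $i = p[i]      # empty field: reuse the last value seen
            } else {
                p[i] = $i      # non-empty field: remember it
            }
        }
        print
    }'
}

# Made-up sample: the second row starts with three empty fields.
printf 'rs1\tYES\tGENE1\trf\tG\n\t\t\trf\tC\nrs2\tNO\tGENE2\trf\tT\n' | fill_down
```

With this sample, the second row should come out starting with rs1, YES and GENE1, while the third row keeps its own values and resets what is remembered for any rows after it.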
awk solution:
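awk '
NF==9 { f1=$1; f2=$2; f3=$3 }           # full row: remember the first three fields
NF==6 { sub(/^[[:space:]]+/, "", $0)    # short row: strip the leading blanks
        $0 = f1 OFS f2 OFS f3 OFS $0 }  # and prepend the remembered fields
1                                       # print every line
' OFS='\t' file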
Output:
rs928302 YES TMPRSS3 rf G V 53 NM_001256317.1 NP_001243246.1
rs928302 YES TMPRSS3 rf G V 53 NM_024022.2 NP_076927.1
rs928302 YES TMPRSS3 rf G V 53 NM_032405.1 NP_115781.1
rs1046210 YES BACE2 rf C D 364 NM_012105.4 NP_036237.2
rs1046210 YES BACE2 rf C D 364 NM_138992.2 NP_620477.1
rs1046210 YES BACE2 rf C D 269 XM_017028314.1 XP_016883803.1
rs1064579 YES IFNGR2 rf T V 272 NM_001329128.1 NP_001316057.1
rs1064579 YES IFNGR2 rf T V 253 NM_005534.3 NP_005525.2
rs1064579 YES IFNGR2 rf T V 272 XM_005260969.2 XP_005261026.1
rs1064579 YES IFNGR2 rf T V 278 XM_011529553.1 XP_011527855.1
rs1064579 YES IFNGR2 rf T V 255 XM_011529554.2 XP_011527856.1
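The two answers make different assumptions about the input. The tab-based loop expects the blank columns to be empty tab-separated fields, while the NF-based command splits on whitespace and expects lines with either 9 or 6 fields. One way to check which applies is to tally how many tab-separated fields each line has. This is only a sketch; the function name `field_counts` and the sample rows are hypothetical.

```shell
# field_counts: hypothetical helper; prints "<fields-per-line> <number-of-lines>"
# for tab-separated input on stdin, sorted by field count.
field_counts() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' | sort -n
}

# Made-up sample: two lines with 4 tab-separated fields, one line with 2.
printf 'a\tb\tc\td\n\t\t\te\nx\ty\n' | field_counts
```

If every line of the real file reports the same count, the blanks are most likely empty tab fields and the tab-based loop should apply. A mix of counts suggests the blank columns are not stored as empty tab fields, in which case the NF-based command may be the better fit.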