Compare two files and, on a match, shift the last fields down (awk)
I have two tab-delimited input files. I need to find the lines that match between them on $1 and $2; wherever a line matches, only its 3rd and 4th fields should be shifted down.
Input:
File 1:
File 2:
p1 323 lololo aaaa
p1 555 papapp kkka
p1 556 hooho sssa
p1 557 jjjlo kkka
p3 424 zzzzz llla
p3 558 jjjjj ssss
Output:
p1 323 lololo aaaa
p1 555
p1 556 papapp kkka
p1 557
p3 424 hooho sssa
p3 558
       jjjlo kkka
and so on.
Thank you.
Something along these lines should work:
awk 'NR == FNR { to_shift[$1,$2] = 1; next } { queue[++w] = $3 OFS $4 } to_shift[$1, $2] { print $1, $2; next } { print $1, $2, queue[++r] } END { while(r != w) { print OFS OFS queue[++r] } }' file1 file2
That is:
NR == FNR {                    # while processing the first file (file1)
  to_shift[$1,$2] = 1          # remember which lines to shift
  next                         # and do nothing else
}
{                              # afterwards (processing file2):
  queue[++w] = $3 OFS $4       # queue the next payload fields
}
to_shift[$1, $2] {             # if this is a shift line
  print $1, $2                 # print only the first two fields
  next                         # and do nothing else
}
{                              # otherwise, print the first two fields and
  print $1, $2, queue[++r]     # the next queued payload
}
END {                          # in the end:
  while(r != w) {              # print out what remains in the queue, i.e.
    print OFS OFS queue[++r]   # all that was shifted out at the bottom
  }
}
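To see the queue logic in action, here is a small sketch that runs the script against the sample data. The contents of file1 are an assumption: the question does not show them, so the three key pairs below are inferred from the expected output. The temporary directory and the `result` variable are likewise only for illustration.

```shell
#!/bin/sh
# Assumption: file1 holds just the key pairs ($1, $2) whose payload is shifted.
tmp=$(mktemp -d)
printf '%s\n' 'p1 555' 'p1 557' 'p3 558' > "$tmp/file1"

# file2 as given in the question (spaces used here; default FS also accepts tabs).
printf '%s\n' \
  'p1 323 lololo aaaa' \
  'p1 555 papapp kkka' \
  'p1 556 hooho sssa' \
  'p1 557 jjjlo kkka' \
  'p3 424 zzzzz llla' \
  'p3 558 jjjjj ssss' > "$tmp/file2"

# Same script as above: every payload is queued; shift lines print no payload,
# other lines print the oldest queued payload, and END flushes the leftovers.
result=$(awk '
  NR == FNR { to_shift[$1,$2] = 1; next }
  { queue[++w] = $3 OFS $4 }
  to_shift[$1, $2] { print $1, $2; next }
  { print $1, $2, queue[++r] }
  END { while (r != w) { print OFS OFS queue[++r] } }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the default OFS (a single space), the three payloads left in the queue are printed at the bottom, each preceded by two separators in place of the missing key fields.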
I'm guessing that for formatting you will want to use \t as the output field separator, in which case you only need to pass -v OFS='\t' to awk:
awk -v OFS='\t' 'NR == FNR { to_shift[$1,$2] = 1; next } { queue[++w] = $3 OFS $4 } to_shift[$1, $2] { print $1, $2; next } { print $1, $2, queue[++r] } END { while(r != w) { print OFS OFS queue[++r] } }' file1 file2
If the input is tab-delimited and fields can contain spaces, you can also pass -F'\t' to make the input field separator a tab as well.
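As a rough illustration of combining both options, the sketch below feeds the script tab-delimited data in which a payload field contains a space. The sample rows, the temporary files, and the `result` variable are made up for this example; note that file1 must then be tab-delimited too, since -F'\t' applies to both files.

```shell
#!/bin/sh
# Hypothetical data: one key to shift, and payload fields that contain spaces.
tmp=$(mktemp -d)
printf 'p1\t555\n' > "$tmp/file1"
printf 'p1\t323\tlo lo\taaaa\np1\t555\tpa pa\tkkka\np1\t556\thooho\tsssa\n' > "$tmp/file2"

# -F'\t' splits input on tabs only, so "lo lo" stays a single field;
# -v OFS='\t' makes print and the queued payload use tabs on output.
result=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { to_shift[$1,$2] = 1; next }
  { queue[++w] = $3 OFS $4 }
  to_shift[$1, $2] { print $1, $2; next }
  { print $1, $2, queue[++r] }
  END { while (r != w) { print OFS OFS queue[++r] } }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Without -F'\t', the default field splitting would break "lo lo" into two fields and the queued payload would be wrong.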
Wintermute, thank you for the excellent script and comments; I appreciate it.