Compare two files and, on a match, shift the last fields down (awk)


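I have two tab-delimited input files. I need to find matches between them on $1 and $2, and wherever a line matches, only the 3rd and 4th fields should be shifted down.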

Input: File 1:

File 2:

p1  323 lololo  aaaa    
p1  555 papapp  kkka    
p1  556 hooho   sssa    
p1  557 jjjlo   kkka    
p3  424 zzzzz   llla    
p3  558 jjjjj   ssss
Output:

p1 323  lololo aaaa
p1 555
p1 556  papapp kkka
p1 557   
p3 424  hooho   sssa
p3 558      
        jjjlo   kkka  
and so on.


Thank you.

Something along these lines should work:

awk 'NR == FNR { to_shift[$1,$2] = 1; next } { queue[++w] = $3 OFS $4 } to_shift[$1, $2] { print $1, $2; next } { print $1, $2, queue[++r] } END { while(r != w) { print OFS OFS queue[++r] } }' file1 file2
That is:

NR == FNR {                      # while processing the first file (file1)
  to_shift[$1,$2] = 1            # remember which lines to shift
  next                           # and do nothing else
}
{                                # afterwards (processing file2):
  queue[++w] = $3 OFS $4         # queue the next payload fields
}
to_shift[$1, $2] {               # If this is a shift line
  print $1, $2                   # print only the first two fields
  next                           # and do nothing else
}
{                                # otherwise, print the first two fields and
  print $1, $2, queue[++r]       # the next queued payload
}
END {                            # In the end:
  while(r != w) {                # print out what remains in the queue, i.e.
    print OFS OFS queue[++r]     # all that was shifted out at the bottom
  }
}
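
To watch the queue at work, here is a minimal, self-contained sketch. The question's listing shows only one table, so the sketch assumes that table is file2 and that file1 (hypothetical contents) holds the keys p1 555, p1 557 and p3 558. The temporary directory and the variable names are illustrative too.

```shell
#!/bin/sh
# Assumed sample data: file1 is not shown in the question,
# so the three keys below are hypothetical.
tmpdir=$(mktemp -d)
printf 'p1\t555\np1\t557\np3\t558\n' > "$tmpdir/file1"

# printf reuses the format for every group of four arguments (one row each).
printf '%s\t%s\t%s\t%s\n' \
  p1 323 lololo aaaa \
  p1 555 papapp kkka \
  p1 556 hooho  sssa \
  p1 557 jjjlo  kkka \
  p3 424 zzzzz  llla \
  p3 558 jjjjj  ssss > "$tmpdir/file2"

# Same program as above, using the default output separator (one space).
result=$(awk '
  NR == FNR { to_shift[$1,$2] = 1; next }          # file1: remember the keys to shift
  { queue[++w] = $3 OFS $4 }                       # file2: queue every payload
  to_shift[$1, $2] { print $1, $2; next }          # shift line: print the keys only
  { print $1, $2, queue[++r] }                     # other lines: keys plus oldest payload
  END { while (r != w) print OFS OFS queue[++r] }  # flush what was shifted out
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this data, the three assumed shift lines print only their keys. Every payload moves down one slot per shift line above it, and the three payloads that no longer fit are printed at the bottom behind two separators.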
I imagine that for formatting you will want \t as the output field separator, in which case you only need to pass -v OFS='\t' to awk:

awk -v OFS='\t' 'NR == FNR { to_shift[$1,$2] = 1; next } { queue[++w] = $3 OFS $4 } to_shift[$1, $2] { print $1, $2; next } { print $1, $2, queue[++r] } END { while(r != w) { print OFS OFS queue[++r] } }' file1 file2

If the input is tab-delimited and fields can contain spaces, you can also pass -F'\t' so that the input field separator is a tab as well.
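
As a rough illustration of why both options matter, the sketch below runs the same program on hypothetical data whose third field contains a space. Without -F'\t', awk's default splitting would break "second item" into two fields. The file contents and variable names are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical tab-delimited data; the third field contains a space.
tmpdir=$(mktemp -d)
printf 'p1\t555\n' > "$tmpdir/file1"
printf 'p1\t323\tfirst item\taaaa\n'   > "$tmpdir/file2"
printf 'p1\t555\tsecond item\tkkka\n' >> "$tmpdir/file2"
printf 'p1\t556\tthird item\tsssa\n'  >> "$tmpdir/file2"

# -F sets the input separator, -v OFS sets the output separator; both are tabs.
result=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { to_shift[$1,$2] = 1; next }          # file1: keys to shift
  { queue[++w] = $3 OFS $4 }                       # file2: queue every payload
  to_shift[$1, $2] { print $1, $2; next }          # shift line: keys only
  { print $1, $2, queue[++r] }                     # other lines: keys plus oldest payload
  END { while (r != w) print OFS OFS queue[++r] }  # flush the leftover payloads
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Here "second item" stays one field and moves down to the p1 556 line, while "third item" and its fourth field end up on a final line that starts with two tabs.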


Wintermute, thank you for the great script and the comments, I appreciate it.