Compare the current line and the next line in awk


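I want to find the following pattern: column 2 of the current line is "C" and column 2 of the next line is "G". Column 4 of the file is "CG". I want to compare 1 to 2, 3 to 4, 5 to 6, and so on, then print the pair of current line and next line. "C" can appear on both even and odd lines.

The input looks like this: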

chr1    C   10467   CHH CT  0.0 0   1
chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

The expected output, tab-separated, is:

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

My code is:

awk '{a=$2; c=$4; d=$0; e=NR; getline; f=$2; g=$4} {if (a == "C" && f == "G" && c == "CG" && g == "CG") {print d,e,"\n",$0,NR}}' input_file

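I use getline to check whether the next line has "G". The problem is that awk then jumps straight to the third line and misses some lines. For example, suppose column 2 of the input is: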

Line 1: G
Line 2: C
Line 3: G
Line 4: C

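The expected output is lines 2 and 3. However, awk goes from the first line straight to the third instead of advancing line by line, so nothing is printed.

Kind regards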

EDIT (use this to compare every line with the line that follows it): Adding this solution now, using the OP's new sample.

awk '
FNR>1{
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
    print prevLine ORS $0
  }
}
{
  secCol=$2
  fourthCol=$4
  prevLine=$0
}
'  Input_file

Explanation: Adding a detailed explanation of the above.

awk '
##Starting awk program from here.
FNR>1{
##Checking condition if current line number is more than 1 then do following.
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
##Checking condition if secCol is C AND 2nd column is G AND fourthCol is CG and 4th column is CG then do following. 
    print prevLine ORS $0
##Printing prevLine ORS and current line.
  }
}
{
  secCol=$2
##Creating secCol with 2nd column of current line.
  fourthCol=$4
##Creating fourthCol with 4th column of current line.
  prevLine=$0
##Setting prevLine to current line value.
}
'  Input_file ##Mentioning Input_file name here.
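As a quick check against the case raised in the question (column 2 running G, C, G, C), here is a minimal sketch. Only the column 2 sequence comes from the question; the coordinates 100 to 103 are made up for illustration, and column 4 is assumed to be CG on every line.

```shell
# Hypothetical 4-line, tab-separated input; column 2 runs G, C, G, C.
# Same logic as the EDIT solution: remember the previous line and test it
# against the current one, so a pair starting on an even line is not skipped.
out=$(printf '%s\t%s\t%s\t%s\n' \
  chr1 G 100 CG \
  chr1 C 101 CG \
  chr1 G 102 CG \
  chr1 C 103 CG |
awk '
FNR>1 && secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG" {
  print prevLine ORS $0
}
{ secCol=$2; fourthCol=$4; prevLine=$0 }
')

printf '%s\n' "$out"
```

This should print the records at positions 2 and 3, which is the pair a getline-based loop steps over.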

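Initial solution (this compares each odd line with the even line that follows it): (The OP's sample became clearer after the edit, but I am keeping this solution here as well for future readers.) Could you please try the following, written based only on the samples shown. It also checks whether the 4th column of the previous line (fourthCol) is CG; in case you don't need that, remove && fourthCol=="CG" from the condition below.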

awk '
FNR%2==0{
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
    print prevLine ORS $0
  }
  prevLine=secCol=fourthCol=""
  next
}
{
  secCol=$2
  fourthCol=$4
  prevLine=$0
}
'  Input_file

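For an input in which the pairs sit on lines 1-2, 3-4 and so on (for example, the question's sample without its first line), the output is as follows. On the full five-line sample, where the pairs start on line 2, this version prints nothing.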

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

Explanation: Adding a detailed explanation of the above.

awk '                          ##Starting awk program from here.
FNR%2==0{                      ##Checking condition if line number is divided by 2 or not.
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
##Checking condition if secCol is C AND 2nd column is G AND fourthCol is CG and 4th column is CG then do following.
    print prevLine ORS $0      ##Printing prevLine ORS and current line.
  }
  prevLine=secCol=fourthCol="" ##Nullifying prevLine, secCol, fourthCol here.
  next                         ##next will skip all further statements from here.
}
{
  secCol=$2                    ##Creating secCol with 2nd column of current line.
  fourthCol=$4                 ##Creating fourthCol with 4th column of current line.
  prevLine=$0                  ##Setting prevLine to current line value.
}
'  Input_file                  ##Mentioning Input_file name here.

Man, I got it completely wrong at first. I hope I got it right this time:

$ awk '
$2=="G" && $4=="CG" && p2=="C" && p4=="CG" {
    print p ORS $0
}
{
    p=$0
    p2=$2
    p4=$4
}' file

Output:

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8 
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

Explanation:

awk '
$2=="G" &&            # column 2 of the current line is G
$4=="CG" &&           # and column 4 of the current line is CG
p2=="C" &&            # column 2 of the previous line is C
p4=="CG" {            # and column 4 of the previous line is CG
    print p ORS $0    # then print the pair: previous line, then current line
}
{
    p=$0              # the current record becomes the previous one on the next round
    p2=$2             # same goes for column 2
    p4=$4             # and column 4
}' file
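The question's own attempt also printed NR next to each line. If those line numbers are still wanted, one possible variant keeps the previous record number in an extra variable. This is only a sketch and not part of the answer above: pn is a name introduced here, and the input is the question's sample rebuilt with printf.

```shell
# Rebuild the sample from the question as tab-separated records and pipe it to awk.
out=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 C 10467 CHH CT 0.0 0 1 \
  chr1 C 10469 CG  CG 0.0 0 1 \
  chr1 G 10470 CG  CG 0.0 0 8 \
  chr1 C 10471 CG  CG 0.0 0 1 \
  chr1 G 10472 CG  CG 1.0 8 8 |
awk 'BEGIN { OFS="\t" }
$2=="G" && $4=="CG" && p2=="C" && p4=="CG" {
    print p, pn          # previous line followed by its line number
    print $0, NR         # current line followed by its line number
}
{ p=$0; p2=$2; p4=$4; pn=NR }   # pn (hypothetical name) remembers the previous NR
')

printf '%s\n' "$out"
```

Each matching line gets its line number appended as a ninth tab-separated field, so on this sample the two pairs are reported as lines 2 and 3, then 4 and 5.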

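Is column 4 "CG" for both records, or only for the latter?

Hi James, column 4 is "CG" for both records.

Change the way you think about it: compare the current line with the previous one.

@RavinderSingh13 Thank you. In my case this code misses some output, just like my own code does. FNR%2==0 selects the even lines, so this code works when "C" appears on odd lines. When "C" appears on even lines, it misses some output. You can try this input: chr1 C 10467 CHH CT 0.0 0 1 chr1 C 10469 CG 0.0 0 1 chr1 G 10470 CG 0.0 0 0 8 chr1 C 10471 CG 0.0 0 0 1 chr1 G 10472 CG 1.0 88

@bobia9193, could you please update your question with these details so it is easier to understand? Do you want to compare lines like 1 to 2, 3 to 4, 5 to 6? Or like 1 to 2, 2 to 3, 3 to 4, and so on? Please confirm.

Sorry, I have updated it. Thank you for your help! As in my comment on James's solution: I am new to this field, so I would like to ask you some questions to understand this in depth. What is the structure of this awk program? Is there a condition in there, a BEGIN, an END? Could you explain the code in terms of awk's structure? Thanks again.

@bobia9193, I have now added a detailed explanation to my EDIT solution; it should help you. To learn more about awk, please also check the link, cheers.

Sir, I am new to this field, so I would like to ask some questions. What is the structure of this awk program? Is there a condition in there? Could you explain the code in terms of awk's structure?

Quoting The AWK Programming Language: each awk program is a sequence of one or more pattern-action statements: pattern { action }. The basic operation of awk is to scan a sequence of input lines one after another, searching for lines that are matched by any of the patterns [the if] in the program. For each pattern that matches, the corresponding action [the then] is performed. So the beginning of this solution could also be written as: { if ($2=="G" && $4=="CG" && p2=="C" && p4=="CG") print p ORS $0 }.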
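To make the pattern { action } point concrete, the sketch below runs both spellings on the question's sample and compares the results. The shell variable names sample, with_pattern and with_if are introduced here purely for illustration.

```shell
# The sample from the question, rebuilt as tab-separated records.
sample=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 C 10467 CHH CT 0.0 0 1 \
  chr1 C 10469 CG  CG 0.0 0 1 \
  chr1 G 10470 CG  CG 0.0 0 8 \
  chr1 C 10471 CG  CG 0.0 0 1 \
  chr1 G 10472 CG  CG 1.0 8 8)

# Form 1: the test is the pattern, the print is the action.
with_pattern=$(printf '%s\n' "$sample" | awk '
$2=="G" && $4=="CG" && p2=="C" && p4=="CG" { print p ORS $0 }
{ p=$0; p2=$2; p4=$4 }')

# Form 2: no pattern, so the action runs for every line and the test moves into an if.
with_if=$(printf '%s\n' "$sample" | awk '
{ if ($2=="G" && $4=="CG" && p2=="C" && p4=="CG") print p ORS $0 }
{ p=$0; p2=$2; p4=$4 }')

[ "$with_pattern" = "$with_if" ] && echo "same output"
```

In both forms the second block has no pattern, so it runs for every record and is what carries the previous line forward.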