Compare the current line and the next line in awk

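I want to find this pattern: column 2 of the current line is "C" and column 2 of the next line is "G", with column 4 being "CG" on both lines. I want to compare line 1 to 2, 3 to 4, 5 to 6, and so on, then print each matching pair (the current line and the next line). "C" can appear on both even and odd lines.

The input looks like this: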

chr1    C   10467   CHH CT  0.0 0   1
chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8
The expected output, tab-separated, is:

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8
My code is:

awk '{a=$2; c=$4; d=$0; e=NR; getline; f=$2; g=$4} {if (a == "C" && f == "G" && c == "CG" && g == "CG") {print d,e,"\n",$0,NR}}' input_file
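I use getline and check whether the next line has "G". The problem is that, done this way, awk goes straight on to the third line and misses some lines. For example, if column 2 of the input is: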

Line 1: G
Line 2: C
Line 3: G
Line 4: C
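The expected output is lines 2 and 3. However, awk jumps from the first line straight to the third line instead of going line by line, so nothing is printed.

Kind regards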
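
The skipping described in the question can be made visible with a small sketch. The four-line sample below is hypothetical (column 2 follows the G, C, G, C example, the other columns are made up); the script only prints which record numbers end up being paired when getline is called once per main-loop pass.

```shell
# Hypothetical sample: column 2 follows the G, C, G, C example above.
sample=$(mktemp)
printf 'chr1\tG\t1\tCG\nchr1\tC\t2\tCG\nchr1\tG\t3\tCG\nchr1\tC\t4\tCG\n' > "$sample"

# Each main-loop pass reads one record and getline reads one more,
# so the pairs examined are (1,2), (3,4), ... and never (2,3).
pairs=$(awk '{ first = NR; if ((getline) > 0) print first "-" NR }' "$sample")
echo "$pairs"
rm -f "$sample"
```

With this input the script reports the pairs 1-2 and 3-4, which is why a C on line 2 followed by a G on line 3 is never compared.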

EDIT (use this to compare every line with the line that follows it): adding this solution now, tested with OP's new sample.

awk '
FNR>1{
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
    print prevLine ORS $0
  }
}
{
  secCol=$2
  fourthCol=$4
  prevLine=$0
}
'  Input_file
Explanation: adding a detailed explanation of the above.

awk '
##Starting awk program from here.
FNR>1{
##Checking condition if current line number is more than 1 then do following.
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
##Checking condition if secCol is C AND 2nd column is G AND fourthCol is CG and 4th column is CG then do following. 
    print prevLine ORS $0
##Printing prevLine ORS and current line.
  }
}
{
  secCol=$2
##Creating secCol with 2nd column of current line.
  fourthCol=$4
##Creating fourthCol with 4th column of current line.
  prevLine=$0
##Setting prevLine to current line value.
}
'  Input_file ##Mentioning Input_file name here. 
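
As a quick check, the line-by-line comparison can be run against the question's five-line sample. This is only a sketch: the input is recreated with printf into a temporary file, the tab separator is an assumption based on the question, and prevNR is a hypothetical extra variable added here to report line numbers the way the question's own attempt did.

```shell
input=$(mktemp)
{
  printf 'chr1\tC\t10467\tCHH\tCT\t0.0\t0\t1\n'
  printf 'chr1\tC\t10469\tCG\tCG\t0.0\t0\t1\n'
  printf 'chr1\tG\t10470\tCG\tCG\t0.0\t0\t8\n'
  printf 'chr1\tC\t10471\tCG\tCG\t0.0\t0\t1\n'
  printf 'chr1\tG\t10472\tCG\tCG\t1.0\t8\t8\n'
} > "$input"

# prevNR (hypothetical) remembers the line number of the previous record.
matches=$(awk -F'\t' -v OFS='\t' '
FNR>1 && secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG" {
  print prevLine, prevNR   # previous line plus its line number
  print $0, FNR            # current line plus its line number
}
{ secCol=$2; fourthCol=$4; prevLine=$0; prevNR=FNR }
' "$input")
echo "$matches"
rm -f "$input"
```

The condition is written as a pattern rather than a nested if, which behaves the same way; the pairs found are lines 2 and 3, then lines 4 and 5.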


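Initial solution (this compares each odd line with the even line that follows it): OP's sample became clearer after the edit, but this solution is kept here for future readers. Could you please try the following, written only against the shown samples. It also checks that the 4th column of the previous line (fourthCol) is CG; if that is not needed, remove && fourthCol=="CG" from the condition below.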

awk '
FNR%2==0{
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
    print prevLine ORS $0
  }
  prevLine=secCol=fourthCol=""
  next
}
{
  secCol=$2
  fourthCol=$4
  prevLine=$0
}
'  Input_file
The output will be as follows:

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8
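
The fixed odd/even pairing only sees pairs that start on an odd line. The following sketch, using a hypothetical four-line input whose C/G pair sits on lines 2 and 3, contrasts it with a line-by-line comparison.

```shell
# Hypothetical input: the C/G pair is on lines 2 and 3.
shifted=$(mktemp)
printf 'chr1\tG\t10468\tCG\nchr1\tC\t10469\tCG\nchr1\tG\t10470\tCG\nchr1\tC\t10471\tCG\n' > "$shifted"

# Odd/even pairing: only compares lines (1,2) and (3,4).
paired=$(awk '
FNR%2==0 {
  if (secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG") print prevLine ORS $0
  prevLine=secCol=fourthCol=""
  next
}
{ secCol=$2; fourthCol=$4; prevLine=$0 }
' "$shifted")

# Line-by-line comparison: compares (1,2), (2,3), (3,4).
sliding=$(awk '
FNR>1 && secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG" { print prevLine ORS $0 }
{ secCol=$2; fourthCol=$4; prevLine=$0 }
' "$shifted")

echo "paired:  [$paired]"
echo "sliding: [$sliding]"
rm -f "$shifted"
```

Here the paired variant prints nothing, while the line-by-line variant prints lines 2 and 3, so the choice between the two depends on which comparison is actually wanted.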
Explanation: adding a detailed explanation of the above.

awk '                          ##Starting awk program from here.
FNR%2==0{                      ##Checking whether the line number is divisible by 2 (an even line).
  if(secCol=="C" && $2=="G" && fourthCol=="CG" && $4=="CG"){
##Checking condition if secCol is C AND 2nd column is G AND fourthCol is CG and 4th column is CG then do following.
    print prevLine ORS $0      ##Printing prevLine ORS and current line.
  }
  prevLine=secCol=fourthCol="" ##Nullifying prevLine, secCol, fourthCol here.
  next                         ##next will skip all further statements from here.
}
{
  secCol=$2                    ##Creating secCol with 2nd column of current line.
  fourthCol=$4                 ##Creating fourthCol with 4th column of current line.
  prevLine=$0                  ##Setting prevLine to current line value.
}
'  Input_file                  ##Mentioning Input_file name here. 

Man, I got it completely wrong at first. I hope I got it right this time.

$ awk '
$2=="G" && $4=="CG" && p2=="C" && p4=="CG" {
    print p ORS $0
}
{
    p=$0
    p2=$2
    p4=$4
}' file
Output:

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8 
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8
Explanation:

awk '
$2=="G" &&            # column 2 of the current line is G
$4=="CG" &&           # and column 4 of the current line is CG
p2=="C" &&            # column 2 of the previous line was C
p4=="CG" {            # and column 4 of the previous line was CG
    print p ORS $0    # then print the previous line followed by the current line
}
{
    p=$0              # current record is previous on next round
    p2=$2             # same goes for column 2
    p4=$4             # and column 4
}' file

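Is column 4 "CG" for both records, or only for the latter one?

Hi James, column 4 is "CG" for both records.

Change how you think about it: compare the current line with the previous line.

@RavinderSingh13 Thank you. In my case this code misses some output, just like my own code did. FNR%2==0 matches the even lines, so this code works when "C" appears on odd lines. When "C" appears on an even line, this code misses some output. You can try this input:
chr1    C   10467   CHH CT  0.0 0   1
chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

@bobia9193, could you please update your question with these details so it is easier to understand. Do you want to compare lines 1 to 2, 3 to 4, 5 to 6? Or lines 1 to 2, 2 to 3, 3 to 4, and so on? Please confirm.

Sorry, I have updated it. Thanks for your help! As in my comment on James's solution: I am new to this field, so I would like to ask some questions to understand it in depth. What is the structure of this awk program? Are there conditions in there, BEGIN, END? Could you explain the code in terms of awk's structure? Thanks again.

@bobia9193, I have now added a detailed explanation to my EDIT solution; it should help you. To learn more about awk, please also check the link. Cheers.

Sir, I am new to this field, so I would like to ask some questions. What is the structure of this awk program? Is there a condition in there? Could you explain the code in terms of awk's structure?

Quoting The AWK Programming Language: every awk program is a sequence of one or more pattern-action statements:
pattern { action }
The basic operation of awk is to scan a sequence of input lines one after another, searching for lines that are matched by any of the patterns [the "if"] in the program. For each pattern that matches, the corresponding action [the "then"] is performed. So the beginning of this solution could also be written as:
{ if ($2=="G" && $4=="CG" && p2=="C" && p4=="CG") print p ORS $0 }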
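
Since the comments ask about conditions, BEGIN and END, here is a minimal sketch of the pattern-action structure on a hypothetical three-line input (the file contents and variable names are made up for illustration). It shows that a pattern and an explicit if select the same lines, and that BEGIN and END are special patterns that run before the first line and after the last one.

```shell
# Hypothetical input: a letter and a number per line.
data=$(mktemp)
printf 'a 1\nb 2\nc 3\n' > "$data"

# A pattern in front of the action...
with_pattern=$(awk '$2 > 1 { print $1 }' "$data")
# ...selects the same lines as an if inside an unconditional action.
with_if=$(awk '{ if ($2 > 1) print $1 }' "$data")

# BEGIN runs before any input is read, END after all input is read.
counted=$(awk 'BEGIN { n = 0 } $2 > 1 { n++ } END { print n }' "$data")

echo "$with_pattern"
echo "$with_if"
echo "$counted"
rm -f "$data"
```

Both selections print b and c, and the counter reports 2 matching lines.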