Is it possible to use bash to find data that has duplicate values in one column but not in other columns?

I have a file with multiple columns and rows. I want to take the data, find the rows whose value in column 4 is duplicated, and print those rows to a new file.

My data file looks like this:

 RR2.out    -1752.142111    -1099486.696073  0.000000
 SS2.out    -1752.142111    -1099486.696073  0.000000
 RR1.out    -1752.141887    -1099486.555511  0.140562
 SS1.out    -1752.141887    -1099486.555511  0.140562
 RR4.out    -1752.140564    -1099485.725315  0.970758
 SS4.out    -1752.140564    -1099485.725315  0.970758
 RR3.out    -1752.140319    -1099485.571575  1.124498
 SS3.out    -1752.140319    -1099485.571575  1.124498
 SS5.out    -1752.138532    -1099484.450215  2.245858
 RR6.out    -1752.138493    -1099484.425742  2.270331
 SS6.out    -1752.138493    -1099484.425742  2.270331
 file Gibbs kcal rel
 file Gibbs kcal rel
If I just use uniq -d, all I get is

file Gibbs kcal rel
file Gibbs kcal rel
because those are the only two lines that match in their entirety. What I would like to know is whether there is a way to find all the rows whose value in column 4 is duplicated, rather than only complete line matches.

Afterwards I use awk and read to pull the file names out of column 1, so ideally I would not have to dump the data into another file and read it back in, since I have found that doing so can cause errors related to reading the file names.
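For illustration only (this pipeline is not part of the original question), the kind of follow-up step described above might look like the sketch below, where filter_duplicates is a placeholder for whatever command selects the rows with duplicated column-4 values:

while read -r name gibbs kcal rel; do      # read the four columns of each selected row
    [ "$name" = "file" ] && continue       # skip the header lines
    echo "processing $name"                # illustrative action on the file name from column 1
done < <(filter_duplicates inputfile)      # process substitution avoids a temporary file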

In this example, I should get the following file as output:

 RR2.out    -1752.142111    -1099486.696073  0.000000
 SS2.out    -1752.142111    -1099486.696073  0.000000
 RR1.out    -1752.141887    -1099486.555511  0.140562
 SS1.out    -1752.141887    -1099486.555511  0.140562
 RR4.out    -1752.140564    -1099485.725315  0.970758
 SS4.out    -1752.140564    -1099485.725315  0.970758
 RR3.out    -1752.140319    -1099485.571575  1.124498
 SS3.out    -1752.140319    -1099485.571575  1.124498
 RR6.out    -1752.138493    -1099484.425742  2.270331
 SS6.out    -1752.138493    -1099484.425742  2.270331
 file Gibbs kcal rel
 file Gibbs kcal rel

Here is some code that does what you want:

awk ' BEGIN { OLD4 = "No match" }     # sentinel value so the first data line never matches
$4 == OLD4 { print LAST ; print  }    # column 4 repeats the previous line's value: print both lines
{ OLD4 = $4 ; LAST = $0  }  '         # remember this line's column 4 and its full text
Here is how you can run it:

awk ' BEGIN { OLD4 = "No match" }
$4 == OLD4 { print LAST ; print  } 
{ OLD4 = $4 ; LAST = $0  }  '   inputfile
where inputfile looks like this:

 RR2.out    -1752.142111    -1099486.696073  0.000000
 SS2.out    -1752.142111    -1099486.696073  0.000000
 RR1.out    -1752.141887    -1099486.555511  0.140562
 SS1.out    -1752.141887    -1099486.555511  0.140562
 RR4.out    -1752.140564    -1099485.725315  0.970758
 SS4.out    -1752.140564    -1099485.725315  0.970758
 RR3.out    -1752.140319    -1099485.571575  1.124498
 SS3.out    -1752.140319    -1099485.571575  1.124498
 SS5.out    -1752.138532    -1099484.450215  2.245858
 RR6.out    -1752.138493    -1099484.425742  2.270331
 SS6.out    -1752.138493    -1099484.425742  2.270331
 file Gibbs kcal rel
 file Gibbs kcal rel
One problem with this program is that it assumes the data is sorted on column 4. If that really is the case, you can use the code unmodified. Otherwise, it is probably worth sorting the input on column 4 before passing it to awk.

To correct the sorting problem, you may want to sort the file as it is fed into awk. This will change the order of the output, so it may require some more coding.

Here is the awk script with sorted input:

awk ' BEGIN { OLD4 = "No match" }
$4 == OLD4 { print LAST ; print  } 
{ OLD4 = $4 ; LAST = $0  }  '   <( sort -k4,4 inputfile )
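If the original line order should be preserved and the file cannot be assumed to be sorted, an alternative (a sketch that is not part of the original answer) is a two-pass awk: the first pass counts each column-4 value, the second pass prints every line whose value occurs more than once. The same file is simply named twice on the command line:

awk ' NR == FNR { count[$4]++ ; next }   # first pass: count how often each column-4 value occurs
      count[$4] > 1                      # second pass: print lines whose value appears more than once
    ' inputfile inputfile > outputfile   # outputfile is just an example name for the new file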

uniq has the -f / --skip-fields option, which skips the first N fields of each line when comparing:

uniq -D -f3
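For example, a sketch based on the sample data above (-D is the GNU coreutils option that prints every line of each duplicate group, and like uniq -d it only catches duplicates on adjacent lines; outputfile is just an example name):

uniq -D -f3 inputfile > outputfile                 # -f3 skips the first three fields, so only column 4 is compared
sort -k4,4 inputfile | uniq -D -f3 > outputfile    # sort on column 4 first if the duplicates may not be adjacent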