Is it possible to use bash to find rows that have duplicate values in one column but not in the other columns?
I have a file with multiple columns and rows. I want to take the data, find the rows whose value in column 4 is duplicated, and print those rows to a new file. My data file looks like this:
RR2.out -1752.142111 -1099486.696073 0.000000
SS2.out -1752.142111 -1099486.696073 0.000000
RR1.out -1752.141887 -1099486.555511 0.140562
SS1.out -1752.141887 -1099486.555511 0.140562
RR4.out -1752.140564 -1099485.725315 0.970758
SS4.out -1752.140564 -1099485.725315 0.970758
RR3.out -1752.140319 -1099485.571575 1.124498
SS3.out -1752.140319 -1099485.571575 1.124498
SS5.out -1752.138532 -1099484.450215 2.245858
RR6.out -1752.138493 -1099484.425742 2.270331
SS6.out -1752.138493 -1099484.425742 2.270331
file Gibbs kcal rel
file Gibbs kcal rel
If I just use uniq -d, I only get
file Gibbs kcal rel
file Gibbs kcal rel
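because those are the only two lines that match completely. What I want to know is whether there is a way to find every line with a duplicate value in column 4, rather than only lines that match in full.
Afterwards I use awk and read to read the file names in column 1, so ideally I would not have to move the data into another file and back again, because I have found that this can cause errors when reading the file names.
For this example, I should get the following as output: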
RR2.out -1752.142111 -1099486.696073 0.000000
SS2.out -1752.142111 -1099486.696073 0.000000
RR1.out -1752.141887 -1099486.555511 0.140562
SS1.out -1752.141887 -1099486.555511 0.140562
RR4.out -1752.140564 -1099485.725315 0.970758
SS4.out -1752.140564 -1099485.725315 0.970758
RR3.out -1752.140319 -1099485.571575 1.124498
SS3.out -1752.140319 -1099485.571575 1.124498
RR6.out -1752.138493 -1099484.425742 2.270331
SS6.out -1752.138493 -1099484.425742 2.270331
file Gibbs kcal rel
file Gibbs kcal rel
Here is some code that does what you want:
awk ' BEGIN { OLD4 = "No match" }
$4 == OLD4 { print LAST ; print }
{ OLD4 = $4 ; LAST = $0 } '
Here is how you can run it:
awk ' BEGIN { OLD4 = "No match" }
$4 == OLD4 { print LAST ; print }
{ OLD4 = $4 ; LAST = $0 } ' inputfile
where inputfile looks like this:
RR2.out -1752.142111 -1099486.696073 0.000000
SS2.out -1752.142111 -1099486.696073 0.000000
RR1.out -1752.141887 -1099486.555511 0.140562
SS1.out -1752.141887 -1099486.555511 0.140562
RR4.out -1752.140564 -1099485.725315 0.970758
SS4.out -1752.140564 -1099485.725315 0.970758
RR3.out -1752.140319 -1099485.571575 1.124498
SS3.out -1752.140319 -1099485.571575 1.124498
SS5.out -1752.138532 -1099484.450215 2.245858
RR6.out -1752.138493 -1099484.425742 2.270331
SS6.out -1752.138493 -1099484.425742 2.270331
file Gibbs kcal rel
file Gibbs kcal rel
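One problem with this program is that it assumes column 4 is sorted, because it only compares each line with the line directly before it. If your data really is sorted that way, you can use the code unmodified. Otherwise, it is worth sorting the input on column 4 before passing it to awk.
To fix the sorting problem, you can sort the file as you feed it to awk. This changes the order of the output, so more coding may be needed if the original order matters.
Here is the awk script fed with sorted input: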
awk ' BEGIN { OLD4 = "No match" }
$4 == OLD4 { print LAST ; print }
{ OLD4 = $4 ; LAST = $0 } ' <( sort -k4,4 inputfile )
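If keeping the original line order matters, one possible alternative (not part of the answer above, just a sketch) is to let awk read the file twice: the first pass counts each column 4 value, the second pass prints the lines whose value occurred more than once. This does not need sorted input, and it also avoids printing a middle line twice when three or more consecutive rows share a value, which the adjacent-line comparison would do. The sample file and the variable names inputfile and dups are placeholders made up for illustration.

```shell
# Hypothetical sample data written to a temporary file (placeholder name).
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
RR2.out -1752.142111 -1099486.696073 0.000000
SS5.out -1752.138532 -1099484.450215 2.245858
SS2.out -1752.142111 -1099486.696073 0.000000
file Gibbs kcal rel
file Gibbs kcal rel
EOF

# The same file is given twice on purpose.
# Pass 1 (NR == FNR): count how often each column-4 value occurs.
# Pass 2: print every line whose column-4 value was counted more than once.
# Note: array keys compare as text, so 0.0 and 0.000000 count as different values.
dups=$(awk 'NR == FNR { count[$4]++; next } count[$4] > 1' "$inputfile" "$inputfile")

printf '%s\n' "$dups"
rm -f "$inputfile"
```

Because the result arrives on standard output, it can be redirected to a new file or piped straight into a while read -r name rest loop to pick up the file names in column 1, without an intermediate file.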
uniq has the -f / --skip-fields option, which ignores the first n fields of each line when comparing:
uniq -D -f3
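Like the awk approach, uniq only compares adjacent lines, so unsorted input probably needs to be sorted on column 4 first. The following is a minimal sketch under two assumptions: GNU coreutils is installed (the -D option, which prints every line of a duplicate group, may not be available in other uniq implementations), and the sample file and the names inputfile and dups are invented for illustration.

```shell
# Hypothetical sample data with the duplicated column-4 values not adjacent.
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
RR1.out -1752.141887 -1099486.555511 0.140562
SS5.out -1752.138532 -1099484.450215 2.245858
SS1.out -1752.141887 -1099486.555511 0.140562
EOF

# sort -s -k4,4 : stable sort on column 4 only, so equal values become adjacent
#                 and tied lines keep their original relative order.
# uniq -f3      : skip the first 3 fields, i.e. compare from column 4 onward.
# uniq -D       : print all lines of each duplicate group (GNU extension).
dups=$(sort -s -k4,4 "$inputfile" | uniq -D -f3)

printf '%s\n' "$dups"
rm -f "$inputfile"
```

Since -f3 compares everything after the third field, this only matches the question's intent when column 4 is the last column, as it is in the sample data.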