Bash: get rows from an .anno file only if the column value is present in a text file


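I'm new to both scripting and Stack Overflow, so I apologize if my question is silly or misplaced.

I have to complete a task in Bash.

I have a DATA.anno file that looks like this: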

ID POP LOCALITY
1  Apu Italy
2  Apu Italy
3  Tir Albania
4  Tir Albania
5  Ber Germany
6  Ber Germany

I have a pop.txt file containing two of the population names from the second column of the previous file:

Apu
Ber

Now I want to produce another file that contains only the rows whose population is listed in pop.txt. In this case, the output file I want looks like this:

ID POP LOCALITY
1  Apu Italy
2  Apu Italy
5  Ber Germany
6  Ber Germany

I tried this script, but it doesn't seem to work:

cat pop.txt | while read line; do grep $line DATA.anno | cut -f 2,3 >> outputfile.txt
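For reference, the loop above never closes with `done`, and `cut -f 2,3` would drop the ID column if the file were tab-separated (`cut` splits on tabs by default). A minimal sketch of a repaired loop follows. It assumes whitespace-separated columns and recreates the sample files in a scratch directory; the variable names `workdir` and `pop` are illustrative only.

```shell
# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > DATA.anno <<'EOF'
ID POP LOCALITY
1  Apu Italy
2  Apu Italy
3  Tir Albania
4  Tir Albania
5  Ber Germany
6  Ber Germany
EOF

printf 'Apu\nBer\n' > pop.txt

# Start the output with the header line of the data file.
head -n 1 DATA.anno > outputfile.txt

# Read pop.txt line by line (no "cat |" needed) and append matching rows.
while IFS= read -r pop; do
    [ -n "$pop" ] || continue          # skip empty lines in pop.txt
    # -F: treat the name as a literal string; -w: match it as a whole word.
    grep -wF -- "$pop" DATA.anno >> outputfile.txt
done < pop.txt

cat outputfile.txt
```

Note that this runs `grep` once per population and groups the output by the order of names in pop.txt, so it is mainly useful for understanding what went wrong rather than as the most efficient approach.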

Could you please try the following?

awk 'BEGIN{print "ID POP LOCALITY"} FNR==NR{array[$0];next} ($2 in array)' pop.txt DATA.anno

Explanation: a detailed, line-by-line explanation of the code above.

awk '                         ##Starting awk program from here.
BEGIN{                        ##Starting BEGIN section from here.
  print "ID POP LOCALITY"     ##Printing headers here.
}
FNR==NR{                      ##Checking condition FNR==NR which will be TRUE only while the first Input_file (pop.txt) is being read.
  array[$0]                   ##Creating array with index of current line (a population name).
  next                        ##next will skip all further statements from here.
}
($2 in array)                 ##Checking condition if current line 2nd field is present in array then print that line.
'   pop.txt DATA.anno         ##Mentioning Input_file names here.
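As a variation on the command above, the header can be taken from DATA.anno itself instead of being hard-coded in a `BEGIN` block. The following is a sketch under the assumption that the header is always the first line of the data file; the array name `pops` and the scratch directory are illustrative.

```shell
# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > DATA.anno <<'EOF'
ID POP LOCALITY
1  Apu Italy
2  Apu Italy
3  Tir Albania
4  Tir Albania
5  Ber Germany
6  Ber Germany
EOF

printf 'Apu\nBer\n' > pop.txt

# First file (FNR==NR): remember each population name as an array index.
# Second file: print line 1 (the header) and any row whose 2nd field was remembered.
awk 'FNR==NR { pops[$1]; next } FNR==1 || ($2 in pops)' pop.txt DATA.anno > outputfile.txt

cat outputfile.txt
```

Because the test is on `$2` rather than on the whole line, a locality that happened to share a name with a population would not produce a false match. One caveat: the `FNR==NR` idiom relies on the first file being non-empty.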

This is a very common task. If it weren't for the header, you could also use the grep utility:

grep -f pop.txt DATA.anno
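If the header does need to be kept, one possible workaround is to print it separately and run grep on the remaining lines. This is only a sketch: it assumes the header is the first line, and it adds `-F` (literal strings) and `-w` (whole words) to reduce accidental matches. `-w` is available in GNU and BSD grep but is not required by POSIX.

```shell
# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > DATA.anno <<'EOF'
ID POP LOCALITY
1  Apu Italy
2  Apu Italy
3  Tir Albania
4  Tir Albania
5  Ber Germany
6  Ber Germany
EOF

printf 'Apu\nBer\n' > pop.txt

{
    head -n 1 DATA.anno                        # header line, passed through unchanged
    tail -n +2 DATA.anno | grep -wFf pop.txt   # data rows containing a listed name
} > outputfile.txt

cat outputfile.txt
```

Unlike the awk approach, grep looks at the whole line rather than only the second column, so a row would also match if a listed name appeared as a word in the LOCALITY column.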