For loop to print the filename and substrings from each file in a directory as CSV

I have been trying to teach myself awk to achieve the following, without much success.

I have a directory containing multiple text files:

JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt
JV-26_S48_L_RepetitiveText_ToRemove.txt
...
Each text file is structured as shown below. The numbers may change, but the accompanying text always stays the same.

JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt

4620178 reads; of these:
  4620178 (100.00%) were unpaired; of these:
    1226814 (26.55%) aligned 0 times
    3040861 (65.82%) aligned exactly 1 time
    352503 (7.63%) aligned >1 times
73.45% overall alignment rate
1601831 reads; of these:
  1601831 (100.00%) were unpaired; of these:
    58800 (3.67%) aligned 0 times
    1344724 (83.95%) aligned exactly 1 time
    198307 (12.38%) aligned >1 times
96.33% overall alignment rate

JV-26_S48_L_RepetitiveText_ToRemove.txt

4620178 reads; of these:
  4620178 (100.00%) were unpaired; of these:
    1226814 (26.55%) aligned 0 times
    3040861 (65.82%) aligned exactly 1 time
    352503 (7.63%) aligned >1 times
73.45% overall alignment rate
1601831 reads; of these:
  1601831 (100.00%) were unpaired; of these:
    58800 (3.67%) aligned 0 times
    1344724 (83.95%) aligned exactly 1 time
    198307 (12.38%) aligned >1 times
96.33% overall alignment rate

For each file in this directory, I would like to compile a CSV containing:

Sample                  Total_Reads Uniquely_Mapped_Reads   Multi_Mapped_Reads  Unmapped_Reads
JV-01_S01_L007_R2_002   4620178     3040861                 352503              1226814
JV-26_S48_L             1601831     1344724                 198307              58800
...
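Is there a way to do this with awk in a single for loop? I was trying to use the match function. For example, if I could specify a match search on a particular line, then search from left to right for a substring made up of any digits until a space is found, that would grab the substring of interest on that line.

Something roughly like this: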

for file in *.txt
do
  awk 'FNR == 1 {print FILENAME, match(NR==1, \d), match(NR==4, \d), match(NR==5, \d), match(NR==3, \d) } ' $file >> Names.csv
done
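
For reference, awk's match() takes a string and a regular expression (not a line-number test), returns the position of the first match, and sets RSTART and RLENGTH. awk regexes also have no `\d`, so `[0-9]` is used instead. The following is a minimal sketch of the idea described above, assuming one report block per file with the lines in the order shown; the helper name `first_number` and the temporary demo directory are made up for illustration.

```shell
# Build a throwaway directory with one sample file from the question.
demo=$(mktemp -d)
cat > "$demo/JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt" <<'EOF'
4620178 reads; of these:
  4620178 (100.00%) were unpaired; of these:
    1226814 (26.55%) aligned 0 times
    3040861 (65.82%) aligned exactly 1 time
    352503 (7.63%) aligned >1 times
73.45% overall alignment rate
EOF

match_out=$(awk '
# Hypothetical helper: return the first run of digits in s, or "" if none.
function first_number(s) {
  return match(s, /[0-9]+/) ? substr(s, RSTART, RLENGTH) : ""
}
FNR == 1 { total    = first_number($0) }   # "... reads; of these:"
FNR == 3 { unmapped = first_number($0) }   # "... aligned 0 times"
FNR == 4 { unique   = first_number($0) }   # "... aligned exactly 1 time"
FNR == 5 { multi    = first_number($0); print total, unique, multi, unmapped }
' "$demo"/*.txt)

echo "$match_out"
```

Because the counts are always the first whitespace-separated field on their lines, `$1` would give the same values without match(); the helper only mirrors the approach the question asks about.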

Could you please try the following, written and tested with the samples shown.

awk '
BEGIN{
  # Header row; column -t at the end re-aligns the columns.
  print "Sample                  Total_Reads Uniquely_Mapped_Reads   Multi_Mapped_Reads  Unmapped_Reads"
}
FNR==1{
  # First line of a new file: print the row collected for the previous file.
  if(total_reads){
    print file,total_reads,Uniquely_Mapped_Reads,Multi_Mapped_Reads,Unmapped_Reads
  }
  total_reads=Uniquely_Mapped_Reads=Multi_Mapped_Reads=Unmapped_Reads=""
  # Sample name: the file name with the repetitive suffix removed.
  file=FILENAME
  sub(/_RepetitiveText.*/,"",file)
}
/reads; of these/{
  total_reads=$1
  next
}
/aligned exactly 1 time/{
  Uniquely_Mapped_Reads=$1
  next
}
/aligned >1 times/{
  Multi_Mapped_Reads=$1
  next
}
# Matches the "aligned 0 times" line.
/aligned [0-9]+ times/{
  Unmapped_Reads=$1
}
END{
  # Print the row for the last file.
  if(total_reads){
    print file,total_reads,Uniquely_Mapped_Reads,Multi_Mapped_Reads,Unmapped_Reads
  }
}
'  *.txt | column -t
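
The command above prints aligned columns rather than comma-separated values. If an actual CSV is wanted, one option is to set OFS to a comma and drop the `column -t` step. The sketch below follows the same one-pattern-per-line approach; the variable names, the `flush_row` helper, and the temporary demo directory are illustrative assumptions, and the two sample files carry one report block each, taken from the numbers in the question.

```shell
# Build a throwaway directory with two sample files.
demo=$(mktemp -d)
cat > "$demo/JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt" <<'EOF'
4620178 reads; of these:
  4620178 (100.00%) were unpaired; of these:
    1226814 (26.55%) aligned 0 times
    3040861 (65.82%) aligned exactly 1 time
    352503 (7.63%) aligned >1 times
73.45% overall alignment rate
EOF
cat > "$demo/JV-26_S48_L_RepetitiveText_ToRemove.txt" <<'EOF'
1601831 reads; of these:
  1601831 (100.00%) were unpaired; of these:
    58800 (3.67%) aligned 0 times
    1344724 (83.95%) aligned exactly 1 time
    198307 (12.38%) aligned >1 times
96.33% overall alignment rate
EOF

csv=$(awk '
BEGIN {
  OFS = ","   # fields in print are joined with commas
  print "Sample", "Total_Reads", "Uniquely_Mapped_Reads", "Multi_Mapped_Reads", "Unmapped_Reads"
}
# Hypothetical helper: emit the row collected so far, if any.
function flush_row() {
  if (total != "") print sample, total, unique, multi, unmapped
}
FNR == 1 {
  flush_row()
  total = unique = multi = unmapped = ""
  sample = FILENAME
  sub(/^.*\//, "", sample)              # drop any directory part
  sub(/_RepetitiveText.*/, "", sample)  # drop the repetitive suffix
}
/reads; of these/        { total = $1; next }
/aligned exactly 1 time/ { unique = $1; next }
/aligned >1 times/       { multi = $1; next }
/aligned 0 times/        { unmapped = $1 }
END { flush_row() }
' "$demo"/*.txt)

echo "$csv"
```

In real use the last line would be redirected to a file, for example `> Names.csv`. If a file holds more than one report block, later values overwrite earlier ones, so only the last block is reported.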

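Here is a simple approach, but it requires GNU awk for the multi-character RS.

You can use that trick to read each file as a single record. Then you only need to print out the fields you want (this relies on your assertion that the text is fixed).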
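
The single-record technique can be sketched as follows. This is not the answerer's original script, only a minimal illustration under stated assumptions: GNU awk is installed as `gawk`, each file holds one report block with the fixed wording shown in the question, and the sample name is derived from FILENAME. Setting `RS='^$'` is the usual gawk idiom for a record separator that never matches, so the whole file becomes one record and its words are numbered as fields across all lines.

```shell
# Build a throwaway directory with one sample file from the question.
demo=$(mktemp -d)
cat > "$demo/JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt" <<'EOF'
4620178 reads; of these:
  4620178 (100.00%) were unpaired; of these:
    1226814 (26.55%) aligned 0 times
    3040861 (65.82%) aligned exactly 1 time
    352503 (7.63%) aligned >1 times
73.45% overall alignment rate
EOF

slurp=$(gawk -v RS='^$' -v OFS=',' '
{
  sample = FILENAME
  sub(/^.*\//, "", sample)              # drop any directory part
  sub(/_RepetitiveText.*/, "", sample)  # drop the repetitive suffix
  # Field positions in the whole-file record, given the fixed wording:
  # $1 total reads, $11 aligned 0 times, $16 aligned exactly 1 time, $22 aligned >1 times
  print sample, $1, $16, $22, $11
}' "$demo"/*.txt)

echo "$slurp"
```

The field numbers depend entirely on the text being fixed; any extra word in a report shifts them, which is the trade-off against matching each line by pattern.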

You mention that a multi-character RS requires GNU awk. I use Mac OS, so I'm not sure I have gawk. However, in the terminal, gawk appears to be a recognized command. Should I simply use your code with "gawk -v ..." on Mac OS? I did install Homebrew a while back; perhaps gawk came with it? I should also say that manually adding "jv-01" and "jv-26" in the script is not ideal, since I need to run this over hundreds of files.

@JVGen, this solution should work with plain awk and does not need any specific version of awk.