For loop to print the filename and substrings of each file in a directory as a CSV
I have been trying to teach myself awk to accomplish the following, without much success. I have a directory containing multiple text files:
JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt
JV-26_S48_L_RepetitiveText_ToRemove.txt
...
Each text file is structured as shown below. The numbers may change, but the accompanying text always stays the same.
JV-01_S01_L007_R2_002_RepetitiveText_ToRemove.txt
4620178 reads; of these:
4620178 (100.00%) were unpaired; of these:
1226814 (26.55%) aligned 0 times
3040861 (65.82%) aligned exactly 1 time
352503 (7.63%) aligned >1 times
73.45% overall alignment rate
1601831 reads; of these:
1601831 (100.00%) were unpaired; of these:
58800 (3.67%) aligned 0 times
1344724 (83.95%) aligned exactly 1 time
198307 (12.38%) aligned >1 times
96.33% overall alignment rate
JV-26_S48_L_RepetitiveText_ToRemove.txt
4620178 reads; of these:
4620178 (100.00%) were unpaired; of these:
1226814 (26.55%) aligned 0 times
3040861 (65.82%) aligned exactly 1 time
352503 (7.63%) aligned >1 times
73.45% overall alignment rate
1601831 reads; of these:
1601831 (100.00%) were unpaired; of these:
58800 (3.67%) aligned 0 times
1344724 (83.95%) aligned exactly 1 time
198307 (12.38%) aligned >1 times
96.33% overall alignment rate
For each file in this directory, I would like to compile a CSV containing:
Sample Total_Reads Uniquely_Mapped_Reads Multi_Mapped_Reads Unmapped_Reads
JV-01_S01_L007_R2_002 4620178 3040861 352503 1226814
JV-26_S48_L 1601831 1344724 198307 58800
...
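Is there a way to do this with awk in a single for loop? I have been trying to use the match function.
For example, I would like to restrict the match to a specific line and then search from left to right for a substring made up of digits, stopping at the first space. That would capture the substring of interest on that line.
Something along these lines: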
for file in *.txt
do
awk 'FNR == 1 {print FILENAME, match(NR==1, \d), match(NR==4, \d), match(NR==5, \d), match(NR==3, \d) } ' $file >> Names.csv
done
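A note on the attempt above: awk's match(string, regex) searches a string and sets RSTART and RLENGTH. It does not take a line-number test such as NR==1, and \d is not part of awk's regex syntax ([0-9] is the portable spelling). The idea described in the question (pick a line, then take the run of digits at its start) might be sketched as follows. The function name leading_number is hypothetical and only serves the illustration.

```shell
# Hypothetical helper: print the leading run of digits on line LINE_NO of FILE.
# usage: leading_number LINE_NO FILE
leading_number() {
  awk -v n="$1" '
    # Only look at the requested line; match() sets RSTART and RLENGTH on success.
    FNR == n && match($0, /^[0-9]+/) {
      print substr($0, RSTART, RLENGTH)
    }
  ' "$2"
}
```

For one of the files shown above, `leading_number 3 "$file"` would print the count from the "aligned 0 times" line, assuming the file starts directly with the "reads; of these:" line.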
Could you please try the following, written and tested with the samples shown.
awk '
BEGIN{
  print "Sample Total_Reads Uniquely_Mapped_Reads Multi_Mapped_Reads Unmapped_Reads"
}
FNR==1{
  # A new file starts: print the row for the previous file, then reset.
  if(total_reads){
    print file,total_reads,Uniquely_Mapped_Reads,Multi_Mapped_Reads,Unmapped_Reads
  }
  total_reads=Uniquely_Mapped_Reads=Multi_Mapped_Reads=Unmapped_Reads=""
  # Work on a copy of FILENAME and strip the repetitive suffix from it.
  file=FILENAME
  sub(/_RepetitiveText.*/,"",file)
}
# Each rule stores the leading number of its line. If a file holds several
# report blocks, the values of the last block are the ones printed.
/reads; of these/{
  total_reads=$1
  next
}
/aligned exactly 1 time/{
  Uniquely_Mapped_Reads=$1
  next
}
/aligned >1 times/{
  Multi_Mapped_Reads=$1
  next
}
/aligned [0-9]+ times/{
  Unmapped_Reads=$1
}
END{
  # Print the row for the last file.
  if(total_reads){
    print file,total_reads,Uniquely_Mapped_Reads,Multi_Mapped_Reads,Unmapped_Reads
  }
}
' *.txt | column -t
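The command above prints whitespace-aligned columns. If an actual comma-separated file is wanted, one possible variation is to set OFS to a comma and drop column -t. The sketch below also prints one row per report block (triggered by the "overall alignment rate" line) rather than one row per file, since the sample files contain two blocks each. That per-block behaviour is an assumption about what is wanted, and the function name reads_to_csv is hypothetical.

```shell
# Hypothetical helper: emit one CSV row per alignment report block.
# usage: reads_to_csv FILE...
reads_to_csv() {
  awk -v OFS=',' '
    BEGIN {
      print "Sample", "Total_Reads", "Uniquely_Mapped_Reads", "Multi_Mapped_Reads", "Unmapped_Reads"
    }
    /reads; of these/        { total = $1 }
    /aligned exactly 1 time/ { unique = $1 }
    /aligned >1 times/       { multi = $1 }
    /aligned 0 times/        { unmapped = $1 }
    /overall alignment rate/ {
      # The block is complete: derive the sample name and print the row.
      sample = FILENAME
      sub(/.*\//, "", sample)               # drop any leading directory
      sub(/_RepetitiveText.*/, "", sample)  # drop the repetitive suffix
      print sample, total, unique, multi, unmapped
    }
  ' "$@"
}
```

It could be run as `reads_to_csv *.txt > Names.csv`. Redirecting to a file outside the directory, or to a name that does not end in .txt, avoids feeding the output back in as input.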
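Here is a simple approach, but it requires GNU awk for the multi-character RS.
You can use that trick to read each file as a single record. Then you only need to print the fields you want (this relies on your assertion that the text is fixed).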
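The code for this answer is not preserved here, so the following is only a sketch of the technique it describes, not the answerer's original command. It assumes GNU awk is installed as gawk, that every file starts directly with the "reads; of these:" line, and that the wording is fixed so the field positions never move. The function name whole_file_summary is hypothetical.

```shell
# Hypothetical helper, requires GNU awk (gawk).
# RS='^$' is a gawk idiom: the regex never matches inside non-empty input,
# so each file is read as one single record.
# usage: whole_file_summary FILE...
whole_file_summary() {
  gawk -v RS='^$' '
    BEGIN {
      print "Sample Total_Reads Uniquely_Mapped_Reads Multi_Mapped_Reads Unmapped_Reads"
    }
    {
      sample = FILENAME
      sub(/.*\//, "", sample)               # drop any leading directory
      sub(/_RepetitiveText.*/, "", sample)  # drop the repetitive suffix
      # Default field splitting treats newlines as whitespace, so for the
      # first report block: $1 = total reads, $11 = aligned 0 times,
      # $16 = aligned exactly 1 time, $22 = aligned >1 times.
      print sample, $1, $16, $22, $11
    }
  ' "$@"
}
```

Because the sample name comes from FILENAME, nothing has to be typed in per file, and `whole_file_summary *.txt | column -t` would cover a whole directory. The hard-coded field numbers are the fragile part: any change in the wording shifts them, which is why the pattern-based approach above is more forgiving.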
You should mention that the multi-character RS requires GNU awk. I use Mac OS, so I am not certain that I have gawk. However, gawk does appear to be a recognized command in Terminal. Should I simply use your code with "gawk -v ..." on Mac OS? I did install Homebrew a while ago, so perhaps gawk came with it? I should also say that manually adding "jv-01" and "jv-26" in the script is not ideal, since I need to run this on hundreds of files.
@JVGen, this solution should work with plain awk and does not require any specific version of awk.